Designing an evergreen cache

When ones app is challenged with poor performances, its easy to set up a cache in front of ones SQL database. It doesnt fix the root cause (e.g. bad schema design, bad SQL query, etc.) but it gets the job done. If the app is the only component that writes to the underlying database, its a no-brainer to update the cache accordingly, so the cache is always up-to-date with the data in the database.

Things start to go sour when the app is not the only component writing to the DB. Among other sources of writes, there are batches, other apps (shared databases exist unfortunately), etc. One might think about a couple of ways to keep data in sync i.e. polling the DB every now and then, DB triggers, etc. Unfortunately, they all have issues that make them unreliable and/or fragile.

You might have read about Change-Data-Capture before. Its been described by Martin Kleppmann as turning the database inside out: it means the DB can send change events (SELECT, DELETE and UPDATE) that one can register to. Just opposite to Event Sourcing that aggregates events to produce state, CDC is about getting events out of states. Once CDC is implemented, one can subscribe to its events and update the cache accordingly. However, CDC is quite in its early stage, and implementations are quite specific.

In this talk, Ill describe an easy-to-setup architecture that leverages CDC to have an evergreen cache.

Abstract

Nicolas Frankel