
Change Data Capture

Change data capture (CDC) is a system design technique. You read the change log of a database and publish an event for each change, so that other systems can consume it.

database change log content

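The database change log records the data that was written to the database, so a reader of the log can reconstruct each change without querying the tables. How much of each row is logged depends on the database and its configuration.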

Outbox Pattern

The outbox pattern is a system design pattern. Alongside its normal write, the application stores the outgoing message in a database table called the outbox. A separate process polls this table and sends each new row to a message queue or another service.
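The polling side can be sketched as follows. This is a minimal illustration using SQLite, not a production relay: the outbox schema, the sent column, and the function names are assumptions made for the example, and a deque stands in for the real message queue.

```python
import sqlite3
from collections import deque


def create_outbox(conn):
    # Hypothetical outbox schema: one row per outgoing message.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def poll_outbox_once(conn, queue, batch_size=100):
    """Send unsent outbox rows to the queue, oldest first, and mark them sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, payload in rows:
        queue.append(payload)  # stand-in for a real message-queue publish
        conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

A real poller would call poll_outbox_once in a loop with a short sleep between calls. If the process crashes after publishing but before the commit, the same rows are sent again on the next poll, so consumers should tolerate duplicate messages.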

achieves atomicity

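The main gain from this pattern is atomicity. Say one app writes to a database and to a queue directly. The two writes can't be atomic, since the queue is not part of the database transaction: one write can succeed while the other fails. With an outbox, the business row and the outbox row are written in one database transaction, so either both are stored or neither is.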

CDC extends the outbox pattern. Here the second system doesn't poll the database. Instead, it reads the insert entries for the outbox table from the change log and writes them to the queue. This avoids constant polling of the table.
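The idea can be sketched with a simulated change log. Real change log formats and the APIs for reading them are database-specific, so the list of dicts below, its table, op, and row keys, and the relay_outbox_inserts function are all assumptions made for illustration.

```python
from collections import deque


def relay_outbox_inserts(change_log, start_position, queue):
    """Forward outbox inserts found at or after start_position.

    Returns the position to resume from on the next call.
    """
    position = start_position
    for entry in change_log[start_position:]:
        # Ignore changes to other tables and non-insert operations.
        if entry["table"] == "outbox" and entry["op"] == "insert":
            queue.append(entry["row"]["payload"])  # stand-in for a queue publish
        position += 1
    return position
```

The relay keeps only a position in the log. It never queries the outbox table itself, which is what removes the polling load from the database.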

reading change logs

CDC tools work in one of two ways. Some use the database's own libraries to read the log, starting from a given position in the log sequence. Others use standard system calls to read the log file from the offset where they last stopped.

The second way is how the tail command works in follow mode (tail -f). It remembers the last offset it read, measured from the start of the file. Then it polls, or waits for OS file events, to see whether new data has been appended.
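The offset-based approach can be sketched as below. This assumes a plain text log with one entry per line, which is a simplification: real database logs are usually binary and need a database-specific parser. The function names are hypothetical.

```python
import time


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline, so a half-written line is
    # left in place and read again on the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").split("\n")[:-1]
    return lines, offset + end


def follow(path, handle_line, poll_seconds=1.0):
    """Poll the file forever, passing each new complete line to handle_line."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle_line(line)
        time.sleep(poll_seconds)
```

Storing the offset somewhere durable lets the reader resume after a restart instead of starting again from the beginning of the file.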

Separate Event Table

A separate event table is useful for the following reasons:

  1. It keeps event metadata out of the main table, which preserves separation of concerns.
  2. The event table can be purged regularly, so it always stays small.
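Purging can be as simple as deleting every row that has already been published. The sketch below uses SQLite and assumes a hypothetical outbox table with an increasing integer id, plus a last_published_id value that the relay tracks itself.

```python
import sqlite3


def purge_published(conn, last_published_id):
    """Delete outbox rows whose events have already been published.

    Returns the number of rows removed.
    """
    with conn:  # commit the delete, or roll it back on error
        cur = conn.execute(
            "DELETE FROM outbox WHERE id <= ?", (last_published_id,)
        )
    return cur.rowcount
```

Because the events live in their own table, this delete never touches the business data in the main table.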