Watermark + CDC

The Watermark + CDC pattern is a composite of two stateful patterns: the high watermark pattern and the change data capture (CDC) pattern. Like the window capture pattern, its read operation may occasionally return duplicate records. It is normally safer to implement, however, because using it implies that the source system exposes a high watermark, so updates from the source system are not missed.

This pattern is generally used when the high watermark value is more precise than the filter the source system supports. For example, if the source system ignores the milliseconds of a high watermark value, duplicate records may be returned.

This pattern supports the detection of deleted records.

Pattern Overview

This pattern performs a forward-only read from the source system using a high watermark value, which may occasionally return duplicate records, followed by a CDC operation that filters out those duplicates as a secondary step.
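The two steps can be sketched in Python as follows. This is a minimal illustration, not DataZen's implementation: it assumes each record has a primary key (`id`) and an `updatedDate` watermark, and it uses a content hash per key as a stand-in for the CDC step. The names `content_hash`, `read_with_watermark_and_cdc`, `SOURCE`, and `fetch` are hypothetical, and the simulated source deliberately ignores milliseconds in its filter.

```python
import hashlib
import json
from datetime import datetime
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


def content_hash(record: Record) -> str:
    """Stable SHA-256 of a record's fields; identical content yields an identical hash."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def read_with_watermark_and_cdc(
    fetch: Callable[[Optional[datetime]], Iterable[Record]],
    watermark: Optional[datetime],
    seen: Dict[Any, str],
    key: str = "id",
    wm_field: str = "updatedDate",
) -> Tuple[List[Record], Optional[datetime]]:
    """One sync cycle: forward-only read, then drop records already captured unchanged."""
    rows = list(fetch(watermark))  # step 1: the coarse source filter may return duplicates
    changes: List[Record] = []
    for row in rows:
        h = content_hash(row)
        if seen.get(row[key]) == h:  # step 2: CDC filter skips unchanged records
            continue
        seen[row[key]] = h
        changes.append(row)
    # Advance the watermark to the highest value observed; never move it backwards.
    values = [row[wm_field] for row in rows]
    if watermark is not None:
        values.append(watermark)
    return changes, max(values, default=None)


# Simulated source whose filter ignores milliseconds (hypothetical, for illustration).
SOURCE: List[Record] = [
    {"id": 1, "name": "alpha", "updatedDate": datetime(2024, 1, 31, 8, 1, 1, 112000)},
]


def fetch(watermark: Optional[datetime]) -> List[Record]:
    if watermark is None:
        return list(SOURCE)
    cutoff = watermark.replace(microsecond=0)  # milliseconds dropped by the source filter
    return [r for r in SOURCE if r["updatedDate"] > cutoff]


seen: Dict[Any, str] = {}
first, wm = read_with_watermark_and_cdc(fetch, None, seen)   # 1 record captured
second, wm = read_with_watermark_and_cdc(fetch, wm, seen)    # source resends it; CDC drops it
```

The second read returns the same record from the source, but because its content hash matches the stored one, it is not emitted as a change. A record that is actually modified gets a new hash and passes through.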

For example, suppose the source system returns an updatedDate field with a value of 2024-01-31T08:01:01.112, but the source filter only accepts values in the yyyy-MM-ddThh:nn:ss format, dropping the milliseconds. The next request then filters on 2024-01-31T08:01:01, which the record already read still satisfies, so the source request can return the same record multiple times.
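The truncation in this example can be reproduced with Python's standard `datetime` module. This sketch assumes the source filter uses a "greater than" comparison and that the document's yyyy-MM-ddThh:nn:ss format means a 24-hour clock; `to_source_filter` is a hypothetical helper, not a DataZen function.

```python
from datetime import datetime


def to_source_filter(watermark: datetime) -> str:
    """Format a watermark at the seconds precision the example source filter accepts."""
    # Assumes the document's yyyy-MM-ddThh:nn:ss means a 24-hour clock.
    return watermark.strftime("%Y-%m-%dT%H:%M:%S")


last_read = datetime(2024, 1, 31, 8, 1, 1, 112000)  # 2024-01-31T08:01:01.112
filter_value = to_source_filter(last_read)          # "2024-01-31T08:01:01"
cutoff = datetime.fromisoformat(filter_value)

# A "greater than" filter on the truncated value still matches the record already read.
returned_again = last_read > cutoff                 # True
```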

The high watermark value is usually a datetime, a numeric field, or a binary timestamp (timestamp bytes).
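All three kinds of watermark work the same way because their values are ordered, so advancing the watermark is a "maximum so far" operation. The Python sketch below uses a hypothetical `next_watermark` helper to show this; the byte example assumes a fixed-length, big-endian value (for example, an 8-byte SQL Server rowversion), for which lexicographic byte comparison matches numeric order.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional


def next_watermark(rows: Iterable[Dict[str, Any]], field: str, current: Optional[Any] = None) -> Optional[Any]:
    """Return the highest watermark value seen so far; the watermark never moves backwards."""
    values = [row[field] for row in rows]
    if current is not None:
        values.append(current)
    return max(values, default=None)


# Datetime watermark: the later timestamp wins.
by_date = next_watermark([{"u": datetime(2024, 1, 31, 8, 1, 1)}, {"u": datetime(2024, 1, 31, 8, 2)}], "u")
# Numeric watermark: the stored value is kept if no row exceeds it.
by_number = next_watermark([{"u": 5}, {"u": 12}], "u", current=20)
# Fixed-length, big-endian bytes compare in numeric order.
by_bytes = next_watermark([{"u": bytes.fromhex("00000000000007D1")}, {"u": bytes.fromhex("00000000000007D0")}], "u")
```

Variable-length or little-endian byte values would not compare correctly this way, so this assumption matters for binary timestamps.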

DataZen Implementation

How a high watermark is configured depends on the source system; however, the Synthetic CDC setting is configured in much the same way regardless of the source system.

See the Watermark Pattern for details on implementing the high watermark for each type of source system.

Once the high watermark is configured, apply the Synthetic CDC pattern. See the Change Capture Pattern for more information.