CDC Stream

A CDC Stream allows you to pull changes from a source system using the source system's native capabilities to identify changes. Most systems that provide a native CDC capability are database platforms, such as:

  • SQL Server CDC or Change Tracking
  • Snowflake CHANGES
  • Oracle CDC

Pattern Overview

This pattern describes a mechanism to extract changes from a source system directly, allowing the integration platform to rely of the source system's ability to identify changes in its data. Many relational databases support this capability, but not all. In some cases, administrators use a third-party library or tool to read from database logs directly.

While this pattern can be efficient, some limitations may apply depending on the source system in addition to potential increases in memory, processing, and storage requirements.

DataZen Implementation

DataZen supports this pattern natively for SQL Server, and can support additional relational databases that offer change tables that can be queries, as explained below.

Source System Implementation Comments
SQL Server Supports both CDC and Change Tracking Full support is provided
Snowflake Partial support for tables with CHANGE Tracking Requires querying Metadata fields and applying Synthetic CDC with a sliding window
Oracle Not tested Oracle provides both change tables and subcriptions to query changes
Other Platforms Not tested Other platforms may be supported but were not tested

SQL Server

DataZen fully supports SQL Server and Azure SQL Server. Use Enzo Manager to create a Job Reader and choose which CDC or Tracking Table you would like to inspect for changes. DataZen keeps an internal high watermark to return records that were newly captured automatically. In addition, DataZen removes the additional tracking metadata fields automatically.

Snowflake

Capturing changes with Snowflake is partially supported in DataZen when a Snowflake table has Change Tracking enabled and specific configuration settings are set on the job reader. A sliding window is required for this implementation.

Once the sliding window is implemented, the Synthetic CDC pattern is then applied. See the Synthetic CDC Pattern for more information.

  • SQL: The SQL command specifically references the change table and uses an offset:
    SELECT * FROM DATALAKE.TEST.CUSTOMER                          -- the source table (with Change capture enabled)
    CHANGES (INFORMATION => DEFAULT) AT(OFFSET => -60*60)         -- last 60 minutes of changes (sliding window)
    WHERE METADATA$ACTION='INSERT' AND METADATA$ISUPDATE=TRUE     -- keep updated records only
                        
  • Synthetic CDC: Synthetic CDC is engaged by specifying the CDC Keys using the primary keys of the source table
  • Top N/Limit Heuristics: this advanced setting is disabled on the job
  • Metadata columns: Metadata columns are removed from the data using the Source Data Pipeline