Snapshot

A snapshot pattern allows you to request data from a source system and optionally push it into a target system. In its simplest form, the pattern copies a source system's data into an intermediate staging environment and optionally pushes it into a target system as an initial load or a reload.

Pattern Overview

This pattern describes a full read from the source system with the intent to perform an initial load, a full reload, or a Change Capture resync (without pushing the data to the target system). In some cases, this pattern can be used to simply provide a copy of the source data from a non-relational system into another system that provides better data discovery and analysis, when ongoing replication of changes is not necessary.

DataZen Implementation

The snapshot pattern requires a Job Reader and, optionally, a Target. The reader and target configuration varies with the type of system; the snapshot operation itself varies only with the intent of the operation.

| Intent | Implementation | Operation |
|---|---|---|
| Copy Data for Future Use | Read all records from source without a target | Run Job Reader (with no target) |
| Initial Load or Reinitialize Target with Source Data | Read all records from source and push to target | Reinitialize with Change Log |
| Reinitialize CDC with Source Data | Reinitializes a CDC table with source data | Reinitialize without Change Log |
| Reinitialize Target from Change Logs | Reapply one or more available Change Logs | Replay Change Logs |

Copy Data for Future Use

Regardless of the source, this operation reads all available records from the source system and creates a Change Log. This operation can be performed with a Job Reader only. If the job has a Target defined, create a copy of the job first and remove the target from the newly created job before running it. Running a Job Reader by itself creates a Change Log that can be inspected using DataZen Manager.

Initial Load

Performing an initial load on a target system depends on the job type you have created:

  • Job Reader: you cannot use a Job Reader to perform an initial load; however, you can reinitialize a Job Reader to recreate a full change log
  • Job Writer: you can create a Job Writer from an existing Job Reader (or by loading an existing Change Log), and run the job
  • Direct Job: running a direct job (both reader and writer) for the first time performs an initial load, unless you choose to opt out. You can also perform a Reinitialize operation on a Direct Job to reload all available records in the target system.

Performing an initial load operation on a job that has CDC enabled creates a stateful intermediate table used to detect future changes in the data.
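
To illustrate how a state table created during an initial load can be used to detect later changes, here is a minimal Python sketch. It is not DataZen's implementation: the function names, the `id` key column, and the use of a SHA-256 hash per record are all assumptions made for illustration.

```python
import hashlib
import json


def row_hash(row):
    """Return a stable fingerprint of a record's content."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_state(rows, key="id"):
    """Initial load: store one hash per key (the hypothetical state table)."""
    return {row[key]: row_hash(row) for row in rows}


def detect_changes(state, rows, key="id"):
    """Compare a later full read against the state table.

    Returns the new or modified rows and the keys that disappeared.
    """
    current = {row[key]: row for row in rows}
    upserts = [r for k, r in current.items() if state.get(k) != row_hash(r)]
    deletes = [k for k in state if k not in current]
    return upserts, deletes


# Initial load: every record is read and fingerprinted.
snapshot = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
state = build_state(snapshot)

# A later read: record 2 was modified and record 3 is new.
later = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Robert"},
    {"id": 3, "name": "Cy"},
]
upserts, deletes = detect_changes(state, later)
print(upserts)
print(deletes)
```

In this sketch only records 2 and 3 are reported, because record 1 still matches the fingerprint stored during the initial load.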

Reinitialize CDC with Source Data

A CDC reinitialization may be needed from time to time when the source data was modified but target systems should not be updated. For example, suppose a lastUpdatedDateTime field was modified on 1 million records in a source table. CDC would normally identify 1 million changes, but you may not want to trigger replication, for instance because of performance concerns or because the change was already made manually on the target system. In that case, you can reinitialize the CDC table only. Performing a CDC reinitialization does not create a change log.

Because reinitializing the CDC does not create a change log, reinitializing a target system from existing change logs after this operation may not recreate the target in full.
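
The gap described above can be shown with a small Python sketch. It is a simplified model, not DataZen's internals: the state keeps full row copies rather than hashes, and `capture`, `reinitialize_cdc`, and `replay` are hypothetical helper names.

```python
def capture(state, rows, change_logs, key="id"):
    """Normal CDC run: log rows that differ from the state, then update the state."""
    changed = [dict(r) for r in rows if state.get(r[key]) != r]
    if changed:
        change_logs.append(changed)
    state.update({r[key]: dict(r) for r in rows})
    return changed


def reinitialize_cdc(state, rows, key="id"):
    """Reset the state to the current source data without writing a change log."""
    state.clear()
    state.update({r[key]: dict(r) for r in rows})


def replay(change_logs, key="id"):
    """Rebuild a target by applying every logged row in order."""
    target = {}
    for log in change_logs:
        for r in log:
            target[r[key]] = dict(r)
    return target


state, change_logs = {}, []
capture(state, [{"id": 1, "status": "new"}], change_logs)  # initial load, logged

source = [{"id": 1, "status": "archived"}]  # bulk edit made on the source
reinitialize_cdc(state, source)             # accepted silently, nothing logged

next_run = capture(state, source, change_logs)  # no changes detected
rebuilt = replay(change_logs)                   # still holds the pre-edit row
print(next_run)
print(rebuilt)
```

After the reinitialization, the next capture finds nothing to replicate, and a target rebuilt from the change logs alone still contains the row as it was before the bulk edit.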

Reinitialize Target from Change Logs

In some cases, all the data is available in one or more change logs and a target system needs to be created (or a new target system needs to be loaded in full from available change logs). Replaying all change logs from a known, valid initial state reloads the target system to its last known state.
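
As a rough sketch of the replay idea, the Python example below applies a sequence of change logs, in order, on top of an initial state. The log layout (a list of entries with an `op` of `upsert` or `delete` and a `row`) is an assumption made for illustration and is not DataZen's change log format.

```python
def replay_change_logs(change_logs, initial_state=None, key="id"):
    """Apply change logs in order on top of a known initial state."""
    target = {k: dict(v) for k, v in (initial_state or {}).items()}
    for log in change_logs:
        for entry in log:
            if entry["op"] == "delete":
                target.pop(entry["row"][key], None)
            else:  # treat anything else as an upsert
                target[entry["row"][key]] = dict(entry["row"])
    return target


logs = [
    # First log: the initial load.
    [
        {"op": "upsert", "row": {"id": 1, "name": "Ann"}},
        {"op": "upsert", "row": {"id": 2, "name": "Bob"}},
    ],
    # Second log: one update and one delete.
    [
        {"op": "upsert", "row": {"id": 2, "name": "Robert"}},
        {"op": "delete", "row": {"id": 1}},
    ],
]

target = replay_change_logs(logs)
print(target)
```

Order matters in this model: applying the second log before the first would resurrect record 1 and overwrite the update to record 2, which is why the replay has to start from a known state and proceed in sequence.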