Snapshot
A snapshot pattern allows you to request data from a source system and optionally push the data into a target system. In its simplest form, this pattern copies a source system's data into an intermediate staging environment and optionally pushes it into a target system as an initial load or reload.
Pattern Overview
This pattern describes a full read from the source system with the intent to perform an initial load, a full reload, or a Change Capture resync (without pushing the data to the target system). In some cases, this pattern can be used to simply provide a copy of the source data from a non-relational system into another system that provides better data discovery and analysis, when ongoing replication of changes is not necessary.
DataZen Implementation
The snapshot pattern requires a Job Reader and, optionally, a Target. The configuration of each varies with the type of system; the snapshot operation itself, however, varies only with the intent of the operation.
Intent | Implementation | Operation |
---|---|---|
Copy data for future use | Read all records from source without a target | Run Job Reader (with no target) |
Initial Load or Reinitialize target with source data | Read all records from source and push to target | Reinitialize with Change Log |
Reinitialize CDC with Source Data | Reinitialize a CDC table with source data | Reinitialize without Change Log |
Reinitialize Target from Change Logs | Reapply one or more available Change Logs | Replay Change Logs |
Copy Data for Future Use
Regardless of the source, this operation reads all available records from the source system and creates a Change Log. This operation can be performed with a Job Reader only. If the job has a Target defined, create a copy of the job first and remove the target from the newly created job before running it. Running a Job Reader by itself creates a Change Log that can be inspected using DataZen Manager.
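As a conceptual illustration only, a full read that produces a change log can be sketched as follows. This is not DataZen code and does not reflect DataZen's actual Change Log format: the function name `snapshot_to_change_log`, the `op`/`key`/`row` entry layout, and the sample records are all assumptions made for this example.

```python
import json


def snapshot_to_change_log(records, key):
    """Read every source record and record it as an 'upsert' entry.

    Hypothetical model: a change log is a list of dictionaries, one per record.
    No target is involved; the log is simply kept for later use.
    """
    return [{"op": "upsert", "key": r[key], "row": r} for r in records]


# Sample source data (made up for illustration).
source = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

change_log = snapshot_to_change_log(source, key="id")

# The log can be inspected or stored without touching any target system.
print(json.dumps(change_log, indent=2))
```

Because no target is involved, the only output of the operation is the log itself, which can later feed a writer or be inspected on its own.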
Initial Load
Performing an initial load on a target system depends on the job type you have created:
- Job Reader: you cannot use a Job Reader to perform an initial load; however, you can reinitialize a Job Reader to recreate a full change log
- Job Writer: you can create a Job Writer from an existing Job Reader (or by loading an existing Change Log), and run the job
- Direct Job: running a direct job (both reader and writer) the first time performs an initial load, unless you choose to opt out
Performing an initial load operation on a job that has CDC enabled creates an intermediate state table that is used to detect future changes in the data.
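The role of the state table can be illustrated with a minimal sketch. It assumes, purely for illustration, that the state is a mapping from record key to a hash of the record; how DataZen actually stores and compares CDC state is not described here, and the names `row_hash` and `detect_changes` are hypothetical.

```python
import hashlib
import json


def row_hash(row):
    """Stable hash of a record (keys are sorted so field order does not matter)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def detect_changes(rows, state, key="id"):
    """Compare rows against the state table, update it, and return the changes."""
    changes = []
    seen = set()
    for row in rows:
        k = row[key]
        seen.add(k)
        h = row_hash(row)
        if state.get(k) != h:  # new or modified record
            changes.append({"op": "upsert", "key": k, "row": row})
            state[k] = h
    for k in list(state):  # records that disappeared from the source
        if k not in seen:
            changes.append({"op": "delete", "key": k})
            del state[k]
    return changes


source = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
state = {}  # empty state table: the first run behaves as an initial load

initial = detect_changes(source, state)    # every record is reported
unchanged = detect_changes(source, state)  # nothing changed since the load

source[1]["name"] = "Grace Hopper"
later = detect_changes(source, state)      # only the modified record

print(len(initial), len(unchanged), len(later))
```

In this model, the first run against an empty state table reports every record, which is why an initial load and the creation of the state table happen together.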
Reinitialize CDC with Source Data
A CDC reinitialization may be needed from time to time when the source data was modified but target systems should not be updated. For example, suppose a lastUpdatedDateTime field was modified on 1 million records in a source table. This would normally cause the CDC to identify 1 million changes, but you may not want to trigger the replication (for example, because of performance concerns or because the change was already made manually on the target system). In that case, you can reinitialize the CDC table only. Performing a CDC reinitialization does not create a change log.
Because reinitializing the CDC does not create a change log, reinitializing a target system from existing change logs after this operation may not recreate the target in full.
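The effect of a CDC reinitialization can be sketched with a simplified model in which the CDC table maps each record key to a hash of the record. This representation, along with the names `build_state` and `pending_changes`, is an assumption made for this example and is not DataZen's implementation.

```python
import hashlib
import json


def row_hash(row):
    """Stable hash of a record."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def build_state(rows, key="id"):
    """Rebuild the (hypothetical) CDC table from the source. No change log is produced."""
    return {row[key]: row_hash(row) for row in rows}


def pending_changes(rows, state, key="id"):
    """Records that the next CDC run would report as changed."""
    return [row for row in rows if state.get(row[key]) != row_hash(row)]


source = [
    {"id": 1, "lastUpdatedDateTime": "2023-01-01T00:00:00Z"},
    {"id": 2, "lastUpdatedDateTime": "2023-01-01T00:00:00Z"},
]
state = build_state(source)

# A bulk update touches every record in the source.
for row in source:
    row["lastUpdatedDateTime"] = "2024-06-01T00:00:00Z"

before = pending_changes(source, state)  # every record would be replicated

state = build_state(source)              # reinitialize the CDC table only
after = pending_changes(source, state)   # nothing left to replicate

print(len(before), len(after))
```

The bulk update is absorbed into the state without ever being written to a change log, which is why a later replay of existing change logs would not include it.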
Reinitialize Target from Change Logs
In some cases, all the data is available in one or more change logs and a target system needs to be created (or a new target system needs to be loaded in full from available change logs). Replaying all change logs, in order, from a known and valid initial state reloads the target system to its last known state.
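Replaying change logs can be pictured with the following minimal sketch. It assumes a hypothetical log format in which each entry is either an `upsert` carrying a full row or a `delete` carrying only a key, and it models the target as an in-memory dictionary; neither reflects DataZen's actual formats.

```python
def replay(change_logs, initial_state=None):
    """Apply change logs in order, starting from a known initial state.

    Returns the resulting target as a dictionary keyed by record key.
    """
    target = {} if initial_state is None else dict(initial_state)
    for log in change_logs:        # oldest log first
        for entry in log:          # entries in the order they were recorded
            if entry["op"] == "delete":
                target.pop(entry["key"], None)
            else:
                target[entry["key"]] = entry["row"]
    return target


# Two made-up change logs: an initial load followed by later changes.
initial_log = [
    {"op": "upsert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "upsert", "key": 2, "row": {"id": 2, "name": "Grace"}},
]
later_log = [
    {"op": "upsert", "key": 2, "row": {"id": 2, "name": "Grace Hopper"}},
    {"op": "delete", "key": 1},
]

target = replay([initial_log, later_log])
print(target)
```

Order matters in this model: applying the same logs in a different order, or starting from a state that does not match the first log, can produce a different target.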