One-Way Sync

A one-way sync composite pattern implements a data replication topology that keeps two or more systems synchronized for inserts, updates, and optionally deletes. It assumes that one system is used as the System of Record (SoR) for the data in scope, and that the target systems will eventually receive and accept incoming changes.

This pattern allows great flexibility on the target system, including the ability to manually add orphan records and additional fields. These considerations may affect the implementation presented in this documentation; accommodating them normally requires treating the target system as a source itself so that changes can be inferred dynamically.

Pattern Overview

This pattern describes a mechanism to keep records between two or more systems synchronized, assuming one system is the System of Record (always containing the latest information). For example, if QuickBooks Online (QBO) is used to create invoices and a relational database contains a copy of these invoices, the SoR is assumed to be QBO because new invoices and updates to existing invoices can only originate in QBO.

Some integration scenarios are more complex, such as when the target system can itself be used to modify the records it receives. For example, if the Medius AP SaaS platform is used to approve or reject invoices created in QBO, the state of these invoices in QBO may need to be updated through additional integration pipelines.

DataZen Implementation

Implementing this pattern in DataZen is a multi-step process, since it requires both a reader and a writer operation.

Step 1: Source Changes

First, you should decide which pattern you will use when reading from the source system. The selection depends on the capabilities of the source system and may be changed at virtually any time. In general, if the source system supports high watermarks, the Watermark pattern will be the most efficient (a sketch of a watermark read follows the list below).

  • CDC Stream
  • Watermark
  • Synthetic CDC
  • Window Capture
  • Watermark + CDC
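
For illustration, a high-watermark read against a relational source might look like the following sketch. The table and column names (invoices, last_modified) are hypothetical, and @last_watermark stands in for the value persisted by the job after its previous run:

    -- Read only the records changed since the last successful sync;
    -- the invoices table and last_modified column are hypothetical.
    SELECT invoice_id, customer_id, amount, status, last_modified
    FROM invoices
    WHERE last_modified > @last_watermark
    ORDER BY last_modified;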

Step 2: Target Changes

Next, you will need to decide whether changes from the source system can be sent directly to the target system "as-is", or whether you need to store the data in a staging database first. A few scenarios may require staging the data first, including:

  • The target system may contain additional fields, and their values must be included in the Update operation to keep them intact (illustrated in the sketch below)
  • The target system may allow records to be deleted, and deleted records should be recreated

These scenarios describe a target system that could be changed externally at any time. This means that the internal state of the replication topology cannot represent the true delta between the two systems at any point in time. When this condition is true, a staging database is required, and both the source and target systems must first be replicated into the staging database.
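
As an illustration of the first scenario, the following sketch builds full update payloads from hypothetical staging tables (stg_source_invoices, stg_target_invoices): source-owned fields come from the staged source copy, while a target-only field (notes) is read back from the staged target copy so that a full-record update does not overwrite it:

    -- Combine source-owned fields with target-only fields so that
    -- the Update operation preserves externally managed values.
    SELECT s.invoice_id, s.amount, s.status, t.notes
    FROM stg_source_invoices AS s
    INNER JOIN stg_target_invoices AS t
        ON t.invoice_id = s.invoice_id;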

However, if the target system cannot be modified externally, you may be able to forward the changes directly into the target system without staging the data.

Step 3: Push Changes

The last step consists of pushing changes to the target system; the approach differs depending on whether Step 2 required the creation of a staging database or not.

  • Direct: a writer can be created as part of the same pipeline job
  • Staged: a separate pipeline job must be created that first reads both staged tables

Direct Push

Pushing data directly from source to target is the simplest form of this pattern, requiring a single pipeline job with a Reader operation, as discussed in Step 1, and a single Writer that pushes data into the target system as needed. In this example, a direct job is created that reads from MailChimp and pushes new and updated records into a relational database using a Change Capture pattern.
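
Conceptually, the Writer's behavior against a relational target is equivalent to an upsert. The following sketch is illustrative only, with hypothetical table and key names (members, member_id, incoming_changes); DataZen performs this step internally:

    -- Apply a batch of source changes as inserts or updates.
    MERGE INTO members AS t
    USING incoming_changes AS s
        ON t.member_id = s.member_id
    WHEN MATCHED THEN
        UPDATE SET email = s.email, status = s.status
    WHEN NOT MATCHED THEN
        INSERT (member_id, email, status)
        VALUES (s.member_id, s.email, s.status);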

Staged Push

A staged push requires multiple pipelines in order to logically determine which records need to be pushed, and how.

In this diagram, the data from the source system is staged using the CDC pattern; the target system may also be staged, as discussed in Step 2. When both the source and target systems are staged, a simple logical SQL operation that joins both tables is needed to determine which records are missing from the target system; this logical SQL operation essentially becomes a CDC Stream itself.
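
A minimal sketch of such a logical SQL operation, assuming hypothetical staging tables named stg_source_invoices and stg_target_invoices keyed on invoice_id:

    -- Records present in the staged source but missing from the staged
    -- target; these become the changes to push.
    SELECT s.*
    FROM stg_source_invoices AS s
    LEFT JOIN stg_target_invoices AS t
        ON t.invoice_id = s.invoice_id
    WHERE t.invoice_id IS NULL;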