Overview of Data Pipelines Integration Patterns
The patterns below support batch, mini-batch (near-time), and real-time data integration processing. Individually or in combination, they support virtually any enterprise integration architecture, such as the Remote Site Replication architecture. Using mini-batch and real-time streaming data capture together can also enable other well-known architectures, such as the Lambda Architecture and the Kappa Architecture.
The patterns presented here are divided into three main groups: stateless, stateful, and composite.
- Stateless: Pushes data from one system to another in its simplest form
- Stateful: Pushes data from one system to another selectively using a watermark or a synthetic change capture mechanism
- Composite: Combines two or more patterns previously defined, possibly mixing stateless and stateful patterns to provide the desired integration outcome
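To make the stateless/stateful distinction concrete, here is a minimal sketch that contrasts a full snapshot read with a watermark read over an in-memory list. The `Row` type, the `updated_at` column, and the function names `snapshot` and `read_since` are hypothetical and chosen only for illustration; they are not part of any product API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    id: int
    name: str
    updated_at: int  # hypothetical, monotonically increasing change marker


def snapshot(source: List[Row]) -> List[Row]:
    """Stateless: return every row on every run; nothing is remembered."""
    return list(source)


def read_since(
    source: List[Row], watermark: Optional[int]
) -> Tuple[List[Row], Optional[int]]:
    """Stateful: return only rows past the stored high watermark.

    The caller persists the returned watermark and passes it to the next run.
    """
    fresh = [r for r in source if watermark is None or r.updated_at > watermark]
    new_mark = max((r.updated_at for r in fresh), default=watermark)
    return fresh, new_mark


source = [Row(1, "a", 10), Row(2, "b", 20)]
first, mark = read_since(source, None)   # first run: everything is new
source.append(Row(3, "c", 30))
second, mark = read_since(source, mark)  # second run: only the new row
```

The only difference between the two functions is the watermark value carried from one run to the next, which is the state the stateful group depends on.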
Stateless Patterns
- Snapshot: Captures the entire data set from the source system, with or without a filter
- Hook: Forwards data that was received by a listener
- CDC Stream: Forwards data provided by a native CDC engine
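As an illustration of the Hook pattern, the sketch below models a listener that forwards each payload it receives to one or more targets. The `Hook` class, its `on_receive` method, and the sample payload are hypothetical; a real listener would typically sit behind an HTTP endpoint or a message queue.

```python
from typing import Callable, List

# A forwarder is any callable that delivers one payload to a target system.
Forwarder = Callable[[dict], None]


class Hook:
    """Stateless listener: every payload received is forwarded as-is."""

    def __init__(self, targets: List[Forwarder]) -> None:
        self.targets = targets

    def on_receive(self, payload: dict) -> None:
        # No lookup against previous payloads: nothing is remembered.
        for forward in self.targets:
            forward(payload)


delivered: List[dict] = []
hook = Hook([delivered.append])
hook.on_receive({"event": "order_created", "id": 42})
hook.on_receive({"event": "order_created", "id": 42})  # forwarded again
```

Because the listener keeps no state, the second identical payload is forwarded again. Suppressing such repeats would require one of the stateful patterns.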
Stateful Patterns
- Watermark: Forward-only read mechanism that keeps track of a high watermark value
- CDC: Synthetic operation that reads source records and only pushes data that changed
- Window: Read operation that goes back in time to recapture previously captured data
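One possible way to implement the synthetic CDC pattern is to hash each source record and compare the result with the hash stored from the previous run. The sketch below assumes records are plain dictionaries with an `id` key; `record_hash`, `changed_records`, and the in-memory `state` dictionary are hypothetical names, and a real pipeline would persist the state between runs.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def record_hash(record: dict) -> str:
    """Stable digest of a record; keys are sorted so field order is irrelevant."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_records(
    records: Iterable[dict], state: Dict[str, str], key: str = "id"
) -> List[dict]:
    """Return records that are new or modified since the last run.

    `state` maps each record key to the last digest seen and is updated in place.
    """
    changed = []
    for record in records:
        digest = record_hash(record)
        record_key = str(record[key])
        if state.get(record_key) != digest:
            state[record_key] = digest
            changed.append(record)
    return changed


state: Dict[str, str] = {}
run1 = changed_records([{"id": 1, "qty": 5}, {"id": 2, "qty": 7}], state)
run2 = changed_records([{"id": 1, "qty": 5}, {"id": 2, "qty": 8}], state)
```

In this sketch the first run pushes both records and the second run pushes only record 2, whose `qty` changed. Deleted records are not detected here; that would require comparing the stored keys against the keys present in the current read.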
Composite Patterns (partial list)
- Watermark + CDC: Forward-only read mechanism that keeps track of a high watermark value and filters out records that were previously captured
- One-Way Sync: Integration strategy that keeps a source system synchronized with one or more target systems
- Two-Way Sync: Integration strategy that keeps two or more systems synchronized, where data can be updated in any system
- Aggregation: Centralizes data from multiple sources or locations to build data lakes or data hubs, or to enable advanced analytics, reporting, or AI/ML scenarios
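The Watermark + CDC composite can be sketched by chaining the two steps: a watermark read narrows the candidate rows, and a hash comparison then drops rows that were already captured. The `WatermarkCdcReader` class and the `updated_at` field are hypothetical. The inclusive `>=` comparison is an assumption made here so that rows sharing the boundary timestamp are not missed; it is the reason the CDC filter is needed.

```python
import hashlib
import json
from typing import Dict, List, Optional


class WatermarkCdcReader:
    """Composite sketch: watermark read followed by a hash-based CDC filter."""

    def __init__(self, key: str = "id", watermark_field: str = "updated_at") -> None:
        self.key = key
        self.watermark_field = watermark_field
        self.watermark: Optional[int] = None  # high watermark from the last run
        self.hashes: Dict[str, str] = {}      # last digest seen per record key

    def _digest(self, row: dict) -> str:
        payload = json.dumps(row, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def read(self, source: List[dict]) -> List[dict]:
        # Watermark step: inclusive comparison, so boundary rows are re-read.
        candidates = [
            row for row in source
            if self.watermark is None or row[self.watermark_field] >= self.watermark
        ]
        # CDC step: keep only rows whose content differs from the last capture.
        output = []
        for row in candidates:
            digest = self._digest(row)
            row_key = str(row[self.key])
            if self.hashes.get(row_key) != digest:
                self.hashes[row_key] = digest
                output.append(row)
        if candidates:
            self.watermark = max(row[self.watermark_field] for row in candidates)
        return output


source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
reader = WatermarkCdcReader()
first = reader.read(source)                 # both rows are new
source.append({"id": 3, "updated_at": 20})  # arrives with the boundary timestamp
second = reader.read(source)                # row 2 is re-read but filtered out
third = reader.read(source)                 # nothing changed
```

A strict `>` comparison would have skipped row 3 because it shares the watermark value, while the inclusive read alone would have pushed row 2 twice. Note that the hash dictionary in this sketch grows with the number of keys seen, so a production implementation would need a retention strategy.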