CAPTURE
Overview
The capture operation allows you to persist the data set from the pipeline into a change log. Change logs can be created with a full data set or a differential data set from a previous pipeline execution. Write pipelines then play or replay change logs.
Change logs contain two data streams: an upsert and an optional delete stream. When an initialization on the source data is performed, or if synthetic change capture is not engaged, the upsert stream contains inserted records only.
Use the KEYS setting to enable Synthetic CDC. The KEYS setting allows you to specify one or more columns from the source pipeline identifying a unique record.
When the KEYS setting is used and the UPSERT option is provided, you can further limit which fields are used to detect changes using the OBSERVE setting. If not specified, all the fields from the source pipeline are checked for changes.
Syntax
Captures the data set of the pipeline and creates a change log with optional change data capture. Only one Capture operation is allowed in a pipeline.
CAPTURE -- optional change log name { '...' } -- Capture options { { INSERT } { UPDATE } { DELETE } ON KEYS '<field>,...' { OBSERVE '<field>,...' } { INTO [CONNECTION] '...' } { WITH { INIT_ON_FIRST_RUN } { IGNORE_DUPS } { PATH { [CONNECTION] } '...' } { CDCMARKER '...' } { ENCRYPT_CERT '...' } { SIGN_CERT '...' } } } ;
INSERT |
Limit the synthetic CDC to inserted records since the last capture |
UPDATE |
Limit the synthetic CDC to updated records since the last capture |
DELETE |
Limit the synthetic CDC to deleted records since the last capture (assumes the source returns all available records all the time) |
KEYS |
Engage synthetic CDC using the fields in a comma-separated list identifying a unique record |
OBSERVE |
Optional comma-separated list of fields to observe for changes in a synthetic CDC operation; if blank, all fields are observed |
INTO |
A custom client-provided connection and table to use to keep track of changes (must be a SQL Server connection) |
INIT_ON_FIRST_RUN |
Captures all available records in a change log the first time the operation is executed in a synthetic CDC operation |
IGNORE_DUPS |
Discard records that would otherwise be considered duplicates when using the KEYS provided in a synthetic CDC operation |
CDCMARKER |
The name of a field to add indicating which records are inserted (I) or updated (U) in a synthetic CDC operation |
PATH |
A local path or cloud drive or FTP site in which change logs will be stored |
ENCRYPT_CERT |
The thumbprint of an X.509 certificate used to encrypt the change log |
SIGN_CERT |
The thumbprint of an X.509 certificate used to sign the change log |
Example 1
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- We are not capture changes since we are not using the KEYS property, -- so the entire data set will be captured every time -- Also, since we are adding a file extension to the pipeline name, -- we are forcing the full name of the change log, so it will be -- overriden everytime the pipeline runs. CAPTURE 'myrssfeed.eds' WITH PATH [adls] '/logs';
Example 2
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new records only; ignore changes to previous records -- The first time this runs, all records will be captured -- Since there is no file extension, new change logs will be created -- every time using a timestamp on the file -- (ex: myrssfeedcdc_1748304020815.eds) CAPTURE 'myrssfeedcdc' INSERT ON KEYS 'guid' WITH PATH [adls] '/logs' ;
Example 3
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new and updated records; capture updates if anything has changed -- in the data CAPTURE 'myrssfeedcdc' INSERT UPDATE ON KEYS 'guid' WITH PATH [adls] '/logs' ;
Example 4
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new and updated records; limit the capture of updates -- to changes in the title and link fields only -- The first time this runs, the CDC state will be initialized but -- no change log will be created CAPTURE 'myrssfeedcdc' INSERT UPDATE ON KEYS 'guid' OBSERVE 'title,link' WITH INIT_ON_FIRST_RUN PATH [adls] '/logs' ;