CAPTURE
Overview
The capture operation allows you to persist the data set from the pipeline into a change log. Change logs can be created with either a full or differential data set from a previous pipeline execution. Write pipelines then play or replay change logs.
Change logs contain two data streams: an upsert and an optional delete stream. When an initialization on the source data is performed, or if synthetic change capture is not engaged, the upsert stream contains inserted records only.
Use the KEYS setting to enable Synthetic CDC. The KEYS setting allows you to specify one or more columns from the source pipeline identifying a unique record.
When the KEYS setting is used and the UPSERT option is provided, you can further limit which fields are used to detect changes using the OBSERVE setting. If not specified, all the fields from the source pipeline are checked for changes.
Syntax
Captures the data set of the pipeline and creates a change log with optional change data capture. Only one Capture operation is allowed in a pipeline.
CAPTURE
-- optional change log name
{ '...' }
-- Capture options
{
{ INSERTS } { UPDATES } { DELETES } ON KEYS '... {,...} '
{ OBSERVE '... {,...} ' }
{ WITH
{ INIT_ON_FIRST_RUN }
{ IGNORE_DUPS }
{ PATH { [CONNECTION] } '...' }
{ CDCMARKER '...' }
{ ENCRYPT_CERT '...' }
{ SIGN_CERT '...' }
}
}
;INSERTS |
Limit the synthetic CDC to inserted records since the last capture |
UPDATES |
Limit the synthetic CDC to updated records since the last capture |
DELETES |
Limit the synthetic CDC to deleted records since the last capture (assumes the source returns all available records all the time) |
KEYS |
Engage synthetic CDC using the fields in a comma-separated list identifying a unique record |
OBSERVE |
Optional comma-separated list of fields to observe for changes in a synthetic CDC operation; if blank, all fields are observed |
INIT_ON_FIRST_RUN |
Captures all available records in a change log the first time the operation is executed in a synthetic CDC operation |
IGNORE_DUPS |
Discard records that would otherwise be considered duplicates when using the KEYS provided in a synthetic CDC operation |
CDCMARKER |
The name of a field to add indicating which records are inserted (I) or updated (U) in a synthetic CDC operation |
PATH |
A local path or cloud drive or FTP site in which change logs will be stored |
ENCRYPT_CERT |
The thumbprint of an X.509 certificate used to encrypt the change log |
SIGN_CERT |
The thumbprint of an X.509 certificate used to sign the change log |
Example 1
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- We are not capture changes since we are not using the KEYS property, -- so the entire data set will be captured every time -- Also, since we are adding a file extension to the pipeline name, -- we are forcing the full name of the change log, so it will be -- overriden everytime the pipeline runs. CAPTURE 'myrssfeed.eds' WITH PATH [adls] '/logs';
Example 2
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new records only; ignore changes to previous records -- The first time this runs, all records will be captured -- Since there is no file extension, new change logs will be created -- every time using a timestamp on the file -- (ex: myrssfeedcdc_1748304020815.eds) CAPTURE 'myrssfeedcdc' INSERTS ON KEYS 'guid' WITH PATH [adls] '/logs' ;
Example 3
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new and updated records; capture updates if anything has changed -- in the data CAPTURE 'myrssfeedcdc' INSERTS UPDATES ON KEYS 'guid' WITH PATH [adls] '/logs' ;
Example 4
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using an existing Cloud Connection -- You can also capture to a local drive if running a Self-Hosted agent -- Capture new and updated records; limit the capture of updates -- to changes in the title and link fields only -- The first time this runs, the CDC state will be initialized but -- no change log will be created CAPTURE 'myrssfeedcdc' INSERTS UPDATES ON KEYS 'guid' OBSERVE 'title,link' WITH INIT_ON_FIRST_RUN PATH [adls] '/logs' ;
Example 5
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Capture the data into a change log using the internal CLOUD Agent store -- This option is only available for CLOUD Agents; not self-hosted Agents -- Corresponding load operation: LOAD UPDATES FROM 'myrssfeedcdc'; CAPTURE 'myrssfeedcdc' INSERTS UPDATES ON KEYS 'guid' ;