MERGE
Overview
The Merge operation compares the records available in the data pipeline with a target table in a supported database engine, and performs an Insert, Update, and/or Delete operation based on the options specified. As with the SINK operation, the target table will be created automatically if needed; however, the table schema will not be upgraded unless the SCHEMA_DRIFT option is also specified. This behavior intentionally differs from the SINK operation.
If none of the operations (INSERTS, UPDATES, DELETES) are specified, all three are assumed.
The DELETES option should be used only if the data pipeline is expected to always return all available records from the source system. To avoid accidental deletes, consider first adding an EXIT condition that stops the pipeline if the record count is 0 or unexpectedly low, as sketched below.
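For illustration, a minimal sketch of such a guard; the EXIT syntax and the #rowcount( function shown here are assumptions rather than confirmed DataZen syntax, and should be adapted to your environment:

-- hypothetical guard: stop the pipeline before the merge when the
-- source returned no records (EXIT and #rowcount( are assumptions)
EXIT IF (#rowcount() = 0);

MERGE INTO [sql2017] TABLE '[feed].[rssdata]'
WITH
KEY COLUMNS 'guid'
INSERTS UPDATES DELETES
;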
The Merge operation is designed to work with Key Columns; if no key columns are provided, the Merge command will ignore UPDATES and DELETES, and all the records from the data pipeline will be appended to the table.
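For example, a merge issued without KEY COLUMNS behaves as a plain append; a minimal sketch, reusing the placeholder connection and table names from the examples below:

-- no KEY COLUMNS: UPDATES and DELETES are ignored and every record
-- from the data pipeline is appended to the target table
MERGE INTO [sql2017] TABLE '[feed].[rssdata]'
WITH
INSERTS
;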
When using the Merge operation in an Inner Pipeline, the use of the DELETES option may yield unexpected results because the operations in this block may be executed with partial record sets.
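In that scenario, a safer sketch is to omit the DELETES option so that a partial record set cannot remove valid rows from the target table (placeholder names as above):

-- inside an Inner Pipeline, restrict the merge to inserts and updates
MERGE INTO [sql2017] TABLE '[feed].[rssdata]'
WITH
KEY COLUMNS 'guid'
INSERTS UPDATES
;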
Syntax
Implements an automated merge operation for Inserts, Updates, and Deletes based on a unique row identifier
MERGE INTO [CONNECTION]
TABLE '...'
{ WITH
{ KEY COLUMNS '...' }
{ { INSERTS } { UPDATES } { DELETES } }
{ STAGE ON [CONNECTION] '...' CREDENTIALS '...' }
{ SCHEMA_DRIFT }
{ SCHEMA_UPGRADE }
}
;

TABLE | The table name to use; may contain DataZen functions and pipeline variables; the table will be created automatically if not found, using the schema of the pipeline data
STAGE ON | Specifies the connection and folder/container used to stage the data first; whether staging is required depends on the target database
CREDENTIALS | When a staging environment is specified, this option provides the necessary credentials
KEY COLUMNS | A comma-separated list of fields to use as the unique record identifier
SCHEMA_DRIFT | When specified, this option will attempt to add columns if needed
SCHEMA_UPGRADE | When specified, this option will attempt to upgrade data types; some database engines have limited support for this operation
INSERTS | When specified, this option will add missing records based on the key columns provided, or all records if no key columns are specified
UPDATES | When specified, this option will update existing records based on the key columns provided; key columns must be provided for this option
DELETES | When specified, this option will delete records that are not found in the data pipeline, based on the key columns provided; key columns must be provided for this option
Example 1
-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- merges the data set into a table but does not delete records
-- that may be missing from the source
-- the table will be created if it does not exist and columns will
-- be adjusted or added if necessary
MERGE INTO [sql2017] TABLE '[feed].[rssdata]'
WITH
KEY COLUMNS 'guid'
INSERTS UPDATES
SCHEMA_DRIFT
;
Example 2
-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- merges the data set into a table in a Fabric environment
-- the table will be created if it does not exist and columns will
-- be adjusted or added if necessary; the ADLS connection provided
-- will be used to stage the data first; credentials stored in the
-- environment variable called adlsSecret will be used
-- When no operations are specified, all three are assumed:
-- INSERTS UPDATES DELETES
MERGE INTO [fabric] TABLE '[feed].[rssdata]'
WITH
STAGE ON [adls2.0] 'stagedata' CREDENTIALS (#getvar(adlsSecret))
KEY COLUMNS 'guid'
SCHEMA_DRIFT SCHEMA_UPGRADE
;