SINK INTO DB
Overview
This operation dumps the current pipeline data set into a database table. Dumping the data in database tables can assist with troubleshooting tasks and long-running operations that require access the data in a previous state. You can use pipeline variables and DataZen functions to calculate the name of the database object to create. The table schema is automatically detected based on the data types of the pipeline columns at the time this command is executed.
Using the ON COLUMN option changes the behavior of this component by reading the
column name (or index) provided as an XML payload, detect parent/child relationships within
the data, and create one or more tables of all individual nodes with children. The table names
created will start with the table name provided as a prefix, and the table name suffix will be the name
of the node. The REPLACE option will automatically join all these tables appropriately
and return the fully denormalized data of the XML document.
Use this option instead of the inline APPLY TX operation when extracting large XML payloads for improved performance.
Syntax
Saves the pipeline data set into a database table with optional XML parsing of a column; when XML parsing is engaged multiple tables may be created
SINK INTO DB [CONNECTION]
TABLE '...'
{ ON COLUMN '...' } -- optional xml column
{ WITH
{ TRUNCATE | RECREATE | TRUNCATE_APPEND | RECREATE_APPEND }
{ REPLACE }
{ AUTO_CLEAN }
{ STAGE ON [...] '...' }
{ CREDENTIALS '...' }
}
;TABLE |
The table name to create; may contain DataZen functions and pipeline variables; when using the ON COLUMN setting, this property will be used as the prefix of the table(s) that will be created after the XML transformation |
STAGE ON |
Specifies the connection and folder/container to use in order to stage the data first depending on the target database |
CREDENTIALS |
When a stage environment is specified, this option is used to provide the necessary credentials |
TRUNCATE |
The target table is first truncated if it exists (ignored if ON COLUMNS is specified) |
RECREATE |
The target table is always dropped first if it exists (ignored if ON COLUMNS is specified) |
TRUNCATE_APPEND |
The target table is first truncated if it exists then appended to when multiple pages of data are being processed (ignored if ON COLUMNS is specified) |
RECREATE_APPEND |
The target table is always dropped first if it exists then appended to when multiple pages of data are being processed (ignored if ON COLUMNS is specified) |
ON COLUMN |
Indicates an XML column needs to be parsed into one or more tables; each table will use the TABLE setting as their prefix |
CLEANUP |
When using a column transformation, and using the With Replace option, this setting drops the tables that were created by the Sink operation automatically after the join operations are completed; this turns these tables into transient storage for the resulting XML JOIN operation |
REPLACE |
When using a column transformation, this option performs a full LEFT JOIN operation on all the tables created to return a single output of all the XML inner tables; if inner tables are not in a parent/child relationship, a CROSS JOIN operation is performed instead |
CONTINUE_ON_ERROR |
Continues processing even if errors are detected |
Example 1
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- dumps the data set as-is into a table; the name of the table -- will be calculated at execution time and will be recreated -- if it exists SINK INTO DB [sql2017] TABLE '[feed].[rss_@executionid]' WITH RECREATE;
Example 2
-- Get data from an ADP endpoint
-- Append all incoming records into a staging table for future processing
SELECT
-- the ADP HTTP connection
* FROM HTTP [ADP]
-- the HTTP command fetching 100 records at a time
(GET /v2/worker-demographics?limit=100&select=workers/person/legalName)
-- page as many times as needed
WITH PAGING 'offset' -- page using an offset (count) of records retrieved
PAGING_PARAM 'offset' -- the url query parameter to use for next pages
PAGING_PATH 'workers' -- the path used to count the number of records returned
DEBUG_MAX_PAGES 2 -- limit to 2 pages during debugging
-- auto document transformation
-- and add the raw JSON payload for each row returned
APPLY TX (workers || person)
WITH RAWCOL '_raw'
APPLY PIPELINE (
PRINT 'Adding @@rowcount records to the demographics table...';
-- for each page returned by ADP, recreate then append the data set as-is into a table
-- the table is created only on the first page of data retrieved from ADP
SINK INTO DB [sql2017] TABLE '[adp].[demographics]'
WITH RECREATE_APPEND;
)
;
Example 3
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml); -- Dumps the XML payload into one or more tables -- based on its internal parent/child nodes and -- return a full LEFT JOIN of all its nodes -- as the new pipeline data and delete the staged tables SINK INTO DB [sql2017] TABLE '[rss_]' ON COLUMN 'payload' WITH REPLACE CLEANUP;