SINK INTO DB

Overview

This operation dumps the current pipeline data set into a database table. Writing the data to database tables can assist with troubleshooting tasks and with long-running operations that require access to the data in a previous state. You can use pipeline variables and DataZen functions to calculate the name of the database object to create. The table schema is detected automatically from the data types of the pipeline columns at the time this command is executed.

Using the ON COLUMN option changes the behavior of this component: it reads the column identified by the name (or index) provided as an XML payload, detects parent/child relationships within the data, and creates one or more tables, one for each node that has children. The names of the tables created will start with the table name provided as a prefix, and end with the name of the node. The REPLACE option will automatically join all these tables appropriately and return the fully denormalized data of the XML document.

Use this option instead of the inline APPLY TX operation for improved performance when extracting large XML payloads.

Syntax

Saves the pipeline data set into a database table, with optional XML parsing of a column; when XML parsing is engaged, multiple tables may be created

SINK INTO DB [CONNECTION] 
    TABLE '...' 
    { ON COLUMN '...'  }    -- optional XML column 
    { WITH  
        { TRUNCATE | RECREATE | TRUNCATE_APPEND | RECREATE_APPEND } 
        { REPLACE } 
        { CLEANUP }
        { CONTINUE_ON_ERROR }
        { STAGE ON [...] '...' }
        { CREDENTIALS '...' }
    }
;

TABLE

The table name to create; may contain DataZen functions and pipeline variables; when using the ON COLUMN setting, this property is used as the prefix of the table(s) created by the XML transformation

STAGE ON

Specifies the connection and folder/container used to stage the data before loading, depending on the target database

CREDENTIALS

When a staging environment is specified, this option provides the necessary credentials
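
As a minimal sketch of the staging options, the data can be written to a container first and then loaded into the target table; the [blob_storage] connection name, container path, and target connection below are hypothetical placeholders, and the credential string is left elided:

-- stage the data in a container first, then load it into the target table
SINK INTO DB [warehouse] TABLE '[stg].[workers]'
WITH
    RECREATE
    STAGE ON [blob_storage] '/staging/workers'
    CREDENTIALS '...'
;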

TRUNCATE

The target table is first truncated if it exists (ignored if ON COLUMN is specified)

RECREATE

The target table is always dropped first if it exists (ignored if ON COLUMN is specified)

TRUNCATE_APPEND

The target table is first truncated if it exists, then appended to as multiple pages of data are processed (ignored if ON COLUMN is specified)

RECREATE_APPEND

The target table is always dropped first if it exists, then appended to as multiple pages of data are processed (ignored if ON COLUMN is specified)
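
For example, when paging through a source, TRUNCATE_APPEND empties an existing table on the first page only and appends each subsequent page, preserving the existing table definition; the connection and table names below are hypothetical:

-- empty the table once, then append each page of data
SINK INTO DB [sql2017] TABLE '[stg].[workers]'
WITH TRUNCATE_APPEND;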

ON COLUMN

Indicates that an XML column needs to be parsed into one or more tables; each table will use the TABLE setting as its prefix

CLEANUP

When using a column transformation with the REPLACE option, this setting automatically drops the tables created by the Sink operation after the join operations are completed, turning these tables into transient storage for the resulting XML JOIN operation

REPLACE

When using a column transformation, this option performs a full LEFT JOIN across all the tables created to return a single output of all the XML inner tables; if the inner tables are not in a parent/child relationship, a CROSS JOIN operation is performed instead

CONTINUE_ON_ERROR

Continues processing even if errors are detected
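
As a hypothetical sketch, this option can be combined with other WITH settings so that a failed write does not stop the pipeline; the connection and table names are placeholders:

-- keep processing subsequent pages even if one sink operation fails
SINK INTO DB [sql2017] TABLE '[feed].[rss_@executionid]'
WITH
    RECREATE
    CONTINUE_ON_ERROR;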

Example 1

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- dumps the data set as-is into a table; the name of the table
-- will be calculated at execution time and will be recreated
-- if it exists
SINK INTO DB [sql2017] TABLE '[feed].[rss_@executionid]'
WITH RECREATE;


Example 2

-- Get data from an ADP endpoint
-- Append all incoming records into a staging table for future processing

SELECT 
    -- the ADP HTTP connection
    * FROM HTTP [ADP]

    -- the HTTP command fetching 100 records at a time
    (GET /v2/worker-demographics?limit=100&select=workers/person/legalName)	
    
    -- page as many times as needed
    WITH PAGING 'offset' 			    -- page using an offset (count) of records retrieved
              PAGING_PARAM 'offset'		-- the url query parameter to use for next pages
              PAGING_PATH 'workers'     -- the path used to count the number of records returned
              DEBUG_MAX_PAGES 2         -- limit to 2 pages during debugging

    -- auto document transformation
    -- and add the raw JSON payload for each row returned
    APPLY TX (workers || person)
    WITH RAWCOL '_raw'

    APPLY PIPELINE (
        
        PRINT 'Adding @@rowcount records to the demographics table...';

        -- for each page returned by ADP, recreate then append the data set as-is into a table
        -- the table is created only on the first page of data retrieved from ADP
        SINK INTO DB [sql2017] TABLE '[adp].[demographics]'
        WITH RECREATE_APPEND;

    )
;




Example 3

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml);

-- Dumps the XML payload into one or more tables
-- based on its internal parent/child nodes, 
-- returns a full LEFT JOIN of all its nodes 
-- as the new pipeline data, and deletes the staged tables 
SINK INTO DB [sql2017] TABLE '[rss_]'
	ON COLUMN 'payload'
	WITH 
		REPLACE
		CLEANUP;