PUSH BIGDATA

Overview

The Push BigData operation allows you to send changes previously captured to either a CosmosDB or Google BigQuery endpoint. Since CosmosDB is a document database storing JSON documents, you can optionally construct the JSON Document as needed; however, if the PAYLOAD option is not provided, a default (flat) JSON document will be constructed.

Syntax

Sends data for upserted records to a big data endpoint (CosmosDB, BigQuery).

PUSH INTO <COSMOSDB | BIGQUERY> [CONNECTION]
        -- Common options
        { AUTO_CREATE }
        { FIELDS '<field>,...' }
        { MAP '...' }

        -- CosmosDB options
        { DATABASE '...' }
        { TABLE '...' }
        { KEYPATH '...'    }
        { PARTITION '...'  }
        { PAYLOAD '...' }
    
        -- BigQuery options
        { DATASET '...' }
        { CONTAINER '...' }
        { WRITE_BEHAVIOR < 'append' | 'truncate' | 'writeifempty' > }
    
    { WITH
        { TIMEOUT <n> }
        { BATCH <n> }
        { DISCARD_ON_SUCCESS }
        { DISCARD_ON_ERROR }
        { DLQ_ON_ERROR '...' }
        { RETRY < LINEAR | EXPONENTIAL > (<n>,<n>) }
    }
;

AUTO_CREATE

Automatically creates the target databases/tables/containers/datasets if not found

FIELDS

Comma-separated list of fields to keep from the data set; leave blank to push all available fields

MAP

An array of key-value pair identifying which fields should be renamed (leave blank to leave all fields unchanged)

DATABASE

The CosmosDB database name

TABLE

The CosmosDB table name

KEYPATH

A JSON Path identifying the record unique identifier in the payload for CosmosDB

PARTITION

The document property to use as the partition key for CosmosDB; if not a field name the function #rndguid() is allowed in this field

PAYLOAD

A custom payload when building complex JSON documents for CosmosDB

DATASET

The Google BigQuery dataset name

CONTAINER

The Google BigQuery container name

WRITE_BEHAVIOR

The Google BigQuery write behavior (append,truncate,writeifempty)

TIMEOUT

A timeout value in seconds

BATCH

The number of records to process in a single call

RETRY_INTERVAL

The retry interval in seconds (default: 1)

RETRY_COUNT

The maximum number of retries to perform (default: 1)

WITH_PAYLOAD

When tracing is enabled, indicates that the response payload should also be logged

DISCARD_ON_SUCCESS

Deletes the change log after successful completion of the push operation

DISCARD_ON_ERROR

Deletes the change log if the push operation failed

DLQ_ON_ERROR

Moves the change log to a subfolder or directory if the push operation failed

RETRY EXPONENTIAL

Retries the operation on failure up to N times, waiting P seconds exponentially longer every time (N,P)

RETRY LINEAR

Retries the operation on failure up to N times, waiting P seconds every time (N,P)

Example 1


-- Load the next inserts/updates previously captured in a cloud folder using
-- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container'

LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid';

-- Send all updates and inserts into a target CosmosDB table
-- and create the table if it doesn't exist; build a default, flat JSON 
-- document using all available fields
PUSH INTO BIGDATA [cosmosDBConnection] 
	DATABASE 'testdb'
	TABLE 'rssdata'
	KEYPATH '/guid'
	AUTO_CREATE
  WITH
	BATCH 1000
;