PUSH BIGDATA
Overview
The Push BigData operation allows you to send changes previously captured to either a CosmosDB or Google BigQuery endpoint. Since CosmosDB is a document database storing JSON documents, you can optionally construct the JSON Document as needed; however, if the PAYLOAD option is not provided, a default (flat) JSON document will be constructed.
Syntax
Sends data for upserted records to a big data endpoint (CosmosDB, BigQuery).
PUSH INTO <COSMOSDB | BIGQUERY> [CONNECTION] -- Common options { AUTO_CREATE } { FIELDS '<field>,...' } { MAP '...' } -- CosmosDB options { DATABASE '...' } { TABLE '...' } { KEYPATH '...' } { PARTITION '...' } { PAYLOAD '...' } -- BigQuery options { DATASET '...' } { CONTAINER '...' } { WRITE_BEHAVIOR < 'append' | 'truncate' | 'writeifempty' > } { WITH { TIMEOUT <n> } { BATCH <n> } { DISCARD_ON_SUCCESS } { DISCARD_ON_ERROR } { DLQ_ON_ERROR '...' } { RETRY < LINEAR | EXPONENTIAL > (<n>,<n>) } } ;
AUTO_CREATE |
Automatically creates the target databases/tables/containers/datasets if not found |
FIELDS |
Comma-separated list of fields to keep from the data set; leave blank to push all available fields |
MAP |
An array of key-value pair identifying which fields should be renamed (leave blank to leave all fields unchanged) |
DATABASE |
The CosmosDB database name |
TABLE |
The CosmosDB table name |
KEYPATH |
A JSON Path identifying the record unique identifier in the payload for CosmosDB |
PARTITION |
The document property to use as the partition key for CosmosDB; if not a field name the function #rndguid() is allowed in this field |
PAYLOAD |
A custom payload when building complex JSON documents for CosmosDB |
DATASET |
The Google BigQuery dataset name |
CONTAINER |
The Google BigQuery container name |
WRITE_BEHAVIOR |
The Google BigQuery write behavior (append,truncate,writeifempty) |
TIMEOUT |
A timeout value in seconds |
BATCH |
The number of records to process in a single call |
RETRY_INTERVAL |
The retry interval in seconds (default: 1) |
RETRY_COUNT |
The maximum number of retries to perform (default: 1) |
WITH_PAYLOAD |
When tracing is enabled, indicates that the response payload should also be logged |
DISCARD_ON_SUCCESS |
Deletes the change log after successful completion of the push operation |
DISCARD_ON_ERROR |
Deletes the change log if the push operation failed |
DLQ_ON_ERROR |
Moves the change log to a subfolder or directory if the push operation failed |
RETRY EXPONENTIAL |
Retries the operation on failure up to N times, waiting P seconds exponentially longer every time (N,P) |
RETRY LINEAR |
Retries the operation on failure up to N times, waiting P seconds every time (N,P) |
Example 1
-- Load the next inserts/updates previously captured in a cloud folder using -- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container' LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid'; -- Send all updates and inserts into a target CosmosDB table -- and create the table if it doesn't exist; build a default, flat JSON -- document using all available fields PUSH INTO BIGDATA [cosmosDBConnection] DATABASE 'testdb' TABLE 'rssdata' KEYPATH '/guid' AUTO_CREATE WITH BATCH 1000 ;