SELECT BIGDATA

Overview

The SELECT BIG DATA operation is designed to call CosmosDB or Google Big Query endpoints.

Considering the volume of data that may be returned, using a high watermark is highly recommended. See the Watermark Pattern section for details on how to use high watermark values with HTTP requests.

If specified, the APPLY TX operation happens before the APPLY PIPELINE operation. It can be used to transform payload responses into rows and columns directly. While this can be a handy way to work on a data set immediately, you lose the ability to inspect HTTP Response headers and other properties. For debugging purposes, it is recommended to also use the TRACE INTO operation so that you have a way to inspect actual response codes, full payload, and response headers.

The APPLY PIPELINE operation declared within the SELECT operation allows you to add one or more transformations before capturing the final data set. When using this approach, consider that the pipeline will execute for every paged request. For example, if you use the PAGING option, and 5 pages of data are retrieved, this pipeline will execute five times. If you would like to execute a pipeline transformation after all pages have been retrieved, but before the capture operation, use the APPLY PIPELINE operation after the SELECT operation.

Syntax

Executes a request against the bigdata source specified (CosmosDB or Google BigQuery) and returns rows and columns.

SELECT  
  {TOP <n>}
  < ... | * >
FROM BIGDATA [CONNECTION] 
  -- SQL operation to execute
  ( ... )
  { WITH 
    { DATABASE '...' }
    { TABLE '...' }
  }
  { APPLY TX '...' 
    { WITH RAWCOL '...' } 
  }  

  -- Inner pipeline to be executed per paged results
  { APPLY PIPELINE (...) }

;

DATABASE

The database name to use instead of the default one specified in the connection (CosmosDB only)

TABLE

The table name to use instead of the default one specified in the connection (CosmosDB only)

APPLY TX

Applies a document transformation on the payload assuming the payload is an XML or JSON document using the xpath provided

APPLY PIPELINE

Executes an inline pipeline consisting of one or more components for the entire data set retrieved or for each batch of data retrieved

RAWCOL

When content transformation is enabled, indicates the column to add to the response that contains the raw content of each row

Example 1

-- Read data from a CosmosDB database and transform the 
-- JSON data into rows and columns automatically
-- Apply a high watermark which is updated upon completion
-- of the read operation

SELECT * FROM BIGDATA [cosmosdbTweets] 
  -- use a default minimum _ts value for first-time execution
  (select * from c where c._ts > #isnullorempty(@highwatermark, 1649261846))
  WITH HWM = '_ts'
APPLY TX '$';