SELECT HTTP

Overview

The SELECT HTTP operation is designed to call any HTTP endpoint, typically involving a GET, POST, PUT, or PATCH operation. This operation supports all the options provided by DataZen's HTTP Reader; for details on all the available options, see the HTTP/S Reader help section.

The HTTP command to send is specified in parenthesis right after the HTTP connection, and needs to include the HTTP Verb to execute. For example, the following executes a POST operation on an Intacct endpoint:
(POST https://api.intacct.com/ia/xml/xmlgw.phtml)

Use the high watermark settings when you need DataZen to remember the last high value of a filter in order to retrieve only changed (or new) records since the last pipeline was executed. See the Watermark Pattern section for details on how to use high watermark values with HTTP requests.

If specified, the APPLY TX operation happens before the APPLY PIPELINE operation. It can be used to transform payload responses into rows and columns directly. While this can be a handy way to work on a data set immediately, you lose the ability to inspect HTTP Response headers and other properties. For debugging purposes, it is recommended to also use the TRACE INTO operation so that you have a way to inspect actual response codes, full payload, and response headers.

The APPLY PIPELINE operation declared within the SELECT operation allows you to add one or more transformations before capturing the final data set. When using this approach, consider that the pipeline will execute for every paged request. For example, if you use the PAGING option, and 5 pages of data are retrieved, this pipeline will execute five times. If you would like to execute a pipeline transformation after all pages have been retrieved, but before the capture operation, use the APPLY PIPELINE operation after the SELECT operation.

Syntax

Executes an HTTP operation and returns the HTTP result including its payload and optionally returns rows and columns by transforming the response payload.

{ USING 
    (...) 
    ON DB [CONNECTION]
    { WITH 
        { SEPARATE_EXECUTION} 
        { INCLUDE_FIELDS }
    }
}
SELECT  
  {TOP <n>}
  < ... | * >
FROM HTTP [CONNECTION] 
  (< GET | POST | PUT | PATCH | DELETE > ... )
  { WITH 
      { HWM '<field>'
        { HWM_PARAM '...' }
        { HWM_AS_DATETIME }
      }
      { TIMEOUT <n> }
      { PAYLOAD (...) }
      { HEADERS '...' }
      { PAGING < 'none' | 'offset' | 'page' | 'link' | 'token' >
        { PAGING_PARAM '...' }
        { PAGING_PATH '...' }
        { DEBUG_MAX_PAGES n }
      }
      { CONTENT_TYPE '...' }
      { RETURN_BYTES }
  }
  { TRACE INTO [CONNECTION] 
    TABLE '...'
    { WITH_PAYLOAD }
  }
  { APPLY TX '...' 
    { WITH RAWCOL '...' } 
  }
  
  -- Inner pipeline to be executed per paged results
  { APPLY PIPELINE (...) }
;

TOP	Client-side operator that limits the number of records after the inner pipeline has executed
USING	Specifies dynamic parameters retrieved from a SQL database (making an HTTP call per record returned) to use as part of the HTTP request URL and/or Payload using {{@param.<field>}} markers
SEPARATE_EXECUTION	Each dynamic row will be treated as a separate execution causing a change log to be created per dynamic row
INCLUDE_FIELDS	Add the dynamic row as additional columns to the output returned by the HTTP operation
HWM	The column name to use as the field containing the next high watermark value if found; this option requires the use of the APPLY TX transformation; the high watermark value can be used as part of the URL or the PAYLOAD using the @highwatermark variable, or will automatically be added to the URL as a parameter if the HWM_PARAM is also set
HWM_PARAM	The query parameter to be added or updated in the URL; leave blank to use the @highwatermark variable
HWM_AS_DATETIME	The high watermark value will be treated as a date/time field so the next high watermark can be correctly calculated
HEADERS	An array of key/value pair headers to add to the HTTP request (ex: [ { "key1": "value1" }, { "key2": "value2" } } ])
PAYLOAD	For POST, PUT and PATCH operations, represents the optional payload to use
TIMEOUT	A timeout value in seconds
PAGING	The paging strategy to use; must be one of: 'none', 'offset', 'page', 'link', 'token'
PAGING_PARAM	The paging parameter to add to the URL as a query string; if not specified, paging variables can be used in the URL or the PAYLOAD (@pagingindex, @pagingmarker, @recordcount, @recordindex)
PAGING_PATH	The path within the raw response to use to identify the paging token or link to use on the next call
CONTENT_TYPE	The HTTP Request Content-Type setting for POST, PUT, and PATCH operations
RETURN_BYTES	Indicates that the HTTP call returns binary data (ex: an image or PDF document) instead of a string
APPLY TX	Applies a document transformation on the payload assuming the payload is an XML or JSON document using the xpath provided
TRACE INTO	Saves each request and response to a database table for raw data inspection and debugging purposes without the payload response unless WITH_PAYLOAD is also specified
WITH_PAYLOAD	When tracing is enabled, indicates that the response payload should also be logged
APPLY PIPELINE	Executes an inline pipeline consisting of one or more components for each page of data retrieved by the HTTP call
RAWCOL	When content transformation is enabled, indicates the column to add to the response that contains the raw content of each row

Example 1

-- This HTTP request uses the Rss Feed connection and calls the /econi.xml relative URL
-- The Rss Feed connection points to: https://www.govinfo.gov/rss
-- This returns the raw output as a single row, which includes response headers 
-- and the full response payload

SELECT * FROM HTTP [Rss Feed] (GET /econi.xml);

Example 2

-- This HTTP request uses the Rss Feed connection and calls the /econi.xml relative URL
-- The Rss Feed connection points to: https://www.govinfo.gov/rss
-- The APPLY operation transforms the HTTP response body using an XPath command that 
-- turns the payload into rows and columns automatically

SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);