SELECT HTTP

Overview

The SELECT HTTP operation is designed to call any HTTP endpoint, typically involving a GET, POST, PUT, or PATCH operation. This operation supports all the options provided by DataZen's HTTP Reader; for details on all the available options, see the HTTP/S Reader help section.

The HTTP command to send is specified in parenthesis right after the HTTP connection, and needs to include the HTTP Verb to execute. For example, the following executes a POST operation on an Intacct endpoint:
(POST https://api.intacct.com/ia/xml/xmlgw.phtml)

Use the high watermark settings when you need DataZen to remember the last high value of a filter in order to retrieve only changed (or new) records since the last pipeline was executed. See the Watermark Pattern section for details on how to use high watermark values with HTTP requests.

If specified, the APPLY TX operation happens before the APPLY PIPELINE operation. It can be used to transform payload responses into rows and columns directly. While this can be a handy way to work on a data set immediately, you lose the ability to inspect HTTP Response headers and other properties. For debugging purposes, it is recommended to also use the TRACE INTO operation so that you have a way to inspect actual response codes, full payload, and response headers.

The APPLY PIPELINE operation declared within the SELECT operation allows you to add one or more transformations before capturing the final data set. When using this approach, consider that the pipeline will execute for every paged request. For example, if you use the PAGING option, and 5 pages of data are retrieved, this pipeline will execute five times. If you would like to execute a pipeline transformation after all pages have been retrieved, but before the capture operation, use the APPLY PIPELINE operation after the SELECT operation.

Syntax

Executes an HTTP operation and returns the HTTP result including its payload and optionally returns rows and columns by transforming the response payload.

{ USING 
    (...) 
    ON DB [CONNECTION]
    { WITH 
        { SEPARATE_EXECUTION} 
        { INCLUDE_FIELDS }
    }
}
SELECT  
  {TOP <n>}
  < ... | * >
FROM HTTP [CONNECTION] 
  (< GET | POST | PUT | PATCH | DELETE > ... )
  { WITH 
      { HWM '<field>'
        { HWM_PARAM '...' }
        { HWM_AS_DATETIME }
      }
      { TIMEOUT <n> }
      { PAYLOAD (...) }
      { HEADERS '...' }
      { PAGING < 'none' | 'offset' | 'page' | 'link' | 'token' >
        { PAGING_PARAM '...' }
        { PAGING_PATH '...' }
      }
      { CONTENT_TYPE '...' }
      { RETURN_BYTES }
  }
  { TRACE INTO [CONNECTION] 
    TABLE '...'
    { WITH_PAYLOAD }
  }
  { APPLY TX '...' 
    { WITH RAWCOL '...' } 
  }
  
  -- Inner pipeline to be executed per paged results
  { APPLY PIPELINE (...) }
;

USING

Specifies dynamic parameters retrieved from a SQL database (making an HTTP call per record returned) to use as part of the HTTP request URL and/or Payload using {{@param.<field>}} markers

SEPARATE_EXECUTION

Each dynamic row will be treated as a separate execution causing a change log to be created per dynamic row

INCLUDE_FIELDS

Add the dynamic row as additional columns to the output returned by the HTTP operation

HWM

The column name to use as the field containing the next high watermark value if found; this option requires the use of the APPLY TX transformation; the high watermark value can be used as part of the URL or the PAYLOAD using the @highwatermark variable, or will automatically be added to the URL as a parameter if the HWM_PARAM is also set

HWM_PARAM

The query parameter to be added or updated in the URL; leave blank to use the @highwatermark variable

HWM_AS_DATETIME

The high watermark value will be treated as a date/time field so the next high watermark can be correctly calculated

HEADERS

An array of key/value pair headers to add to the HTTP request (ex: [ { "key1": "value1" }, { "key2": "value2" } } ])

PAYLOAD

For POST, PUT and PATCH operations, represents the optional payload to use

TIMEOUT

A timeout value in seconds

PAGING

The paging strategy to use; must be one of: 'none', 'offset', 'page', 'link', 'token'

PAGING_PARAM

The paging parameter to add to the URL as a query string; if not specified, paging variables can be used in the URL or the PAYLOAD (@pagingindex, @pagingmarker, @recordcount, @recordindex)

PAGING_PATH

The path within the raw response to use to identify the paging token or link to use on the next call

CONTENT_TYPE

The HTTP Request Content-Type setting for POST, PUT, and PATCH operations

RETURN_BYTES

Indicates that the HTTP call returns binary data (ex: an image or PDF document) instead of a string

APPLY TX

Applies a document transformation on the payload assuming the payload is an XML or JSON document using the xpath provided

TRACE INTO

Saves each request and response to a database table for raw data inspection and debugging purposes without the payload response unless WITH_PAYLOAD is also specified

WITH_PAYLOAD

When tracing is enabled, indicates that the response payload should also be logged

APPLY PIPELINE

Executes an inline pipeline consisting of one or more components for each page of data retrieved by the HTTP call

RAWCOL

When content transformation is enabled, indicates the column to add to the response that contains the raw content of each row

Example 1

-- This HTTP request uses the Rss Feed connection and calls the /econi.xml relative URL
-- The Rss Feed connection points to: https://www.govinfo.gov/rss
-- This returns the raw output as a single row, which includes response headers 
-- and the full response payload

SELECT * FROM HTTP [Rss Feed] (GET /econi.xml);


Example 2

-- This HTTP request uses the Rss Feed connection and calls the /econi.xml relative URL
-- The Rss Feed connection points to: https://www.govinfo.gov/rss
-- The APPLY operation transforms the HTTP response body using an XPath command that 
-- turns the payload into rows and columns automatically

SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);