APPLY HTTP

Overview

The APPLY HTTP operation makes calls to HTTP/S endpoints as part of a data pipeline and provides options to enhance the current pipeline data set or replace it. Use this component when you would like to:

  • Call an HTTP endpoint once then continue processing the data pipeline
  • Call an HTTP endpoint once and add the HTTP response data as additional columns
  • Call an HTTP endpoint once, transform the returned payload, and replace the pipeline data set with this one
  • Call an HTTP endpoint once per row from the pipeline data set and add the HTTP response data as additional columns
  • Call an HTTP endpoint once per row from the pipeline data set, transform the response of each request, and perform a LEFT or INNER join with the response

This component can be useful to enhance the current data with the output provided by an HTTP/S endpoint. For example, if the pipeline data contains a list of items with unique item IDs, you can use this component to make a call to an API that returns item details. Another useful scenario is to call an AI endpoint, such as an image description endpoint, and return information about the content of the image.
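As a sketch of the item-lookup scenario above (the connection name, URL, and column names are hypothetical, and this assumes row columns can be templated into the URL the same way they are in a PAYLOAD):

```sql
-- Hypothetical: call an item-details API once per row, using the
-- pipeline's itemId column in the URL, and LEFT JOIN the response
APPLY HTTP [ItemCatalog]
	(GET https://api.example.com/items/{{itemId}})
	PROCESSING 'left join'
	APPLY TX '$.item'
	;
```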

Syntax

Executes an inline HTTP operation and optionally enhances the data set with the result.

APPLY HTTP [CONNECTION] 
  (< GET | POST | PUT | PATCH | DELETE > ... )
  { WITH 
      { PAGING < 'none' | 'offset' | 'page' | 'link' | 'token' >
        { PAGING_PARAM '...' }
        { PAGING_PATH '...' }
      }
      { TIMEOUT <n> }
      { HEADERS '...' }
      { PAYLOAD (...) }
      { CONTENT_TYPE '...' }
      { RETURN_BYTES }

      { PROCESSING < 'continue' | 'replace' | 'inner join' | 'left join' | 'enhance' > }
      { SHIFT_ROWS <n> }
      { CONTINUE_ON_ERRORS }
      { CONTINUE_ON_HTTP '...' }
      {
        < { RETRY_ON '...'  } | { RETRY_ONREGEX '...' } >
        { RETRY_PATH '...'  }
        { RETRY_INTERVAL <n>  }
        { RETRY_COUNT <n> }
      }
  }
  { TRACE INTO [CONNECTION] 
    TABLE '...'
    { WITH_PAYLOAD }
  }
  { APPLY TX '...' 
    { WITH RAWCOL '...' } 
  }
;

HEADERS

An array of key/value pair headers to add to the HTTP request (ex: [ { "key1": "value1" }, { "key2": "value2" } ])

PAYLOAD

For POST, PUT, and PATCH operations, represents the optional payload to use

TIMEOUT

A timeout value in seconds

PAGING

The paging strategy to use; must be one of: 'none', 'offset', 'page', 'link', 'token'

PAGING_PARAM

The paging parameter to add to the URL as a query string; if not specified, paging variables can be used in the URL or the PAYLOAD (@pagingindex, @pagingmarker, @recordcount, @recordindex)

PAGING_PATH

The path within the raw response to use to identify the paging token or link to use on the next call
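The three paging options above work together. A sketch of token-based paging (the connection name, URL, parameter name, and response path are all hypothetical):

```sql
-- Hypothetical: 'token' paging; the next-page token is read from the
-- 'nextToken' node of each response (PAGING_PATH) and sent back on the
-- following call as the 'pageToken' query string parameter (PAGING_PARAM)
APPLY HTTP [MyApi]
	(GET https://api.example.com/items)
	PAGING 'token'
	PAGING_PARAM 'pageToken'
	PAGING_PATH '$.nextToken'
	PROCESSING 'replace'
	APPLY TX '$.items'
	;
```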

CONTENT_TYPE

The HTTP Request Content-Type setting for POST, PUT, and PATCH operations

RETURN_BYTES

Indicates that the HTTP call returns binary data (ex: an image or PDF document) instead of a string
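A minimal sketch of a binary download (the connection name and URL are hypothetical):

```sql
-- Hypothetical: fetch a PDF document; RETURN_BYTES keeps the response
-- as binary data instead of converting it to a string
APPLY HTTP [DocsApi]
	(GET https://api.example.com/report.pdf)
	RETURN_BYTES
	;
```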

PROCESSING

The processing mode to use; must be one of: 'continue', 'replace', 'left join', 'inner join', 'enhance' (default: 'continue')

SHIFT_ROWS

When 'enhance' is used, the number of rows to skip from the original data set; this option is useful when applying statistical operations to a data set where the result has fewer records, such as a moving average calculation
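A sketch of the moving-average case (the connection name, URL, and column names are hypothetical; @concatjsonarr builds the array payload as in the examples below):

```sql
-- Hypothetical: a 7-row moving average returns 6 fewer rows than the
-- source; skip the first 6 source rows so the returned values line up
APPLY HTTP [StatsApi]
	(POST https://api.example.com/movingavg)
	CONTENT_TYPE 'application/json'
	PAYLOAD (@concatjsonarr({"value": "{{value}}"}))
	PROCESSING 'enhance'
	SHIFT_ROWS 6
	;
```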

CONTINUE_ON_ERRORS

Continues processing even if errors are detected

CONTINUE_ON_HTTP

Continues processing even if the HTTP request returns specific HTTP codes (ex: '400,404')

RETRY_ON

Retry the HTTP operation if the text provided is found in the response body or in the node identified by the RETRY_PATH parameter

RETRY_ONREGEX

Retry the HTTP operation if the regular expression provided results in a positive match when applied on the response body or in the node identified by the RETRY_PATH parameter

RETRY_PATH

The path used to limit the scope of the retry test when searching in a JSON or XML document

RETRY_INTERVAL

The retry interval in seconds (default: 1)

RETRY_COUNT

The maximum number of retries to perform (default: 1)
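Putting the retry options together, a polling sketch (the connection name, URL, and response field are hypothetical):

```sql
-- Hypothetical: retry up to 5 times, 10 seconds apart, while the
-- 'status' node of the JSON response still reads 'PENDING'
APPLY HTTP [JobApi]
	(GET https://api.example.com/jobs/status)
	RETRY_ON 'PENDING'
	RETRY_PATH '$.status'
	RETRY_INTERVAL 10
	RETRY_COUNT 5
	;
```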

WITH_PAYLOAD

When tracing is enabled, indicates that the response payload should also be logged
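A sketch of tracing with payload logging (the trace connection and table name are hypothetical; the endpoint follows the examples below):

```sql
-- Hypothetical: log each HTTP call, including the response payload,
-- into the 'httptrace' table of the [LogDb] connection
APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD ({"guid" : "{{guid}}"})
	TRACE INTO [LogDb] TABLE 'httptrace' WITH_PAYLOAD
	;
```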

RAWCOL

When content transformation is enabled, indicates the name of the column added to each transformed row that contains the raw response content for that row

Example 1

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- Call an HTTP/S cloud function and send the entire payload
-- as a JSON document, built as an array of objects containing
-- the guid, title, and link of the RSS feed, then continue 

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD (@concatjsonarr({"guid" : "{{guid}}", "title": "{{title}}", "link": "{{link}}"}))
	PROCESSING 'continue'
	CONTINUE_ON_ERRORS
	;

Example 2

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- Call an HTTP/S cloud function and send one row at a time 
-- as a JSON document, then transform the response into 
-- rows and columns and perform a LEFT JOIN using the source 
-- data; add a "_raw" column for each row transformed that contains
-- the full response object for the row

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD ({"guid" : "{{guid}}", "title": "{{title}}", "link": "{{link}}"})
	PROCESSING 'left join'
	CONTINUE_ON_ERRORS
	APPLY TX '$.data' WITH RAWCOL '_raw'
	;