APPLY HTTP

Overview

The APPLY HTTP operation makes calls to HTTP/S endpoints as part of a data pipeline and provides options to enhance the current pipeline data set or replace it. Use this component when you want to:

  • Call an HTTP endpoint once then continue processing the data pipeline
  • Call an HTTP endpoint once and add the HTTP response data as additional columns
  • Call an HTTP endpoint once, transform the returned payload, and replace the pipeline data set with this one
  • Call an HTTP endpoint once per row from the pipeline data set and add the HTTP response data as additional columns
  • Call an HTTP endpoint once per row from the pipeline data set, transform the response of each request, and perform a LEFT or INNER join with the response

This component can be useful to enhance the current data with the output provided by an HTTP/S endpoint. For example, if the pipeline data contains a list of items with unique item IDs, you can use this component to make a call to an API that returns item details. Another useful scenario is to call an AI endpoint, such as an image description endpoint, and return information about the content of the image.

Syntax

Executes an inline HTTP operation and optionally enhances the data set with the result.

APPLY HTTP [CONNECTION] 
  (< GET | POST | PUT | PATCH | DELETE > ... )
  { WITH 
      { PAGING < 'none' | 'offset' | 'page' | 'link' | 'token' >
        { PAGING_PARAM '...' }
        { PAGING_PATH '...' }
      }
      { DEBUG_MAX_PAGES <n> }
      { TIMEOUT <n> }
      { HEADERS '...' }
      { PAYLOAD (...) }
      { CONTENT_TYPE '...' }
      { RETURN_BYTES }

      { PROCESSING < 'continue' | 'replace' | 'inner join' | 'left join' | 'enhance' > }
      { WHEN '...' }
      { SHIFT_ROWS N }
      { CONTINUE_ON_ERRORS }
      { CONTINUE_ON_HTTPERROR '...' }
      {
        < { RETRY_ON '...'  } | { RETRY_ONREGEX '...' } >
        { RETRY_PATH '...'  }
        { RETRY_INTERVAL N  }
        { RETRY_COUNT N }
      }
  }
  { TRACE INTO [CONNECTION] 
    TABLE '...'
    { WITH_PAYLOAD }
    { STAGE ON [CONNECTION] '...' CREDENTIALS '...' }
    { ADD_TRACKING_FIELDS }
    { TRUNCATE | TRUNCATE_APPEND }
    { SKIP_PREVIEW }
  }
  { APPLY TX '...' 
    { WITH RAWCOL '...' } 
  }
;

HEADERS

An array of key/value pair headers to add to the HTTP request (ex: [ { "key1": "value1" }, { "key2": "value2" } ])
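
For example, a bearer token and a custom header could be supplied as follows (the [MyService] connection, URL, and token value are illustrative placeholders):

APPLY HTTP [MyService]
	(GET https://api.example.com/items)
	WITH HEADERS '[ { "Authorization": "Bearer <token>" }, { "X-Client": "pipeline" } ]'
	;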

PAYLOAD

For POST, PUT, and PATCH operations, the optional payload to send

TIMEOUT

A timeout value in seconds

PAGING

The paging strategy to use; must be one of: 'none', 'offset', 'page', 'link', 'token'

PAGING_PARAM

The paging parameter to add to the URL as a query string; if not specified, paging variables can be used in the URL or the PAYLOAD (@pagingindex, @pagingmarker, @recordcount, @recordindex)

PAGING_PATH

The path within the raw response used to identify the paging token or link for the next call, or the node to use as the count of items returned when using an offset strategy (may be an object array, or a node whose value represents the total count of items retrieved so far)

DEBUG_MAX_PAGES

During preview operations, the maximum number of paging operations to perform
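
As a sketch, an offset-based strategy against a hypothetical endpoint can rely on the @recordcount paging variable in the URL and point PAGING_PATH at the array of items returned (the connection, URL, and path are illustrative):

APPLY HTTP [MyApi]
	(GET https://api.example.com/items?offset=@recordcount)
	WITH 
		PAGING 'offset'
		PAGING_PATH 'items'
		DEBUG_MAX_PAGES 2
	;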

CONTENT_TYPE

The HTTP Request Content-Type setting for POST, PUT, and PATCH operations

RETURN_BYTES

Indicates that the HTTP call returns binary data (ex: an image or PDF document) instead of a string
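
For instance, a hypothetical endpoint returning an image could be called as follows so the response is treated as binary data instead of being decoded as a string:

APPLY HTTP [ImageService]
	(GET https://images.example.com/photo.jpg)
	WITH RETURN_BYTES
	;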

PROCESSING

The processing mode to use: 'continue', 'replace', 'left join', 'inner join', or 'enhance' (default: 'continue')

SHIFT_ROWS

When the enhance processing mode is used, the number of rows to skip from the original data set; this option is useful when applying statistical operations to a data set where the result has fewer records, such as a moving average calculation
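
For example, if a hypothetical statistics endpoint computes a moving average that returns three fewer rows than the input, the new columns can be realigned with the tail of the original data set (the connection and URL are illustrative):

APPLY HTTP [StatsService]
	(POST https://stats.example.com/moving-average)
	CONTENT_TYPE 'application/json'
	PAYLOAD (@concatjsonarr({"value": "{{value}}"}))
	PROCESSING 'enhance'
	SHIFT_ROWS 3
	;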

CONTINUE_ON_ERRORS

Continues processing even if errors are detected

CONTINUE_ON_HTTPERROR

Continues processing even if the HTTP request returns specific HTTP codes (ex: '400,404')

RETRY_ON

Retry the HTTP operation if the text provided is found in the response body or in the node identified by the RETRY_PATH parameter

RETRY_ONREGEX

Retry the HTTP operation if the regular expression provided results in a positive match when applied on the response body or in the node identified by the RETRY_PATH parameter

RETRY_PATH

The path used to limit the scope of the retry test when searching in a JSON or XML document

RETRY_INTERVAL

The retry interval in seconds (default: 1)

RETRY_COUNT

The maximum number of retries to perform (default: 1)
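
Taken together, the retry options can be sketched as follows against a hypothetical job-status endpoint that returns 'pending' in its status node until the job completes (the connection, URL, and path are illustrative):

APPLY HTTP [JobService]
	(GET https://jobs.example.com/status/{{jobid}})
	WITH 
		RETRY_ON 'pending'
		RETRY_PATH '$.status'
		RETRY_INTERVAL 5
		RETRY_COUNT 10
	;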

TRACE INTO

Saves each request and response to a database table for raw data inspection and debugging purposes; the response payload is not saved unless WITH_PAYLOAD is also specified

TABLE

The table name in which each HTTP operation will be stored

WITH_PAYLOAD

When tracing is enabled, indicates that the response payload should also be logged

STAGE ON

Specifies the connection and folder/container used to stage the data first, depending on the target database

CREDENTIALS

When a staging environment is specified, this option provides the necessary credentials

ADD_TRACKING_FIELDS

Adds internal tracking fields including executionId, pipeline name, execution date and more

TRUNCATE

When specified, indicates the trace table should first be truncated if it exists

TRUNCATE_APPEND

The target table is first truncated if it exists, then appended to when multiple pages of data are being processed (ignored if ON COLUMNS is specified)

SKIP_PREVIEW

When tracing is enabled, indicates that nothing should be logged during preview operations
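
A minimal tracing setup, assuming a hypothetical [SqlServer] connection and a 'dbo.httptrace' table, might look like this:

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD ({"guid" : "{{guid}}"})
	TRACE INTO [SqlServer]
		TABLE 'dbo.httptrace' WITH_PAYLOAD
		ADD_TRACKING_FIELDS
		TRUNCATE
		SKIP_PREVIEW
	;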

WHEN

Only performs the apply operation on the subset of rows that match the filter provided, leaving the other rows as-is
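
For instance, to call the endpoint only for rows flagged as active while passing the other rows through unchanged (the [active] column is illustrative):

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD ({"id" : "{{id}}"})
	WITH 
		PROCESSING 'left join'
		WHEN '[active] = 1'
	;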

RAWCOL

When content transformation is enabled, indicates the column to add to the response that contains the raw content of each row

Example 1

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- Call an HTTP/S cloud function and send the entire payload
-- as a JSON document, built as an array of objects containing
-- the guid, title, and link of the RSS feed, then continue 

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD (@concatjsonarr({"guid" : "{{guid}}", "title": "{{title}}", "link": "{{link}}"}))
	PROCESSING 'continue'
	CONTINUE_ON_ERRORS
	;

Example 2

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- Call an HTTP/S cloud function and send one row at a time 
-- as a JSON document, then transform the response into 
-- rows and columns and perform a LEFT JOIN using the source 
-- data; add a "_raw" column for each row transformed that contains
-- the full response object for the row

APPLY HTTP [CloudFunction]
	(POST https://mycloudfunction.com/api/ready)
	CONTENT_TYPE 'application/json'
	PAYLOAD ({"guid" : "{{guid}}", "title": "{{title}}", "link": "{{link}}"})
	PROCESSING 'left join'

	-- Transform the data using a left join operation with new columns, and 
	-- add the f_ prefix to new columns to avoid possible conflicts with the 
	-- existing ones - also add the _raw column to have the full payload available
	APPLY TX 'f_=$.data' WITH RAWCOL '_raw'

	CONTINUE_ON_ERRORS
	;

Example 3

--
-- Get last 30 days of open tickets from Zendesk with pagination
-- Zendesk expects a Unix timestamp 
--

SELECT * FROM HTTP [Enzo-ZenDesk] (GET /incremental/tickets.json?start_time=#toepoch(#dateadd(#now(), -30d)))
WITH 	
	PAGING 'link'
	PAGING_PATH 'next_page'
	DEBUG_MAX_PAGES  2
APPLY TX 'tickets'
APPLY PIPELINE (
	-- Inner pipeline that eliminates closed/solved tickets from the result
	APPLY FILTER ([status] <> 'solved' AND [status] <> 'closed');
)
;

-- Reorder by the updated_at field 
APPLY FILTER '1=1' ORDER BY '[updated_at] DESC';

-- Apply a strong schema and extract custom fields from the current payload
-- called hours and billable using their unique identifiers 
APPLY SCHEMA (
	string [url],
	int [id],
	string [updated_at],
	string [subject],
	string [status],
	long [requester_id],
	long [submitter_id],
	long [organization_id],
	string [custom_fields],
	string [fields],
	bool billable = '@parsejson2({{custom_fields}}, $.[?(@.id==646985539)].value)',
	string hours = '@parsejson2({{custom_fields}}, $.[?(@.id==646993666)].value)'

) STRICT_COLUMNS;

-- Keep billable hours
APPLY FILTER '[billable] = 1 AND [hours] > 0' ORDER BY '[updated_at] DESC';

-- Find the original requester and add the u_ prefix to the columns added
-- by this left join operation; since we are not using the APPLY TX option,
-- this call will return the standard HTTP output columns (u_payload, u_httpheaders...)
APPLY HTTP [Enzo-ZenDesk] (GET /users/{{requester_id}}.json)
WITH PROCESSING 'left join'
APPLY TX 'u_=user'
;

-- Enforce any additional schema on the new fields 
APPLY SCHEMA (
	[url],
	[id],
	[updated_at],
	[subject],
	[status],
	[requester_id],
	[submitter_id],
	[organization_id],
	[custom_fields],
	[fields],
	billable ,
	hours,
	[u_name],
	[u_email],
	[u_organization_id] 
) STRICT_COLUMNS;

-- Finally continue to expand the data with organizational details 
-- and add the o_ prefix to the fields added by this call. 
APPLY HTTP [Enzo-ZenDesk] (GET /organizations/{{u_organization_id}}.json)
WITH PROCESSING 'left join'
APPLY TX 'o_=organization'
;