APPLY SCHEMA

Overview

The Apply Schema operation allows you to enforce a schema on the data, coerce data types, add or remove multiple columns, and set default values to new columns.

The data types used by this component leverage the C# notation.

Syntax

Applies and enforces a schema definition and optionally limits which fields are retained from the data set. Each line represents a column definition in the following format: {type} fieldName{|description{|< default | @sql(...) | #func(...) >}}

APPLY SCHEMA 
(
  -- one line per column definition
  {type} fieldName{|description{|< default | @sql(...) | #func(...) >}}
)
WITH 
    { CONTINUE_ON_ERROR } 
    { STRICT_COLUMNS } 
    { FX_BY_ROW } 
;

CONTINUE_ON_ERROR	Continues processing even if errors are detected
STRICT_COLUMNS	Remove any fields not listed in the schema definition
FX_BY_ROW	Processes any DataZen function for each row instead of once for the entire schema definition

Example 1

-- Get data from an RSS Feed
SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item);

-- Enforce a schema
-- The schema follows a c#-like notation for data types; valid data types: 
-- int, short, long, datetime,guid,bool,float,double,byte,byte[],char,string,decimal
-- Certain data types accept a length - byte, char and string (ex: string, or string[500])
-- The decimal data type supports length and precision (ex: decimal(4,2))
-- The default value can be a hard-coded value, a pipeline variable (ex: @executionId), 
-- a datazen function (ex: #rndguid()) or operate on the current row as a 
-- SQL operation (ex: @sql([field1] + [field1]))
-- 
APPLY SCHEMA (
	string(36) guid
	string(255) link
	string(4000) desc||#trim({{description}})
	)
	FX_BY_ROW			-- this is necessary because the #trim() function is "per row"
	STRICT_COLUMNS		-- remove any other column not listed above
;

-- Now, add a GUID value, with the same GUID applied to all rows (no more FX_BY_ROW)
-- The default data type is string if no data type is provided
APPLY SCHEMA (
	_controlId|new column with a default value|#rndguid()
	)
;