APPLY SCHEMA
Overview
The Apply Schema operation allows you to enforce a schema on the data, coerce data types, add or remove multiple columns, and set default values to new columns.
The data types used by this component leverage the C# notation.
Syntax
Applies and enforces a schema definition and optionally limits which fields are retained from the data set. Each line represents a column definition in the following format: {type} fieldName{|description{|< default | @sql(...) | #func(...) >}}
APPLY SCHEMA ( -- one line per column definition {type} fieldName{|description{|< default | @sql(...) | #func(...) >}} ) WITH { CONTINUE_ON_ERROR } { STRICT_COLUMNS } { FX_BY_ROW } ;
CONTINUE_ON_ERROR |
Continues processing even if errors are detected |
STRICT_COLUMNS |
Remove any fields not listed in the schema definition |
FX_BY_ROW |
Processes any DataZen function for each row instead of once for the entire schema definition |
Example 1
-- Get data from an RSS Feed SELECT * FROM HTTP [Rss Feed] (GET /econi.xml) APPLY TX (//item); -- Enforce a schema -- The schema follows a c#-like notation for data types; valid data types: -- int, short, long, datetime,guid,bool,float,double,byte,byte[],char,string,decimal -- Certain data types accept a length - byte, char and string (ex: string, or string[500]) -- The decimal data type supports length and precision (ex: decimal(4,2)) -- The default value can be a hard-coded value, a pipeline variable (ex: @executionId), -- a datazen function (ex: #rndguid()) or operate on the current row as a -- SQL operation (ex: @sql([field1] + [field1])) -- APPLY SCHEMA ( string(36) guid string(255) link string(4000) desc||#trim({{description}}) ) FX_BY_ROW -- this is necessary because the #trim() function is "per row" STRICT_COLUMNS -- remove any other column not listed above ; -- Now, add a GUID value, with the same GUID applied to all rows (no more FX_BY_ROW) -- The default data type is string if no data type is provided APPLY SCHEMA ( _controlId|new column with a default value|#rndguid() ) ;