Job Writer: Drive
To save data into files, use a Drive connection. To use a specific file type, select the desired File Format. The following file types are supported:
- CSV
- JSON
- XML
- Parquet
- Raw
Date Field Identifier
The Date Field Identifier determines which date is used to resolve date tokens, which can appear in the path and/or the file name. If the Date Field Identifier is left blank, date tokens resolve to the job execution date/time. To partition data based on the data itself, select a source field that contains a date/time instead. Using a source field groups records into time windows, which is useful when loading Delta Lake environments. The following date tokens are available:
- yyyy, YYYY: four-digit year
- yy, YY: two-digit year
- mm, MM: month
- dd, DD: day of the month
- dow, DOW: day of the week
- doy, DOY: day of the year
- hh, HH: hour
- nn, NN: minutes
- ss, SS: seconds
For YY, MM, DD, DOY, HH, NN, and SS, the upper-case form adds leading zeroes when needed (for example, MM yields 03 for March, while mm yields 3).
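The following Python sketch illustrates how the Date Field Identifier and the date tokens could fit together. It is a hypothetical illustration, not DataZen's implementation; the function names are made up, and several details are assumptions: tokens are written as [token] (as in the Path Override example later in this section), dow uses ISO numbering (1 = Monday), hours use a 24-hour clock, and DOY pads to three digits.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical sketch, not DataZen's implementation. Assumptions: tokens are
# written as [token], dow uses ISO numbering (1 = Monday), hours use a
# 24-hour clock, and DOY pads to three digits.


def token_values(d: datetime) -> dict:
    """Map each date token to its text for the given date/time."""
    yy = d.year % 100
    dow = d.isoweekday()
    doy = d.timetuple().tm_yday
    return {
        "yyyy": f"{d.year:04d}", "YYYY": f"{d.year:04d}",
        "yy": str(yy), "YY": f"{yy:02d}",
        "mm": str(d.month), "MM": f"{d.month:02d}",
        "dd": str(d.day), "DD": f"{d.day:02d}",
        "dow": str(dow), "DOW": str(dow),
        "doy": str(doy), "DOY": f"{doy:03d}",
        "hh": str(d.hour), "HH": f"{d.hour:02d}",
        "nn": str(d.minute), "NN": f"{d.minute:02d}",
        "ss": str(d.second), "SS": f"{d.second:02d}",
    }


def resolve_date(record: dict, date_field: Optional[str], execution_time: datetime) -> datetime:
    """Use the source field if a Date Field Identifier is set; else the execution time."""
    return record[date_field] if date_field else execution_time


def expand_date_tokens(template: str, when: datetime) -> str:
    """Replace each known [token]; leave any other bracketed text unchanged."""
    values = token_values(when)
    return re.sub(r"\[(\w+)\]", lambda m: values.get(m.group(1), m.group(0)), template)
```

For example, a record whose Date_of_Birth is 1985-11-02 expands [yyyy]-[MM] to 1985-11 when Date_of_Birth is the Date Field Identifier; with a blank identifier, the same template uses the execution date/time instead.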
Path and File Name
You can provide a specific path or folder in the Path Override field to write the file into. The File Name must include the file extension; it is not added automatically. Both fields accept DataZen functions, so you can control where files are created.
For example, the following settings create one folder per year, based on the Date_of_Birth field, and a separate file for each value of the country field:
- Date Field Identifier: Date_of_Birth
- Path Override: c:\tmp\csv\[yyyy]\
- File Name: customer_{{country}}.txt
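As a rough illustration of how these three settings partition records, the Python sketch below resolves each record to a target file path and groups records by that path. It is hypothetical, not DataZen's code: the function names and sample records are made up, only the [yyyy] token and {{field}} substitution are handled, and ntpath is used so Windows-style paths join correctly on any operating system.

```python
import ntpath
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Hypothetical sketch of the example settings above, not DataZen's code.
# Only the [yyyy] date token and {{field}} substitution are handled.
DATE_FIELD = "Date_of_Birth"
PATH_OVERRIDE = "c:\\tmp\\csv\\[yyyy]\\"
FILE_NAME = "customer_{{country}}.txt"


def render(template: str, record: dict) -> str:
    """Fill [yyyy] from the date field and {{field}} from the record."""
    when: datetime = record[DATE_FIELD]
    text = template.replace("[yyyy]", f"{when.year:04d}")
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(record[m.group(1)]), text)


def partition(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records by the full target file path they resolve to."""
    files: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        # ntpath joins with backslashes regardless of the host OS.
        target = ntpath.join(render(PATH_OVERRIDE, rec), render(FILE_NAME, rec))
        files[target].append(rec)
    return dict(files)
```

With customers born in 1985 and 1990 in the US and CA, this yields files such as c:\tmp\csv\1985\customer_US.txt and c:\tmp\csv\1990\customer_US.txt, one per year and country combination present in the data.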
Example: Parquet Target
In this example, the job writes a Parquet file to ADLS, in the specified Container. The file name differs for each execution because it contains the @executionid variable.
The Parquet file uses the Snappy compression algorithm. The Date Field Identifier is left at its default (the job execution date/time); however, because no date token is used, this setting is ignored.
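The Python sketch below shows why the file name changes on every run while the date setting has no effect. It is hypothetical: the file name customers_@executionid.parquet, the execution ID values, and the function name are illustrative assumptions, since the example's actual settings are not reproduced here, and Snappy compression is not modeled.

```python
import re
from datetime import datetime

# Hypothetical sketch: substitutes @executionid and a few [token] date
# tokens. The date only matters if the template contains a date token.


def resolve_name(template: str, execution_id: str, when: datetime) -> str:
    """Resolve @executionid, then any known bracketed date tokens."""
    name = template.replace("@executionid", execution_id)
    tokens = {
        "yyyy": f"{when.year:04d}",
        "MM": f"{when.month:02d}",
        "DD": f"{when.day:02d}",
    }
    return re.sub(r"\[(\w+)\]", lambda m: tokens.get(m.group(1), m.group(0)), name)
```

Two executions with different IDs produce two distinct files, while changing the date leaves the name unchanged unless a token such as [yyyy] is added to the template.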
Example: CSV Target
In this example, the job writes a CSV file to ADLS, in the specified Container. The file name differs for each execution because it contains the @executionid variable.
Because no fixed-length fields are specified, the CSV file is delimited. A header row is added, and date fields are formatted using a sortable pattern.
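To illustrate the kind of output described above, the Python sketch below writes a comma-delimited file with a header row and formats dates in a sortable pattern. The pattern shown (yyyy-MM-ddTHH:mm:ss) is an assumption about what DataZen's sortable format looks like, and the function and field names are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical sketch: a delimited file with a header row, with date values
# written as yyyy-MM-ddTHH:mm:ss (an assumed sortable pattern).
SORTABLE = "%Y-%m-%dT%H:%M:%S"


def to_csv(rows: list, fieldnames: list) -> str:
    """Return CSV text with a header row and sortable date values."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()  # the header row
    for row in rows:
        # date also covers datetime, which is a subclass of date.
        writer.writerow({
            k: v.strftime(SORTABLE) if isinstance(v, date) else v
            for k, v in row.items()
        })
    return buf.getvalue()


print(to_csv([{"id": 1, "created": datetime(2024, 3, 5, 7, 8, 9)}], ["id", "created"]))
```

A sortable pattern is useful because plain text ordering then matches chronological ordering: 2024-03-05 sorts before 2024-11-01, whereas 3/5/2024 sorts after 11/1/2024 as text.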