SELECT DRIVE
Overview
The SELECT DRIVE operation is designed to read from one or more files from a cloud drive, local path, FTP site, or network drive. Reading one file at a time is usually the preferred implementation; however, you can use this operation to read all available files and combine them into a single data set.
The file format support the * wildcard regardless of the drive selected. Depending on the drive selected, file names may be case-sensitive.
The APPLY PIPELINE operation declared within the SELECT operation allows you to add one or more transformations before capturing the final data set. When using this approach, consider that the pipeline will execute for every paged request. For example, if you use the PAGING option, and 5 pages of data are retrieved, this pipeline will execute five times. If you would like to execute a pipeline transformation after all pages have been retrieved, but before the capture operation, use the APPLY PIPELINE operation after the SELECT operation.
Using both MOVE_TO
and INCLUDE_SUBFOLDERS
can create an unpredictable outcome if
the archive folder is itself a subdirectory.
Syntax
Reads data from one or more files located on a local folder or on cloud or FTP drives.
SELECT {TOP <n>} < ... | * > FROM DRIVE [CONNECTION] // File name or pattern ( ... ) WITH FORMAT < 'PARQUET' | 'CSV' | 'XML' | 'JSON' | 'RAW' | 'BYTES' > { CONTAINER '...' } { UNZIP } -- CSV Options { HAS_HEADERS } { TRIM_FIELDS } { SKIP N } { QUOTED_IDENTIFIER } { DELIMITERS '...' } { FIXED_LENGTHS '...' } { COMMENT_TOKENS '...' } -- RAW or BYTES Option { LIST } { SINGLE_FILE } { NEW_FILES_ONLY } { MOVE_TO '...' } { INCLUDE_SUBFOLDERS } { FILENAME_COL '...' } { TRACE INTO [CONNECTION] TABLE '...' { WITH_PAYLOAD } } { APPLY TX '...' { WITH RAWCOL '...' } } ;
FORMAT |
One of the file formats supported: PARQUET, CSV, XML, JSON, RAW, BYTES |
CONTAINER |
Overrides the path to the file when using a local folder or an FTP site, the AWS S3 bucket name, the Azure Container name, or the name of the Google Drive folder |
UNZIP |
Unzip the file first and processes each file with the FORMAT specified |
HAS_HEADERS |
Assumes the file has a header line (CSV only) |
TRIM_FIELDS |
Applies a trim operation to each field (CSV only) |
SKIP |
The number of records to skip before reading the content (CSV only) |
QUOTED_IDENTIFIER |
The character to use identifying data in quotes (ex: '"' ) (CSV only) |
DELIMITERS |
The list of characters to use as field delimiters (ex: ',' or ';|') - leave blank for automatic detection of the delimiter used by the file (CSV only) |
FIXED_LENGTHS |
Assumes the file does not use delimiters between fields and instead has fixed positions as provided and separated by a comma (ex: 0,15,55,200) (CSV only) |
COMMENT_TOKENS |
Comma-separated list of leading characters used to exclude lines (CSV only) |
LIST |
Returns the list of files without their content (RAW and BYTES only) |
SINGLE_FILE |
Indicates that each file should be processed individually as a separate execution (a new executionid will be created) |
NEW_FILES_ONLY |
Keep a high watermark of the last modified date of a file and only return newer files |
MOVE_TO |
The folder or subfolder to move the file into after successfully extracting its content |
INCLUDE_SUBFOLDERS |
Inspects subfolders to search for files (warning: use carefully when MOVE_TO points to a subfolder) |
FILENAME_COL |
Adds a column with the name provided containing the file name each record was extracted from |
TRACE INTO |
Saves the content of each file into a database for further inspection and troubleshooting purposes |
APPLY PIPELINE |
Executes an inline pipeline consisting of one or more components for each file retrieved |
Example 1
-- Retrieves all parquet files in a cloud drive -- and add a column with the name of the file SELECT * FROM DRIVE [adls2.0] (*.parquet) WITH FORMAT 'Parquet' CONTAINER 'logs' FILENAME_COL '_filename' ;
Example 2
-- Retrieves available files from an SFTP site in a -- given directory, expecting CSV files, and moving -- them to an archive folder upon completion -- FTP files are case-sensitive - return all files under the 'data' -- directory containing Weather in their names SELECT * FROM DRIVE [mysftp] (data/*Weather*) WITH FORMAT 'CSV' SINGLE_FILE -- process one file at a time per execution MOVE_TO '/archive' ;