Window Capture

A window capture pattern allows you to request data from a source system using a date range that overlaps with already processed data. This pattern is usually needed when the source system does not support the use of a high watermark. For example, a data set that has an updatedOn field, but the date field does not accept time values with seconds, may require this pattern.

If the source system does support high watermark but may return duplicate records from time to time, see the composite watermark + cdc pattern.

The selected window should be large enough to account for system downtime and network problems. In addition, since this pattern returns duplicate records, the use of Synthetic Change Capture is also necessary to filter out records that were already processed.

By design, this pattern cannot be used to detect deleted records. If identifying deleted records is important when using this pattern, another job pipeline should be created to perform this operation and should be sequenced in such a way that it would run immediately after the completion of the window capture job pipeline.

Pattern Overview

This pattern describes a forward-only read from the source system using a data selection approach that returns duplicate data on purpose, by using a sliding window, and an additional CDC operation that filters out the duplicate data as a secondary step.

Various fields can be used for implementing the sliding window, including datetime values, numeric fields, or timestamp bytes.

It is common for this pattern to use large sliding windows, such as 30 days or more, to avoid possible gaps in data due to network issues or source system downtime.

DataZen Implementation

The sliding window operation is implemented differently in DataZen depending on the source system; however, the Synthetic CDC setting is similar regardless of the source system.

Source System Implementation Comments
Relational Database Using a SQL WHERE clause A date range is a popular way to implement a sliding window (ex: all records that were modifed within the last 10 days)
HTTP Using a query parameter or payload The implementation varies widely depending on the HTTP API
Drive Not applicable Not applicable
Big Data / NoSQL Using a SQL WHERE clause If the engine supports it, a date range is a popular way to implement a sliding window (ex: all records that were modifed within the last 10 days)
Messaging Consumers No Not applicable


Once the sliding window is implemented, the Synthetic CDC pattern is then applied. See the Synthetic CDC Pattern for more information.

Database Source

Using a WHERE clause as part of the SELECT operation is the simplest way to implement a sliding window for relational databases. In the example below, a 5-day window is used as part of the WHERE clause.

To setup the CDC operation, go to the Replication Settings tab and set the fields that identify the unique records; in this case, the id and gameid fields identify a unique record.

HTTP/S Source

Using a sliding window for HTTP/S data sources vary widely depending on the API being called and may not always be supported. When the HTTP API supports the ability to filter based on a timestamp, either as part of the URI (as a query parameter) or the payload (usually in the form of a POST or PUT request) such as a GQL (Graph Query Language) request, craft the HTTP command accordingly, using DataZen functions or a calculated value provided by a Dynamic Parameter.

Using DataZen Functions

DataZen functions can be used as part of the HTTP URI or the payload. For example, the #dateadd() function, combined with #now() could be used to construct a URI with the desired window. The following URI assumes that the API accepts a fromdate parameter.

/getdata?fromdate=#dateadd(#utcnow(), -5d )

If precise date formatting is needed, you can also specify the format to use:

/getdata?fromdate=#formatdate(#dateadd(#utcnow(), -5d ), MM/dd/yy H:mm:ss)

In this example, we are providing two parameters: dateFrom and dateTo, with a 1-day sliding window from today's date.
dateFrom=#dateadd(#now(), -1d, yyyy-MM-dd)&dateTo=#dateformat(#now(), yyyy-MM-dd)

Using Dynamic Parameters

Dynamic Parameters allow you to specify input parameters for your HTTP operation as an SQL operation against a database. The primary difference is that with Dynamic Parameters, you can make multiple calls to the same HTTP endpoint, using the list of input parameters, each row representing a new set of input. Dynamic Parameters can be very simple SQL commands, or as complex as needed. If the result of the SQL command returns 0 records, the call to the HTTP endpoint is aborted.

In this example, a SELECT operation is performed against a database view. Each column can be accessed as part of the HTTP URI or payload using the {{@param.columnName}} syntax. For example, since the SQL command returns both the dateFrom and dateTo parameters needed, the URI syntax would become:
dateFrom=#dateadd({{@param.dateFrom}}, -1d, yyyy-MM-dd)&dateTo=#dateformat({{@param.dateTo}}, yyyy-MM-dd)
And if the dates are already propertly calculated and formatted as part of the SQL command, the URI could be simplified to:
dateFrom={{@param.dateFrom}}&dateTo={{@param.dateTo}}

Drive Source

Window handling is not supported for Messaging sources.

Big Data / NoSQL Source

Window handling may be supported if the source system supports SQL operations similar to what is described in the Database Source section above.

Messaging Source

Window handling is not supported for Messaging sources.