High Watermark Values
Some jobs can track the highest value last read from a field in the data source (the high watermark) so that future calls retrieve only the data that has changed. This value is normally a DateTime or Timestamp, or an integer (or long) value. For example, a database system may have a timestamp field that can serve as a high watermark, Twitter offers an id field that contains an ever-increasing numeric value, and a SharePoint List contains a LastModified field that can be used for this purpose.
Generally speaking, a high watermark is an optimization technique that limits future reads so that only changes are extracted. High watermark values are usually unnecessary when the source system is itself a CDC stream or a messaging platform.
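The general pattern can be sketched in a few lines of Python. This is only an illustration of the technique, not DataZen's implementation: the items table, the modified column, and the read_changes function are hypothetical names, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3

def read_changes(conn, last_pointer):
    """Return rows modified after last_pointer and the new high watermark.

    Hypothetical helper for illustration only; not part of DataZen.
    """
    if last_pointer is None:
        # No watermark yet (or it was reset): read all available data.
        rows = conn.execute(
            "SELECT id, name, modified FROM items ORDER BY modified"
        ).fetchall()
    else:
        # Only rows changed since the last read are extracted.
        rows = conn.execute(
            "SELECT id, name, modified FROM items "
            "WHERE modified > ? ORDER BY modified",
            (last_pointer,),
        ).fetchall()
    # Keep the previous pointer when nothing has changed.
    new_pointer = rows[-1][2] if rows else last_pointer
    return rows, new_pointer

# Assumed sample source: timestamps stored as sortable text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, modified TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01 08:00:00.000"), (2, "b", "2024-01-02 09:30:00.000")],
)

# First call reads everything and establishes the watermark.
first_rows, pointer = read_changes(conn, None)

# A later change in the source is the only row returned by the next call.
conn.execute("INSERT INTO items VALUES (3, 'c', '2024-01-03 10:15:00.000')")
second_rows, pointer2 = read_changes(conn, pointer)
```

Passing None as the pointer mirrors a reset: the filter is dropped and all rows are read again.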
Using the High Watermark feature
High watermark values are used differently depending on the type of reader. See the Watermark Pattern documentation for details on how to implement this feature.
View/Edit High Watermark Values
When high watermark values are captured and stored by DataZen, you may view and edit them. In some implementations, when the high watermark value is managed externally, this feature is not available.
To edit high watermark values (last read or last deleted), select the desired job from the list of jobs in DataZen Manager. Shortly after clicking on it, the right panel shows most job settings, including the current high watermark value in Last Read Pointer (in this example, a DateTime).
If the job holds a high watermark, the Edit Pointers button on the right panel will be enabled; click on it.
This screen shows both the Last Read Pointer and the Last Delete Pointer, when available. You can manually change a value in one of the following ways:
- Reset (null): resets the value to NULL; all available data will be read again
- Date/Time Value: select a date/time value from a date picker
- Numeric Value: enter a numeric value
- Custom Value: enter free-form text
In some cases, this setting may be an array when a job holds multiple pointers. When entering a date as free-form, use the following notation: YYYY-MM-DD hh:mm:ss.nnn
Setting this value incorrectly may cause the job to fail.
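As a way to check a free-form date before entering it, the following Python sketch formats and validates a value in the YYYY-MM-DD hh:mm:ss.nnn notation. It assumes that hh is a 24-hour value and that nnn means milliseconds (three digits); the function names are hypothetical and not part of DataZen.

```python
from datetime import datetime

# Assumption: hh is 24-hour and nnn is milliseconds.
POINTER_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def to_pointer_text(value: datetime) -> str:
    """Format a datetime as YYYY-MM-DD hh:mm:ss.nnn."""
    # %f produces six digits (microseconds); drop the last three.
    return value.strftime(POINTER_FORMAT)[:-3]

def parse_pointer_text(text: str) -> datetime:
    """Parse the notation back; raises ValueError if the text is malformed."""
    return datetime.strptime(text, POINTER_FORMAT)

text = to_pointer_text(datetime(2024, 3, 15, 14, 30, 5, 123000))
parsed = parse_pointer_text(text)
```

Parsing the text back before pasting it into the Custom Value field is a cheap way to catch a malformed date.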