Building a .NET DLL

This section provides initial guidance on extending Data Pipeline Components with a custom-built .NET DLL. A major advantage of a custom .NET DLL is that it lets you inject additional logic into the data pipeline at execution time, using highly tuned routines (including embedded C libraries), without staging the data first.

A practical example is running .NET ML (Machine Learning) models directly inside your data pipelines. If you have trained a model and have it available for consumption, all you need to do is expose it through a .NET wrapper class and import the project DLLs into DataZen.

.NET Specification

You can use Visual Studio or Visual Studio Code to build your .NET DLL, as long as the project targets .NET Framework 4.7 or higher. Choose the Class Library (.NET Framework) project type. By convention, only public methods with one of the following signatures are exposed in DataZen:

  • void method()
  • void method(DataTable)
  • void method(string)
  • void method(DataTable, string)
  • byte[] method()
  • byte[] method(DataTable)
  • byte[] method(string)
  • byte[] method(DataTable, string)
  • DataTable method()
  • DataTable method(DataTable)
  • DataTable method(string)
  • DataTable method(DataTable, string)
  • string method()
  • string method(DataTable)
  • string method(string)
  • string method(DataTable, string)

Output Parameters

The return type of the method can be void, byte[], string, or DataTable. Returning void is preferred when the intent is to let the pipeline data set flow through the component unchanged. When a DataTable is returned, it completely replaces the current data set. To enhance the data set, accept a DataTable input parameter, add data columns to it as needed, then return the modified DataTable.
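As a minimal sketch of the "enhance and return" pattern described above, the method below adds a computed column to the incoming data set and returns it. The class name, method name, and the FirstName, LastName, and FullName columns are hypothetical and used only for illustration.

```csharp
using System;
using System.Data;

public class Enricher
{
    // Matches the "DataTable method(DataTable)" signature.
    public DataTable AddFullName(DataTable input)
    {
        if (input == null) throw new ArgumentNullException("input");

        // Hypothetical column names; replace them with columns from your own data set.
        if (!input.Columns.Contains("FullName"))
            input.Columns.Add("FullName", typeof(string));

        foreach (DataRow row in input.Rows)
        {
            // Concatenate the two source columns into the new column.
            row["FullName"] = row["FirstName"] + " " + row["LastName"];
        }

        // The returned table replaces the current pipeline data set.
        return input;
    }
}
```

Because the same DataTable instance is modified and returned, the existing columns and rows are preserved and only the new column is added.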

When the return type is string or byte[], the pipeline data set is replaced with a single-row, single-column data set holding the returned value. The column is always named payload.
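The sketch below illustrates a string-returning method. The class and method names are hypothetical, and the "rows=" format is an arbitrary choice for this example.

```csharp
using System.Data;

public class Summarizer
{
    // Matches the "string method(DataTable)" signature.
    public string CountRows(DataTable input)
    {
        // Treat a missing data set as empty rather than failing.
        int count = input == null ? 0 : input.Rows.Count;
        return "rows=" + count;
    }
}
```

Based on the behavior described above, a three-row input would leave the pipeline with one row whose payload column holds rows=3; the original columns are no longer available to downstream components.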

Input Parameters

Neither the method name nor the parameter names matter; parameters are bound by data type. The following convention is used for input data types:

  • string: receives the decrypted connection string selected in the configuration screen in DataZen. If no connection string is selected, null is passed to this parameter.
  • DataTable: receives the current data set of the data pipeline.

For example, a method declared like this receives both the selected connection string and the current pipeline data set, but returns no data to DataZen:

public void Log(DataTable input, string connection)
{
    // In this example, connection is a full connection string to a database, as stored in DataZen 
    using (SqlConnection conn = new SqlConnection(connection))
    {
        conn.Open();
        foreach (DataRow row in input.Rows)
        {
            // do something with each row
        }
    }            
}

Cancellation

To be fully supported in a DataZen Data Pipeline, declare exactly one public CancellationTokenSource field in your DLL. This allows your custom DLL to participate fully in cancellation, both while debugging and when running in production. If you do not add this field, a warning is displayed in DataZen.

public CancellationTokenSource token = new CancellationTokenSource();

You can then check the token inside your method, for example:

public void Log(DataTable input, string connection)
{
    // In this example, connection is a full connection string to a database, as stored in DataZen 
    using (SqlConnection conn = new SqlConnection(connection))
    {
        conn.Open();
        foreach (DataRow row in input.Rows)
        {
            if (token.Token.IsCancellationRequested) break;
            // do something with each row
        }
    }            
}

Exceptions

You can also throw exceptions or let exceptions bubble up to DataZen; the actual exception is captured and logged so you can analyze any issues inside your custom .NET DLL when it runs within DataZen. It is good practice to use a meaningful exception type so you can more easily identify the source of the exception in your code.

Throwing an exception, or letting an exception bubble up, immediately terminates the data pipeline. If you would like to silence exceptions, wrap your code in a try/catch block. The following example throws when no connection string is selected:

public void Log(DataTable input, string connection)
{
    if (string.IsNullOrEmpty(connection)) throw new InvalidOperationException("ERROR: Connection missing!");
    // In this example, connection is a full connection string to a database, as stored in DataZen 
    using (SqlConnection conn = new SqlConnection(connection))
    {
        conn.Open();
        foreach (DataRow row in input.Rows)
        {
            if (token.Token.IsCancellationRequested) break;
            // do something with each row
        }
    }            
}
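To silence exceptions instead of terminating the pipeline, one option is to catch only the failure you expect, per row, and let everything else propagate. The sketch below assumes a hypothetical Amount column and a hypothetical Failed counter; neither is part of DataZen.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Threading;

public class SafeLogger
{
    public CancellationTokenSource token = new CancellationTokenSource();

    // Hypothetical counter of rows that could not be processed.
    public int Failed { get; private set; }

    // Matches the "void method(DataTable)" signature.
    public void Log(DataTable input)
    {
        foreach (DataRow row in input.Rows)
        {
            if (token.Token.IsCancellationRequested) break;
            try
            {
                Process(row);
            }
            catch (FormatException)
            {
                // Swallow only the expected failure; any other exception
                // still bubbles up and terminates the pipeline.
                Failed++;
            }
        }
    }

    // Hypothetical per-row work: parse the "Amount" column as a decimal.
    private static void Process(DataRow row)
    {
        decimal.Parse(Convert.ToString(row["Amount"]), CultureInfo.InvariantCulture);
    }
}
```

Catching a specific exception type rather than Exception keeps unexpected errors visible in the DataZen log.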

Synchronous vs. Async Programming

Due to code isolation, memory management, and other constraints, the only supported programming model is synchronous. Each external DLL is loaded in its own AppDomain; as soon as the method call returns, the AppDomain may be unloaded, so any background work still running at that point may be terminated. For performance reasons, the AppDomain may remain active longer, but this behavior is not guaranteed.

To implement an asynchronous programming model, hand the work off to an external system such as a cloud-based message queue or service bus.
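If a library you depend on only exposes Task-based methods, one approach consistent with the synchronous requirement is to block until the task completes before returning. The sketch below is an illustration under that assumption, not a DataZen-prescribed pattern; FetchAsync is a hypothetical stand-in for a real asynchronous call.

```csharp
using System.Threading;
using System.Threading.Tasks;

public class SyncWrapper
{
    public CancellationTokenSource token = new CancellationTokenSource();

    // Matches the "string method()" signature.
    public string Fetch()
    {
        // Run the async work on the thread pool and block until it finishes,
        // so nothing is still running when the method returns.
        // GetAwaiter().GetResult() rethrows the original exception
        // instead of wrapping it in an AggregateException.
        return Task.Run(() => FetchAsync(token.Token)).GetAwaiter().GetResult();
    }

    // Hypothetical async operation standing in for a Task-based library call.
    private static async Task<string> FetchAsync(CancellationToken cancel)
    {
        await Task.Delay(50, cancel).ConfigureAwait(false);
        return "done";
    }
}
```

Passing the token into the asynchronous call lets a cancellation request interrupt the wait instead of holding the pipeline until the task finishes.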