@@ -6,10 +6,12 @@ description: This document describes how to create user-defined functions (UDF),
The built-in functions of TDengine may not be sufficient for the use cases of every application. In this case, you can define custom functions for use in TDengine queries. These are known as user-defined functions (UDF). A user-defined function takes one column of data or the result of a subquery as its input.
TDengine supports user-defined functions written in C or C++. This document describes the usage of user-defined functions.
User-defined functions can be scalar functions or aggregate functions. Scalar functions, such as `abs`, `sin`, and `concat`, output a value for every row of data. Aggregate functions, such as `avg` and `max` output one value for multiple rows of data.
TDengine supports user-defined functions written in C or Python. This document describes the usage of user-defined functions.
# Implement a UDF in C
When you create a user-defined function, you must implement standard interface functions:
- For scalar functions, implement the `scalarfn` interface function.
- For aggregate functions, implement the `aggfn_start`, `aggfn`, and `aggfn_finish` interface functions.
...
...
@@ -17,7 +19,7 @@ When you create a user-defined function, you must implement standard interface f
There are strict naming conventions for these interface functions. The names of the start, finish, init, and destroy interfaces must be <udf-name\>_start, <udf-name\>_finish, <udf-name\>_init, and <udf-name\>_destroy, respectively. Replace `scalarfn`, `aggfn`, and `udf` with the name of your user-defined function.
## Implementing a Scalar Function
## Implementing a Scalar Function in C
The implementation of a scalar function is described as follows:
```c
#include "taos.h"
...
...
@@ -49,7 +51,7 @@ int32_t scalarfn_destroy() {
```
Replace `scalarfn` with the name of your function.
## Implementing an Aggregate Function
### Implementing an Aggregate Function in C
The implementation of an aggregate function is described as follows:
```c
...
...
@@ -100,7 +102,7 @@ int32_t aggfn_destroy() {
```
Replace `aggfn` with the name of your function.
## Interface Functions
## C UDF Interface Functions
There are strict naming conventions for interface functions. The names of the start, finish, init, and destroy interfaces must be <udf-name\>_start, <udf-name\>_finish, <udf-name\>_init, and <udf-name\>_destroy, respectively. Replace `scalarfn`, `aggfn`, and `udf` with the name of your user-defined function.
...
...
@@ -108,7 +110,7 @@ Interface functions return a value that indicates whether the operation was succ
For information about the parameters for interface functions, see Data Model
Replace `aggfn` with the name of your function. In the function, aggfn_start is called to generate a result buffer. Data is then divided between multiple blocks, and aggfn is called on each block to update the result. Finally, aggfn_finish is called to generate final results from the intermediate results. The final result contains only one or zero data points.
Replace `aggfn` with the name of your function. In the function, aggfn_start is called to generate a result buffer. Data is then divided between multiple blocks, and the `aggfn` function is called on each block to update the result. Finally, aggfn_finish is called to generate the final results from the intermediate results. The final result contains only one or zero data points.
The parameters in the function are defined as follows:
- interBuf: The intermediate result buffer.
...
...
@@ -135,15 +137,15 @@ The parameters in the function are defined as follows:
- result: The final result.
### Initializing and Terminating User-Defined Functions
### C UDF Initializing and Terminating User-Defined Functions
`int32_t udf_init()`
`int32_t udf_destroy()`
Replace `udf`with the name of your function. udf_init initializes the function. udf_destroy terminates the function. If it is not necessary to initialize your function, udf_init is not required. If it is not necessary to terminate your function, udf_destroy is not required.
Replace `udf`with the name of your function. udf_init initializes the function. udf_destroy terminates the function. If it is not necessary to initialize your function, udf_init is not required. If it is not necessary to terminate your function, udf_destroy is not required.
## Data Structure of User-Defined Functions
## Data Structure of C User-Defined Functions
```c
typedefstructSUdfColumnMeta{
int16_ttype;
...
...
@@ -193,7 +195,7 @@ typedef struct SUdfInterBuf {
```
The data structure is described as follows:
- The SUdfDataBlock block includes the number of rows (numOfRows) and number of columns (numCols). udfCols[i] (0 <= i <= numCols-1) indicates that each column is of type SUdfColumn.
- The SUdfDataBlock block includes the number of rows (numOfRows) and the number of columns (numCols). udfCols[i] (0 <= i <= numCols-1) indicates that each column is of type SUdfColumn.
- SUdfColumn includes the definition of the data type of the column (colMeta) and the data in the column (colData).
- The member definitions of SUdfColumnMeta are the same as the data type definitions in `taos.h`.
- The data in SUdfColumnData can become longer. varLenCol indicates variable-length data, and fixLenCol indicates fixed-length data.
...
...
@@ -201,9 +203,9 @@ The data structure is described as follows:
Additional functions are defined in `taosudf.h` to make it easier to work with these structures.
## Compile UDF
## Compile C UDF
To use your user-defined function in TDengine, first compile it to a dynamically linked library (DLL).
To use your user-defined function in TDengine, first, compile it to a shared library.
For example, the sample UDF `bit_and.c` can be compiled into a DLL as follows:
### C UDF Sample scalar function: [bit_and](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/bit_and.c)
The bit_and function implements bitwise addition for multiple columns. If there is only one column, the column is returned. The bit_and function ignores null values.
...
...
@@ -231,7 +230,7 @@ The bit_and function implements bitwise addition for multiple columns. If there
### C UDF Sample aggregate function 1: [l2norm](https://github.com/taosdata/TDengine/blob/3.0/tests/script/sh/l2norm.c)
The l2norm function finds the second-order norm for all data in the input column. This squares the values, takes a cumulative sum, and finds the square root.
...
...
@@ -243,3 +242,151 @@ The l2norm function finds the second-order norm for all data in the input column
```
</details>
### C UDF Sample aggregate function 2: [max_vol](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/max_vol.c)
The max_vol function returns a string concatenating the deviceId column, the row number and column number of the maximum voltage and the maximum voltage given several voltage columns as input.
-`input` is a data block two-dimension matrix-like object, of which method `data(row, col)` returns the Python object located at location (`row`, `col`)
- return a Python tuple object, of which each item is a Python object of type `output_type`
### Python UDF aggregate interface functions
```Python
def start() -> bytes:
def reduce(input: datablock, buf: bytes) -> bytes
def finish(buf: bytes) -> output_type:
```
- first `start()` is called to return the initial result in type `bytes`
- then the input data are divided into multiple data blocks and for each block `input`, `reduce` is called with the data block `input` and the current result `buf` bytes and generates a new intermediate result buffer.
- finally, the `finish` function is called on the intermediate result `buf` and outputs 0 or 1 data of type `output_type`
### Python UDF Initialization and Termination
```Python
def init()
def destroy()
```
Implement `init` for initialization and `destroy` for termination.
## TDengine SQL data type and Python UDF Data Type Mapping Table
The following table describes the mapping between TDengine SQL data type and Python UDF Data Type. The `NULL` value of all TDengine SQL types is mapped to the `None` value in Python.
| **TDengine SQL Data Type** | **Python Data Type** |
| :-----------------------: | ------------ |
|TINYINT / SMALLINT / INT / BIGINT | int |
|TINYINT UNSIGNED / SMALLINT UNSIGNED / INT UNSIGNED / BIGINT UNSIGNED | int |
|FLOAT / DOUBLE | float |
|BOOL | bool |
|BINARY / VARCHAR / NCHAR | bytes|
|TIMESTAMP | int |
|JSON and other types | Not Supported |
## Python UDF Installation
1. Install Python package `taospyudf` that executes Python UDF
```bash
sudo pip install taospyudf
ldconfig
```
2. If PYTHONPATH is needed to find Python packages when the Python UDF executes, include the PYTHONPATH contents into the udfdLdLibPath variable of the taos.cfg configuration file
## Python UDF Sample Code
### Python UDF Scalar Function Sample Code [pybitand](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/pybitand.py)
The `pybitand` function implements bitwise addition for multiple columns. If there is only one column, the column is returned. The `pybitand` function ignores null values.
<details>
<summary>pybitand.py</summary>
```Python
{{#include tests/script/sh/pybitand.py}}
```
</details>
### Python UDF Aggregate Function Sample Code [pyl2norm](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/pyl2norm.py)
The `pyl2norm` function finds the second-order norm for all data in the input column. This squares the values, takes a cumulative sum, and finds the square root.
<details>
<summary>pyl2norm.py</summary>
```c
{{#includetests/script/sh/pyl2norm.py}}
```
</details>
## Manage and Use User-Defined Functions
You can add UDF to TDengine before using it in SQL queries. For more information, see [User-Defined Functions](../12-taos-sql/26-udf.md).
@@ -7,17 +7,18 @@ description: This document describes the SQL statements related to user-defined
You can create user-defined functions and import them into TDengine.
## Create UDF
SQL command can be executed on the host where the generated UDF DLL resides to load the UDF DLL into TDengine. This operation cannot be done through REST interface or web console. Once created, any client of the current TDengine can use these UDF functions in their SQL commands. UDF are stored in the management node of TDengine. The UDFs loaded in TDengine would be still available after TDengine is restarted.
SQL command can be executed on the host where the generated UDF DLL resides to load the UDF DLL into TDengine. This operation cannot be done through REST interface or web console. Once created, any client of the current TDengine can use these UDF functions in their SQL commands. UDF is stored in the management node of TDengine. The UDFs loaded in TDengine would be still available after TDengine is restarted.
When creating UDF, the type of UDF, i.e. a scalar function or aggregate function must be specified. If the specified type is wrong, the SQL statements using the function would fail with errors. The input data type and output data type must be consistent with the UDF definition.
- function_name: The scalar function name to be used in SQL statement which must be consistent with the UDF name and is also the name of the compiled DLL (.so file).
- library_path: The absolute path of the DLL file including the name of the shared object file (.so). The path must be quoted with single or double quotes.
- OR REPLACE: if the UDF exists, the UDF properties are modified
- function_name: The scalar function name to be used in the SQL statement
- LANGUAGE 'C|Python': the programming language of UDF. Now C or Python is supported. If this clause is omitted, C is assumed as the programming language.
- library_path: For C programming language, The absolute path of the DLL file including the name of the shared object file (.so). For Python programming language, the absolute path of the Python UDF script. The path must be quoted with single or double quotes.
- output_type: The data type of the results of the UDF.
For example, the following SQL statement can be used to create a UDF from `libbitand.so`.
...
...
@@ -25,14 +26,20 @@ CREATE FUNCTION function_name AS library_path OUTPUTTYPE output_type;
For Example, the following SQL statement can be used to modify the existing function `bit_and`. The OUTPUT type is changed to BIGINT and the programming language is changed to Python.
- function_name: The aggregate function name to be used in SQL statement which must be consistent with the udfNormalFunc name and is also the name of the compiled DLL (.so file).
- library_path: The absolute path of the DLL file including the name of the shared object file (.so). The path must be quoted with single or double quotes.
- OR REPLACE: if the UDF exists, the UDF properties are modified
- function_name: The aggregate function name to be used in the SQL statement
- LANGUAGE 'C|Python': the programming language of the UDF. Now C or Python is supported. If this clause is omitted, C is assumed as the programming language.
- library_path: For C programming language, The absolute path of the DLL file including the name of the shared object file (.so). For Python programming language, the absolute path of the Python UDF script. The path must be quoted with single or double quotes.
- output_type: The output data type, the value is the literal string of the supported TDengine data type.
- buffer_size: The size of the intermediate buffer in bytes. This parameter is optional.
...
...
@@ -41,6 +48,11 @@ CREATE AGGREGATE FUNCTION function_name AS library_path OUTPUTTYPE output_type [