09-udf.md 12.0 KB
Newer Older
D
dingbo 已提交
1
---
2
sidebar_label: UDF
陶建辉(Jeff)'s avatar
陶建辉(Jeff) 已提交
3
title: User Defined Functions(UDF)
4
description: "Scalar functions and aggregate functions developed by users can be utilized by the query framework to expand query capability"
D
dingbo 已提交
5 6
---

7
In some use cases, built-in functions are not adequate for the query capability required by application programs. With UDF, the functions developed by users can be utilized by the query framework to meet business and application requirements. UDF normally takes one column of data as input, but can also support the result of a sub-query as input.
D
dingbo 已提交
8

9
From version 2.2.0.0, UDF written in C/C++ are supported by TDengine.
D
dingbo 已提交
10 11


12 13 14 15 16 17 18 19 20
## Types of UDF

Two kinds of functions can be implemented by UDF: scalar functions and aggregate functions.

Scalar functions return multiple rows and aggregate functions return either 0 or 1 row.

In the case of a scalar function you only have to implement the "normal" function template.

In the case of an aggregate function, in addition to the "normal" function, you also need to implement the "merge" and "finalize" function templates even if the implementation is empty. This will become clear in the sections below.
D
dingbo 已提交
21

22
### Scalar Function
D
dingbo 已提交
23

24
As mentioned earlier, a scalar UDF only has to implement the "normal" function template. The function template below can be used to define your own scalar function.
D
dingbo 已提交
25

26
`void udfNormalFunc(char* data, short itype, short ibytes, int numOfRows, long long* ts, char* dataOutput, char* interBuf, char* tsOutput, int* numOfOutput, short otype, short obytes, SUdfInit* buf)`
D
dingbo 已提交
27

28
`udfNormalFunc` is the place holder for a function name. A function implemented based on the above template can be used to perform scalar computation on data rows. The parameters are fixed to control the data exchange between UDF and TDengine.
D
dingbo 已提交
29

D
dingbo 已提交
30
- Definitions of the parameters:
D
dingbo 已提交
31

32 33 34 35 36 37 38 39
  - data:input data
  - itype:the type of input data, for details please refer to [type definition in column_meta](/reference/rest-api/), for example 4 represents INT
  - iBytes:the number of bytes consumed by each value in the input data
  - oType:the type of output data, similar to iType
  - oBytes:the number of bytes consumed by each value in the output data
  - numOfRows:the number of rows in the input data
  - ts: the column of timestamp corresponding to the input data
  - dataOutput:the buffer for output data, total size is `oBytes * numberOfRows`
40
  - interBuf:the buffer for an intermediate result. Its size is specified by the `BUFSIZE` parameter when creating a UDF. It's normally used when the intermediate result is not same as the final result. This buffer is allocated and freed by TDengine.
41 42 43
  - tsOutput:the column of timestamps corresponding to the output data; it can be used to output timestamp together with the output data if it's not NULL
  - numOfOutput:the number of rows in output data
  - buf:for the state exchange between UDF and TDengine
D
dingbo 已提交
44

45
  [add_one.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c) is one example of a very simple UDF implementation, i.e. one instance of the above `udfNormalFunc` template. It adds one to each value of a passed in column, which can be filtered using the `where` clause, and outputs the result.
46 47 48

### Aggregate Function

49
For aggregate UDF, as mentioned earlier you must implement a "normal" function template (described above) and also implement the "merge" and "finalize" templates.
D
dingbo 已提交
50

51
#### Merge Function Template
D
dingbo 已提交
52

53 54 55 56 57
The function template below can be used to define your own merge function for an aggregate UDF.

`void udfMergeFunc(char* data, int32_t numOfRows, char* dataOutput, int32_t* numOfOutput, SUdfInit* buf)`

`udfMergeFunc` is the place holder for a function name. The function implemented with the above template is used to aggregate intermediate results and can only be used in the aggregate query for STable.
D
dingbo 已提交
58

59
Definitions of the parameters:
D
dingbo 已提交
60

61 62 63 64 65
- data:array of output data, if interBuf is used it's an array of interBuf
- numOfRows:number of rows in `data`
- dataOutput:the buffer for output data, the size is same as that of the final result; If the result is not final, it can be put in the interBuf, i.e. `data`.
- numOfOutput:number of rows in the output data
- buf:for the state exchange between UDF and TDengine
D
dingbo 已提交
66

67
#### Finalize Function Template
D
dingbo 已提交
68

69
The function template below can be used to finalize the result of your own UDF, normally used when interBuf is used.
70

71
`void udfFinalizeFunc(char* dataOutput, char* interBuf, int* numOfOutput, SUdfInit* buf)`
D
dingbo 已提交
72

73
`udfFinalizeFunc` is the place holder of function name, definitions of the parameter are as below:
D
dingbo 已提交
74

75 76 77 78
- dataOutput:buffer for output data
- interBuf:buffer for intermediate result, can be used as input for next processing step
- numOfOutput:number of output data, can only be 0 or 1 for aggregate function
- buf:for state exchange between UDF and TDengine
D
dingbo 已提交
79

80 81 82 83 84 85 86
### Example abs_max.c

[abs_max.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c) is an example of a user defined aggregate function to get the maximum from the absolute values of a column.

The internal processing happens as follows. The results of the select statement are divided into multiple row blocks and `udfNormalFunc`, i.e. `abs_max` in this case, is performed on each row block to generate the intermediate results for each sub table. Then `udfMergeFunc`, i.e. `abs_max_merge` in this case, is performed on the intermediate result of sub tables to aggregate and generate the final or intermediate result of STable. The intermediate result of STable is finally processed by `udfFinalizeFunc`, i.e. `abs_max_finalize` in this example, to generate the final result, which contains either 0 or 1 row.

Other typical aggregation functions such as covariance, can also be implemented using aggregate UDF.
D
dingbo 已提交
87

88
## UDF Naming Conventions
D
dingbo 已提交
89

90 91 92 93 94 95
The naming convention for the 3 kinds of function templates required by UDF is as follows:
 - udfNormalFunc, udfMergeFunc, and udfFinalizeFunc are required to have same prefix, i.e. the actual name of udfNormalFunc. The udfNormalFunc doesn't need a suffix following the function name. 
 - udfMergeFunc should be udfNormalFunc followed by `_merge`
 - udfFinalizeFunc should be udfNormalFunc followed by `_finalize`. 
 
The naming convention is part of TDengine's UDF framework. TDengine follows this convention to invoke the corresponding actual functions.
D
dingbo 已提交
96

97
Depending on whether you are creating a scalar UDF or aggregate UDF, the functions that you need to implement are different.
D
dingbo 已提交
98

99 100 101 102 103 104
- Scalar function:udfNormalFunc is required.
- Aggregate function:udfNormalFunc, udfMergeFunc (if query on STable) and udfFinalizeFunc are required.

For clarity, assuming we want to implement a UDF named "foo":
- If the function is a scalar function, we only need to implement the "normal" function template and it should be named simply `foo`. 
- If the function is an aggregate function, we need to implement `foo`, `foo_merge`, and `foo_finalize`. Note that for aggregate UDF, even though one of the three functions is not necessary, there must be an empty implementation.
D
dingbo 已提交
105

106
## Compile UDF
D
dingbo 已提交
107

108
The source code of UDF in C can't be utilized by TDengine directly. UDF can only be loaded into TDengine after compiling to dynamically linked library (DLL).
D
dingbo 已提交
109

110
For example, the example UDF `add_one.c` mentioned earlier, can be compiled into DLL using the command below, in a Linux Shell.
D
dingbo 已提交
111 112 113 114 115

```bash
gcc -g -O0 -fPIC -shared add_one.c -o add_one.so
```

116
The generated DLL file `add_one.so` can be used later when creating a UDF. It's recommended to use GCC not older than 7.5.
117 118

## Create and Use UDF
D
dingbo 已提交
119

120 121
When a UDF is created in a TDengine instance, it is available across the databases in that instance.

122
### Create UDF
D
dingbo 已提交
123

124
SQL command can be executed on the host where the generated UDF DLL resides to load the UDF DLL into TDengine. This operation cannot be done through REST interface or web console. Once created, any client of the current TDengine can use these UDF functions in their SQL commands. UDF are stored in the management node of TDengine. The UDFs loaded in TDengine would be still available after TDengine is restarted.
D
dingbo 已提交
125

126
When creating UDF, the type of UDF, i.e. a scalar function or aggregate function must be specified. If the specified type is wrong, the SQL statements using the function would fail with errors. The input type and output type don't need to be the same in UDF, but the input data type and output data type must be consistent with the UDF definition.
D
dingbo 已提交
127

128
- Create Scalar Function
D
dingbo 已提交
129 130

```sql
131
CREATE FUNCTION userDefinedFunctionName AS "/absolute/path/to/userDefinedFunctionName.so" OUTPUTTYPE <supported TDengine type> [BUFSIZE B];
D
dingbo 已提交
132 133
```

134 135 136 137
- userDefinedFunctionName:The function name to be used in SQL statement which must be consistent with the function name defined by `udfNormalFunc` and is also the name of the compiled DLL (.so file).
- path:The absolute path of the DLL file including the name of the shared object file (.so). The path must be quoted with single or double quotes.
- outputtype:The output data type, the value is the literal string of the supported TDengine data type.
- B:the size of intermediate buffer, in bytes; it is an optional parameter and the range is [0,512].
D
dingbo 已提交
138

139 140 141 142 143
For example, below SQL statement can be used to create a UDF from `add_one.so`.

```sql
CREATE FUNCTION add_one AS "/home/taos/udf_example/add_one.so" OUTPUTTYPE INT;
```
D
dingbo 已提交
144

145
- Create Aggregate Function
D
dingbo 已提交
146 147

```sql
148
CREATE AGGREGATE FUNCTION userDefinedFunctionName AS "/absolute/path/to/userDefinedFunctionName.so" OUTPUTTYPE <supported TDengine data type> [ BUFSIZE B ];
D
dingbo 已提交
149 150
```

151 152 153
- userDefinedFunctionName:the function name to be used in SQL statement which must be consistent with the function name defined by `udfNormalFunc` and is also the name of the compiled DLL (.so file).
- path:the absolute path of the DLL file including the name of the shared object file (.so). The path needs to be quoted by single or double quotes.
- OUTPUTTYPE:the output data type, the value is the literal string of the type
154
- B:the size of intermediate buffer, in bytes; it's an optional parameter and the range is [0,512]
D
dingbo 已提交
155

156
For details about how to use intermediate result, please refer to example program [demo.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c).
D
dingbo 已提交
157

158
For example, below SQL statement can be used to create a UDF from `demo.so`.
D
dingbo 已提交
159

160 161 162
```sql
CREATE AGGREGATE FUNCTION demo AS "/home/taos/udf_example/demo.so" OUTPUTTYPE DOUBLE bufsize 14;
```
D
dingbo 已提交
163

164 165 166
### Manage UDF

- Delete UDF
D
dingbo 已提交
167 168 169 170 171

```
DROP FUNCTION ids(X);
```

172 173
- ids(X):same as that in `CREATE FUNCTION` statement

D
dingbo 已提交
174 175 176
```sql
DROP FUNCTION add_one;
```
177 178 179

- Show Available UDF

D
dingbo 已提交
180 181 182 183
```sql
SHOW FUNCTIONS;
```

184 185 186
### Use UDF

The function name specified when creating UDF can be used directly in SQL statements, just like builtin functions.
D
dingbo 已提交
187 188

```sql
G
gccgdb1234 已提交
189
SELECT X(c) FROM table/STable;
D
dingbo 已提交
190 191
```

192
The above SQL statement invokes function X for column c.
D
dingbo 已提交
193

194
## Restrictions for UDF
D
dingbo 已提交
195

196
In current version there are some restrictions for UDF
D
dingbo 已提交
197

198 199 200
1. Only Linux is supported when creating and invoking UDF for both client side and server side
2. UDF can't be mixed with builtin functions
3. Only one UDF can be used in a SQL statement
201
4. Only a single column is supported as input for UDF
202 203 204
5. Once created successfully, UDF is persisted in MNode of TDengineUDF
6. UDF can't be created through REST interface
7. The function name used when creating UDF in SQL must be consistent with the function name defined in the DLL, i.e. the name defined by `udfNormalFunc`
205
8. The name of a UDF should not conflict with any of TDengine's built-in functions
D
dingbo 已提交
206

207
## Examples
D
dingbo 已提交
208

209
### Scalar function example [add_one](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c)
D
dingbo 已提交
210 211 212 213 214 215 216 217 218 219

<details>
<summary>add_one.c</summary>

```c
{{#include tests/script/sh/add_one.c}}
```

</details>

220
### Aggregate function example [abs_max](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c)
D
dingbo 已提交
221 222 223 224 225 226 227 228 229 230

<details>
<summary>abs_max.c</summary>

```c
{{#include tests/script/sh/abs_max.c}}
```

</details>

231
### Example for using intermediate result [demo](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c)
D
dingbo 已提交
232 233 234 235 236 237 238 239 240

<details>
<summary>demo.c</summary>

```c
{{#include tests/script/sh/demo.c}}
```

</details>