08-udf.md 10.5 KB
Newer Older
D
dingbo 已提交
1
---
2 3 4
sidebar_label: UDF
title: User Defined Functions
description: "Scalar functions and aggregate functions developed by users can be utilized by the query framework to expand the query capability"
D
dingbo 已提交
5 6
---

7
In some use cases, the query capability required by application programs can't be achieved directly by builtin functions. With UDF, the functions developed by users can be utilized by query framework to meet some special requirements. UDF normally takes one column of data as input, but can also support the result of sub query as input.
D
dingbo 已提交
8

9
From version 2.2.0.0, UDF programmed in C/C++ language can be supported by TDengine.
D
dingbo 已提交
10

11
Two kinds of functions can be implemented by UDF: scalar function and aggregate function.
D
dingbo 已提交
12

13
## Define UDF
D
dingbo 已提交
14

15
### Scalar Function
D
dingbo 已提交
16

17
Below function template can be used to define your own scalar function.
D
dingbo 已提交
18

19
`void udfNormalFunc(char* data, short itype, short ibytes, int numOfRows, long long* ts, char* dataOutput, char* interBuf, char* tsOutput, int* numOfOutput, short otype, short obytes, SUdfInit* buf)`
D
dingbo 已提交
20

21
`udfNormalFunc` is the place holder of function name, a function implemented based on the above template can be used to perform scalar computation on data rows. The parameters are fixed to control the data exchange between UDF and TDengine.
D
dingbo 已提交
22

23
- Defintions of the parameters:
D
dingbo 已提交
24

25 26 27 28 29 30 31 32 33 34 35 36
  - data:input data
  - itype:the type of input data, for details please refer to [type definition in column_meta](/reference/rest-api/), for example 4 represents INT
  - iBytes:the number of bytes consumed by each value in the input data
  - oType:the type of output data, similar to iType
  - oBytes:the number of bytes consumed by each value in the output data
  - numOfRows:the number of rows in the input data
  - ts: the column of timestamp corresponding to the input data
  - dataOutput:the buffer for output data, total size is `oBytes * numberOfRows`
  - interBuf:the buffer for intermediate result, its size is specified by `BUFSIZE` parameter when creating a UDF. It's normally used when the intermediate result is not same as the final result, it's allocated and freed by TDengine.
  - tsOutput:the column of timestamps corresponding to the output data; it can be used to output timestamp together with the output data if it's not NULL
  - numOfOutput:the number of rows in output data
  - buf:for the state exchange between UDF and TDengine
D
dingbo 已提交
37

38 39 40 41 42
  [add_one.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c) is one example of the simplest UDF implementations, i.e. one instance of the above `udfNormalFunc` template. It adds one to each value of a column passed in which can be filtered using `where` clause and outputs the result.

### Aggregate Function

Below function template can be used to define your own aggregate function.
D
dingbo 已提交
43 44 45

`void abs_max_merge(char* data, int32_t numOfRows, char* dataOutput, int32_t* numOfOutput, SUdfInit* buf)`

G
gccgdb1234 已提交
46
`udfMergeFunc` is the place holder of function name, the function implemented with the above template is used to aggregate the intermediate result, only can be used in the aggregate query for STable.
D
dingbo 已提交
47

48
Definitions of the parameters:
D
dingbo 已提交
49

50 51 52 53 54
- data:array of output data, if interBuf is used it's an array of interBuf
- numOfRows:number of rows in `data`
- dataOutput:the buffer for output data, the size is same as that of the final result; If the result is not final, it can be put in the interBuf, i.e. `data`.
- numOfOutput:number of rows in the output data
- buf:for the state exchange between UDF and TDengine
D
dingbo 已提交
55

56
[abs_max.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c) is an user defined aggregate function to get the maximum from the absolute value of a column.
D
dingbo 已提交
57

G
gccgdb1234 已提交
58
The internal processing is that the data affected by the select statement will be divided into multiple row blocks and `udfNormalFunc`, i.e. `abs_max` in this case, is performed on each row block to generate the intermediate of each sub table, then `udfMergeFunc`, i.e. `abs_max_merge` in this case, is performed on the intermediate result of sub tables to aggregate to generate the final or intermediate result of STable. The intermediate result of STable is finally processed by `udfFinalizeFunc` to generate the final result, which contain either 0 or 1 row.
D
dingbo 已提交
59

60
Other typical scenarios, like covariance, can also be achieved by aggregate UDF.
D
dingbo 已提交
61

62 63 64
### Finalize

Below function template can be used to finalize the result of your own UDF, normally used when interBuf is used.
D
dingbo 已提交
65 66 67

`void abs_max_finalize(char* dataOutput, char* interBuf, int* numOfOutput, SUdfInit* buf)`

68
`udfFinalizeFunc` is the place holder of function name, definitions of the parameter are as below:
D
dingbo 已提交
69

70 71 72 73
- dataOutput:buffer for output data
- interBuf:buffer for intermediate result, can be used as input for next processing step
- numOfOutput:number of output data, can only be 0 or 1 for aggregate function
- buf:for state exchange between UDF and TDengine
D
dingbo 已提交
74

75
## UDF Conventions
D
dingbo 已提交
76

77
The naming of 3 kinds of UDF, i.e. udfNormalFunc, udfMergeFunc, and udfFinalizeFunc is required to have same prefix, i.e. the actual name of udfNormalFunc, which means udfNormalFunc doesn't need a suffix following the function name. While udfMergeFunc should be udfNormalFunc followed by `_merge`, udfFinalizeFunc should be udfNormalFunc followed by `_finalize`. The naming convention is part of UDF framework, TDengine follows this convention to invoke corresponding actual functions.\
D
dingbo 已提交
78

79
According to the kind of UDF to implement, the functions that need to be implemented are different.
D
dingbo 已提交
80

81
- Scalar function:udfNormalFunc is required
G
gccgdb1234 已提交
82
- Aggregate function:udfNormalFunc, udfMergeFunc (if query on STable) and udfFinalizeFunc are required
D
dingbo 已提交
83

84
To be more accurate, assuming we want to implement a UDF named "foo". If the function is a scalar function, what we really need to implement is `foo`; if the function is aggregate function, we need to implement `foo`, `foo_merge`, and `foo_finalize`. For aggregate UDF, even though one of the three functions is not necessary, there must be an empty implementation.
D
dingbo 已提交
85

86
## Compile UDF
D
dingbo 已提交
87

88
The source code of UDF in C can't be utilized by TDengine directly. UDF can only be loaded into TDengine after compiling to dynamically linked library.
D
dingbo 已提交
89

90
For example, the example UDF `add_one.c` mentioned in previous sections need to be compiled into DLL using below command on Linux Shell.
D
dingbo 已提交
91 92 93 94 95

```bash
gcc -g -O0 -fPIC -shared add_one.c -o add_one.so
```

96 97 98
The generated DLL file `dd_one.so` can be used later when creating UDF. It's recommended to use GCC not older than 7.5.

## Create and Use UDF
D
dingbo 已提交
99

100
### Create UDF
D
dingbo 已提交
101

102
SQL command can be executed on the same hos where the generated UDF DLL resides to load the UDF DLL into TDengine, this operation can't be done through REST interface or web console. Once created, all the clients of the current TDengine can use these UDF functions in their SQL commands. UDF are stored in the management node of TDengine. The UDFs loaded in TDengine would be still available after TDengine is restarted.
D
dingbo 已提交
103

104
When creating UDF, it needs to be clarified as either scalar function or aggregate function. If the specified type is wrong, the SQL statements using the function would fail with error. Besides, the input type and output type don't need to be same in UDF, but the input data type and output data type need to be consistent with the UDF definition.
D
dingbo 已提交
105

106
- Create Scalar Function
D
dingbo 已提交
107 108 109 110 111

```sql
CREATE FUNCTION ids(X) AS ids(Y) OUTPUTTYPE typename(Z) [ BUFSIZE B ];
```

112 113 114 115
- ids(X):the function name to be sued in SQL statement, must be consistent with the function name defined by `udfNormalFunc`
- ids(Y):the absolute path of the DLL file including the implementation of the UDF, the path needs to be quoted by single or double quotes
- typename(Z):the output data type, the value is the literal string of the type
- B:the size of intermediate buffer, in bytes; it's an optional parameter and the range is [0,512]
D
dingbo 已提交
116

117 118 119 120 121
For example, below SQL statement can be used to create a UDF from `add_one.so`.

```sql
CREATE FUNCTION add_one AS "/home/taos/udf_example/add_one.so" OUTPUTTYPE INT;
```
D
dingbo 已提交
122

123
- Create Aggregate Function
D
dingbo 已提交
124 125 126 127 128

```sql
CREATE AGGREGATE FUNCTION ids(X) AS ids(Y) OUTPUTTYPE typename(Z) [ BUFSIZE B ];
```

129 130
- ids(X):the function name to be sued in SQL statement, must be consistent with the function name defined by `udfNormalFunc`
- ids(Y):the absolute path of the DLL file including the implementation of the UDF, the path needs to be quoted by single or double quotes
131
- typename(Z):the output data type, the value is the literal string of the type
132
- B:the size of intermediate buffer, in bytes; it's an optional parameter and the range is [0,512]
D
dingbo 已提交
133

134
For details about how to use intermediate result, please refer to example program [demo.c](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c).
D
dingbo 已提交
135

136
For example, below SQL statement can be used to create a UDF rom `demo.so`.
D
dingbo 已提交
137

138 139 140
```sql
CREATE AGGREGATE FUNCTION demo AS "/home/taos/udf_example/demo.so" OUTPUTTYPE DOUBLE bufsize 14;
```
D
dingbo 已提交
141

142 143 144
### Manage UDF

- Delete UDF
D
dingbo 已提交
145 146 147 148 149

```
DROP FUNCTION ids(X);
```

150 151
- ids(X):same as that in `CREATE FUNCTION` statement

D
dingbo 已提交
152 153 154
```sql
DROP FUNCTION add_one;
```
155 156 157

- Show Available UDF

D
dingbo 已提交
158 159 160 161
```sql
SHOW FUNCTIONS;
```

162 163 164
### Use UDF

The function name specified when creating UDF can be used directly in SQL statements, just like builtin functions.
D
dingbo 已提交
165 166

```sql
G
gccgdb1234 已提交
167
SELECT X(c) FROM table/STable;
D
dingbo 已提交
168 169
```

170
The above SQL statement invokes function X for column c.
D
dingbo 已提交
171

172
## Restrictions for UDF
D
dingbo 已提交
173

174
In current version there are some restrictions for UDF
D
dingbo 已提交
175

176 177 178 179 180 181 182 183
1. Only Linux is supported when creating and invoking UDF for both client side and server side
2. UDF can't be mixed with builtin functions
3. Only one UDF can be used in a SQL statement
4. Single column is supported as input for UDF
5. Once created successfully, UDF is persisted in MNode of TDengineUDF
6. UDF can't be created through REST interface
7. The function name used when creating UDF in SQL must be consistent with the function name defined in the DLL, i.e. the name defined by `udfNormalFunc`
8. The name name of UDF name should not conflict with any of builtin functions
D
dingbo 已提交
184

185
## Examples
D
dingbo 已提交
186

187
### Scalar function example [add_one](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/add_one.c)
D
dingbo 已提交
188 189 190 191 192 193 194 195 196 197

<details>
<summary>add_one.c</summary>

```c
{{#include tests/script/sh/add_one.c}}
```

</details>

198
### Aggregate function example [abs_max](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/abs_max.c)
D
dingbo 已提交
199 200 201 202 203 204 205 206 207 208

<details>
<summary>abs_max.c</summary>

```c
{{#include tests/script/sh/abs_max.c}}
```

</details>

209
### Example for using intermediate result [demo](https://github.com/taosdata/TDengine/blob/develop/tests/script/sh/demo.c)
D
dingbo 已提交
210 211 212 213 214 215 216 217 218

<details>
<summary>demo.c</summary>

```c
{{#include tests/script/sh/demo.c}}
```

</details>