# How to Combine Multiple Models Using C++ Serving
If your processing pipeline contains two or more model inference stages (for example, OCR usually needs a det stage followed by a rec stage), there are two ways to meet this need.
1. Start two Serving services (e.g. Serving-det and Serving-rec). In your client the flow is: read data -> det pre-processing -> call Serving-det -> det post-processing -> rec pre-processing -> call Serving-rec -> rec post-processing -> output the result.
   - Pros: no changes to the Paddle Serving code are needed.
   - Cons: the service is requested twice, and efficiency drops as the amount of request data grows.
2. Modify the code to customize the model prediction behavior (custom OP) and the service processing flow (custom DAG), so that the whole multi-model pipeline (the det pre-processing -> call det inference -> det post-processing -> rec pre-processing -> call rec inference -> rec post-processing chain above) is integrated into a single Serving service. In your client the flow becomes: read data -> call the combined Serving service -> output the result.
   - Pros: only one service request is needed, which is more efficient.
   - Cons: the code must be modified and Serving must be recompiled.

This document focuses on the second, more efficient approach. Its basic steps are:
1. Define custom OPs
2. Define a custom DAG
3. Compile
4. Start and call the service
# 1. Define a Custom OP
An OP is the basic building block of the Paddle Serving server-side processing flow (i.e. the DAG). See [How to write an OP from scratch](./OP_CN.md); that document only describes how to define an OP node that calls prediction, and you can add pre-processing and post-processing on top of it.
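As a rough sketch, a new OP usually follows the same class skeleton as the built-in `GeneralInferOp`: a class derived from `OpWithChannel<GeneralBlob>`, declared and defined with the `DECLARE_OP`/`DEFINE_OP` macros, whose `inference()` method holds the pre-processing, prediction and post-processing logic. The class name `GeneralOcrDetOp` and the file paths below are hypothetical examples, not existing code:
``` c++
// Sketch of a custom OP skeleton (GeneralOcrDetOp is a hypothetical name).
// Header, e.g. core/general-server/op/general_ocr_det_op.h:
class GeneralOcrDetOp
    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
 public:
  typedef std::vector<paddle::PaddleTensor> TensorVector;
  DECLARE_OP(GeneralOcrDetOp);
  int inference();
};

// Source, e.g. core/general-server/op/general_ocr_det_op.cpp:
int GeneralOcrDetOp::inference() {
  // 1) read the predecessor's output, 2) pre-process the data,
  // 3) call the inference engine, 4) post-process -- see the snippets below.
  return 0;
}
DEFINE_OP(GeneralOcrDetOp);
```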
Inside `inference()`, first obtain the output of the predecessor OP and use it as the input of this OP. If needed, you can pre-process the data by modifying the memory that `TensorVector* in` points to.
``` c++
const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
const TensorVector *in = &input_blob->tensor_vector;
```
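For example, a minimal pre-processing sketch (assuming the input tensors hold FLOAT32 data; the normalization is only a placeholder for your real pre-processing) could rescale the predecessor's output in place. Note that `in` is declared `const`, so in-place modification needs a cast and is only safe if no other OP still reads the original data:
``` c++
// Minimal pre-processing sketch (illustrative, not framework code):
// scale every float element of the predecessor's output in place.
TensorVector *mutable_in = const_cast<TensorVector *>(in);
for (size_t i = 0; i < mutable_in->size(); ++i) {
  paddle::PaddleTensor &tensor = (*mutable_in)[i];
  if (tensor.dtype != paddle::PaddleDType::FLOAT32) continue;
  float *data = static_cast<float *>(tensor.data.data());
  size_t num = tensor.data.length() / sizeof(float);
  for (size_t j = 0; j < num; ++j) {
    data[j] /= 255.0f;  // example normalization
  }
}
```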
Then declare the output of this OP:
``` c++
GeneralBlob *output_blob = mutable_data<GeneralBlob>();
TensorVector *out = &output_blob->tensor_vector;
int batch_size = input_blob->GetBatchSize();
output_blob->SetBatchSize(batch_size);
```
After pre-processing is done and the output variables are defined, the core call into the inference engine is a single statement:
``` c++
if (InferManager::instance().infer(engine_name().c_str(), in, out, batch_size)) {
LOG(ERROR) << "Failed do infer in fluid model: " << engine_name().c_str();
return -1;
}
```
After this call, the model's prediction has been written to the memory pointed to by the `TensorVector* out` pointer bound to this OP. At this point you can post-process the data by modifying the memory that `TensorVector* out` points to, and the next (successor) OP will read this OP's output.
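For example, a minimal post-processing sketch (again assuming FLOAT32 output and `<algorithm>` included; the clipping is only a placeholder for your real post-processing):
``` c++
// Minimal post-processing sketch (illustrative, not framework code):
// clip the engine's float output to [0, 1] before the next OP reads it.
for (size_t i = 0; i < out->size(); ++i) {
  paddle::PaddleTensor &tensor = (*out)[i];
  if (tensor.dtype != paddle::PaddleDType::FLOAT32) continue;
  float *data = static_cast<float *>(tensor.data.data());
  size_t num = tensor.data.length() / sizeof(float);
  for (size_t j = 0; j < num; ++j) {
    data[j] = std::min(1.0f, std::max(0.0f, data[j]));  // example clipping
  }
}
```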
Finally, if you start the server through the Python API, the last step after defining the C++ operator on the server side is to register it in the Python API of the Paddle Serving server. The registration code is in `python/paddle_serving_server/dag.py`:
``` python
self.op_dict = {
"general_infer": "GeneralInferOp",
"general_reader": "GeneralReaderOp",
"general_response": "GeneralResponseOp",
"general_text_reader": "GeneralTextReaderOp",
"general_text_response": "GeneralTextResponseOp",
"general_single_kv": "GeneralSingleKVOp",
"general_dist_kv_infer": "GeneralDistKVInferOp",
"general_dist_kv": "GeneralDistKVOp",
"general_copy": "GeneralCopyOp",
"general_detection":"GeneralDetectionOp",
}
```
Here the key on the left (e.g. `"general_infer"`) is a name you define yourself (it is used later), and the value on the right (e.g. `"GeneralInferOp"`) is the class name of the custom C++ OP.
In `python/paddle_serving_server/server.py`, add only the class names of custom C++ OPs that load a model and run inference. For example, `general_reader` only does some simple data processing without loading a model or calling prediction, so it is added to the code above but not to the code below.
``` python
default_engine_types = [
'GeneralInferOp',
'GeneralDistKVInferOp',
'GeneralDistKVQuantInferOp',
'GeneralDetectionOp',
]
```
# 2. Define a Custom DAG
The DAG defines the server-side processing flow. Once the OPs above are defined, see [How to customize the DAG](./DAG_CN.md) to build the processing logic between the multiple models (i.e. multiple OPs) on the server side.
In general, the DAG starts with a `general_reader`, ends with a `general_response`, and the custom OPs that actually call prediction go in between. For example, `general_infer` is a default OP provided by the framework that only calls prediction, with no pre- or post-processing.
For example, the OCR model actually chains the det and rec models. We can build the DAG with a custom `"general_detection"` OP and `"general_infer"` (note that these names must exactly match the registrations in the Python API above). The underlying logic in `python/paddle_serving_server/serve.py` is shown in the code below.
``` python
import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader')
general_detection_op = op_maker.create('general_detection')
general_infer_op = op_maker.create('general_infer')
general_response_op = op_maker.create('general_response')
op_seq_maker = serving.OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_detection_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op)
```
# 3. Compile
At this point you need to recompile to produce the serving binary, set the `SERVING_BIN` environment variable via `export SERVING_BIN` so that your newly built serving binary is used, and install the related Python packages with `pip3 install`. For details, see [How to compile Serving](../Compile_CN.md).
# 4. Start and Call the Service
## 4.1 Start the Server
Still taking the OCR model as an example, the commands to start the det and rec models separately, each as a standalone service, are:
```shell
# Start each model as a separate service
python3 -m paddle_serving_server.serve --model ocr_det_model --port 9293  # det model
python3 -m paddle_serving_server.serve --model ocr_rec_model --port 9294  # rec model
```
With the work of the previous three sections in place, to serve the two chained models in a single service you only need to pass the relative paths of the model directories to `--model`, in order:
```shell
# Start multiple chained models in one service
python3 -m paddle_serving_server.serve --model ocr_det_model ocr_rec_model --port 9295  # chained models
```
## 4.2 Call from the Client
The client-side call then also needs to pass in the [proto definitions](./Serving_Configure_CN.md) of both clients:
```shell
# Call the service that serves the chained models
python3 ocr_cpp_client.py ocr_det_client ocr_rec_client
# ocr_det_client: relative path of the first model's client proto directory
# ocr_rec_client: relative path of the second model's client proto directory
```
On the server side, `'general_reader'` checks that the format of the incoming data matches the proto definition of the first model's client, and `'general_response'` ensures that the output data format matches the proto file of the second model's client.