# How to Combine Multiple Models Using C++ Serving
If your processing pipeline contains two or more model inference stages (for example, OCR usually needs a det stage followed by a rec stage), there are two ways to meet this need.
1. Start two Serving services (e.g. Serving-det and Serving-rec). In your client the flow is: read data -> det pre-processing -> call Serving-det -> det post-processing -> rec pre-processing -> call Serving-rec -> rec post-processing -> output the result.
   - Pros: no changes to the Paddle Serving code are needed.
   - Cons: the service is requested twice, and efficiency drops as the amount of request data grows.
2. Modify the code to customize the model prediction behavior (custom OP) and the service processing flow (custom DAG), so that the whole multi-model pipeline (the det pre-processing -> call det inference -> det post-processing -> rec pre-processing -> call rec inference -> rec post-processing chain above) is integrated into a single Serving service. In your client the flow becomes: read data -> call the combined Serving service -> output the result.
   - Pros: only one service request is needed, which is more efficient.
   - Cons: the code must be modified and Serving must be recompiled.

This document focuses on the second, more efficient approach. Its basic steps are:
1. Define custom OPs
2. Define a custom DAG
3. Compile
4. Start and call the service
# 1. Define a Custom OP
An OP is the basic building block of the Paddle Serving server-side processing flow (i.e. the DAG). See [How to write an OP from scratch](./OP_CN.md); that document only describes how to define an OP node that calls prediction, and you can add pre-processing and post-processing on top of it.
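As a rough sketch, a new OP usually follows the same class skeleton as the built-in `GeneralInferOp`: a class derived from `OpWithChannel<GeneralBlob>`, declared and defined with the `DECLARE_OP`/`DEFINE_OP` macros, whose `inference()` method holds the pre-processing, prediction and post-processing logic. The class name `GeneralOcrDetOp` and the file paths below are hypothetical examples, not existing code:
``` c++
// Sketch of a custom OP skeleton (GeneralOcrDetOp is a hypothetical name).
// Header, e.g. core/general-server/op/general_ocr_det_op.h:
class GeneralOcrDetOp
    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
 public:
  typedef std::vector<paddle::PaddleTensor> TensorVector;
  DECLARE_OP(GeneralOcrDetOp);
  int inference();
};

// Source, e.g. core/general-server/op/general_ocr_det_op.cpp:
int GeneralOcrDetOp::inference() {
  // 1) read the predecessor's output, 2) pre-process the data,
  // 3) call the inference engine, 4) post-process -- see the snippets below.
  return 0;
}
DEFINE_OP(GeneralOcrDetOp);
```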
Inside `inference()`, first obtain the output of the predecessor OP and use it as the input of this OP. If needed, you can pre-process the data by modifying the memory that `TensorVector* in` points to.
``` c++
const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
const TensorVector *in = &input_blob->tensor_vector;
```
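For example, a minimal pre-processing sketch (assuming the input tensors hold FLOAT32 data; the normalization is only a placeholder for your real pre-processing) could rescale the predecessor's output in place. Note that `in` is declared `const`, so in-place modification needs a cast and is only safe if no other OP still reads the original data:
``` c++
// Minimal pre-processing sketch (illustrative, not framework code):
// scale every float element of the predecessor's output in place.
TensorVector *mutable_in = const_cast<TensorVector *>(in);
for (size_t i = 0; i < mutable_in->size(); ++i) {
  paddle::PaddleTensor &tensor = (*mutable_in)[i];
  if (tensor.dtype != paddle::PaddleDType::FLOAT32) continue;
  float *data = static_cast<float *>(tensor.data.data());
  size_t num = tensor.data.length() / sizeof(float);
  for (size_t j = 0; j < num; ++j) {
    data[j] /= 255.0f;  // example normalization
  }
}
```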
Then declare the output of this OP:
``` c++
GeneralBlob *output_blob = mutable_data<GeneralBlob>();
TensorVector *out = &output_blob->tensor_vector;
int batch_size = input_blob->GetBatchSize();
output_blob->SetBatchSize(batch_size);
```
After pre-processing is done and the output variables are defined, the core call into the inference engine is a single statement:
``` c++
if (InferManager::instance().infer(engine_name().c_str(), in, out, batch_size)) {
LOG(ERROR) << "Failed do infer in fluid model: " << engine_name().c_str();
return -1;
}
```
After this call, the model's prediction has been written to the memory pointed to by the `TensorVector* out` pointer bound to this OP. At this point you can post-process the data by modifying the memory that `TensorVector* out` points to, and the next (successor) OP will read this OP's output.
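For example, a minimal post-processing sketch (again assuming FLOAT32 output and `<algorithm>` included; the clipping is only a placeholder for your real post-processing):
``` c++
// Minimal post-processing sketch (illustrative, not framework code):
// clip the engine's float output to [0, 1] before the next OP reads it.
for (size_t i = 0; i < out->size(); ++i) {
  paddle::PaddleTensor &tensor = (*out)[i];
  if (tensor.dtype != paddle::PaddleDType::FLOAT32) continue;
  float *data = static_cast<float *>(tensor.data.data());
  size_t num = tensor.data.length() / sizeof(float);
  for (size_t j = 0; j < num; ++j) {
    data[j] = std::min(1.0f, std::max(0.0f, data[j]));  // example clipping
  }
}
```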
Finally, if you start the server through the Python API, the last step after defining the C++ operator on the server side is to register it in the Python API of the Paddle Serving server. The registration code is in `python/paddle_serving_server/dag.py`:
``` python
self.op_dict = {
"general_infer": "GeneralInferOp",
"general_reader": "GeneralReaderOp",
"general_response": "GeneralResponseOp",
"general_text_reader": "GeneralTextReaderOp",
"general_text_response": "GeneralTextResponseOp",
"general_single_kv": "GeneralSingleKVOp",
"general_dist_kv_infer": "GeneralDistKVInferOp",
"general_dist_kv": "GeneralDistKVOp",
"general_copy": "GeneralCopyOp",
"general_detection":"GeneralDetectionOp",
}
```
Here the key on the left (e.g. `"general_infer"`) is a name you define yourself (it is used later), and the value on the right (e.g. `"GeneralInferOp"`) is the class name of the custom C++ OP.
In `python/paddle_serving_server/server.py`, add only the class names of custom C++ OPs that load a model and run inference. For example, `general_reader` only does some simple data processing without loading a model or calling prediction, so it is added to the code above but not to the code below.
``` python
default_engine_types = [
'GeneralInferOp',
'GeneralDistKVInferOp',
'GeneralDistKVQuantInferOp',
'GeneralDetectionOp',
]
```
# 2. Define a Custom DAG
The DAG defines the server-side processing flow. Once the OPs above are defined, see [How to customize the DAG](./DAG_CN.md) to build the processing logic between the multiple models (i.e. multiple OPs) on the server side.
In general, the DAG starts with a `general_reader`, ends with a `general_response`, and the custom OPs that actually call prediction go in between. For example, `general_infer` is a default OP provided by the framework that only calls prediction, with no pre- or post-processing.
For example, the OCR model actually chains the det and rec models. We can build the DAG with a custom `"general_detection"` OP and `"general_infer"` (note that these names must exactly match the registrations in the Python API above). The underlying logic in `python/paddle_serving_server/serve.py` is shown in the code below.
``` python
import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader')
general_detection_op = op_maker.create('general_detection')
general_infer_op = op_maker.create('general_infer')
general_response_op = op_maker.create('general_response')
op_seq_maker = serving.OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_detection_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op)
```
# 3. Compile
At this point you need to recompile to produce the serving binary, set the `SERVING_BIN` environment variable via `export SERVING_BIN` so that your newly built serving binary is used, and install the related Python packages with `pip3 install`. For details, see [How to compile Serving](../Compile_CN.md).
# 4. Start and Call the Service
## 4.1 Start the Server
Still taking the OCR model as an example, the commands to start the det and rec models separately, each as a standalone service, are:
```shell
# Start each model as a separate service
python3 -m paddle_serving_server.serve --model ocr_det_model --port 9293  # det model
python3 -m paddle_serving_server.serve --model ocr_rec_model --port 9294  # rec model
```
With the work of the previous three sections in place, to serve the two chained models in a single service you only need to pass the relative paths of the model directories to `--model`, in order:
```shell
# Start multiple chained models in one service
python3 -m paddle_serving_server.serve --model ocr_det_model ocr_rec_model --port 9295  # chained models
```
## 4.2 Call from the Client
The client-side call then also needs to pass in the [proto definitions](./Serving_Configure_CN.md) of both clients:
```shell
# Call the service that serves the chained models
python3 ocr_cpp_client.py ocr_det_client ocr_rec_client
# ocr_det_client: relative path of the first model's client proto directory
# ocr_rec_client: relative path of the second model's client proto directory
```
On the server side, `'general_reader'` checks that the format of the incoming data matches the proto definition of the first model's client, and `'general_response'` ensures that the output data format matches the proto file of the second model's client.