@@ -16,12 +16,13 @@ Deep neural nets often have some preprocessing steps on input data, and postproc
### Simple series structure
PaddleServing has some predefined Computation Nodes in the framework. A very commonly used Computation Graph is the simple reader-inference-response mode that can cover most of the single-model inference scenarios. An example graph and the corresponding DAG definition code are as follows.
PaddleServing has some predefined Computation Nodes in the framework. A very commonly used Computation Graph is the simple reader-inference-response mode that can cover most of the single-model inference scenarios. Here is an example of a DAG graph.
If you use `the command line + configuration file method to start C++ server`, you only need to modify [the configuration file](./Serving_Configure_CN.md) and don't need to change any line of the 👆 code.
For simple series logic, we simplify it and build it with `OpSeqMaker`. The successor of each node is determined by default from the order in which nodes are added to `OpSeqMaker`, so you do not need to specify the successor of each node explicitly.
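A minimal sketch of this series structure with the `paddle_serving_server` Python API is shown below; the model path, workdir, and port are placeholders:

``` python
from paddle_serving_server import OpMaker, OpSeqMaker, Server

op_maker = OpMaker()
# create the three predefined nodes: reader -> inference -> response
read_op = op_maker.create('general_reader')
infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')

# the order of add_op determines each node's successor by default
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)

server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config("serving_server_model")  # placeholder model path
server.prepare_server(workdir="workdir", port=9292, device="cpu")
server.run_server()
```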
Since the code will be commonly used and users do not have to change it, PaddleServing releases an easy-to-use launching command for service startup. An example is as follows:
The C++ Serving framework supports [custom DAG graphs](./Model_Ensemble_CN.md) to express serial and parallel combinations of multiple models, and also allows users to [develop custom OP nodes in C++](./OP_CN.md). Compared with using an inner and an outer layer of services to combine multiple models, handling multiple models in a single service saves one RPC network round trip and therefore gives a certain performance improvement, especially when the amount of data transferred over RPC is large.
The C++ Serving framework supports creating [multi-model combinations](./2+_model.md) within a single service. Users can express serial and parallel combinations of multiple models through [custom DAG graphs](./Model_Ensemble_CN.md), and can also [develop custom OP nodes in C++](./OP_CN.md). Compared with using an inner and an outer layer of services to combine multiple models, handling multiple models in a single service saves one RPC network round trip and therefore gives a certain performance improvement, especially when the amount of data transferred over RPC is large.
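As a rough sketch of such a combination in the model-ensemble style of the Python API (the `engine_name`/`inputs` parameters and the model paths below are assumptions that may differ across versions), two inference nodes can run in parallel on the same reader output and be merged by the response node:

``` python
from paddle_serving_server import OpMaker, OpGraphMaker, Server

op_maker = OpMaker()
read_op = op_maker.create('general_reader')
# two inference nodes share the reader output (parallel branches)
cnn_infer_op = op_maker.create('general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create('general_infer', engine_name='bow', inputs=[read_op])
# the response node merges both branches
response_op = op_maker.create('general_response', inputs=[cnn_infer_op, bow_infer_op])

op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)

server = Server()
server.set_op_graph(op_graph_maker.get_op_graph())
# placeholder model paths keyed by engine name; the exact config format may differ
server.load_model_config({"cnn": "cnn_model", "bow": "bow_model"})
server.prepare_server(workdir="workdir", port=9393, device="cpu")
server.run_server()
```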
### 3.4 Model management and hot loading
The C++ Serving engine supports model management, including managing multiple models and multiple versions of a model. To keep the inference service available while a model is being replaced, the new model needs to be hot-loaded without interrupting the service. C++ Serving supports this feature and provides a tool that monitors newly produced models and updates the local model. For a concrete example, please refer to [Hot Loading in C++ Serving](./Hot_Loading_CN.md).
LOG(ERROR) << "Failed do infer in fluid model: " << engine_name().c_str();
return -1;
}
...
...
@@ -126,15 +126,16 @@ int GeneralInferOp::inference() {
DEFINE_OP(GeneralInferOp);
```
`input_blob` and `output_blob` both have multiple `paddle::PaddleTensor`, and the Paddle Inference library can be called through `InferManager::instance().infer(GENERAL_MODEL_NAME, in, out, batch_size)`. Most of the other code in this function is about profiling; we may remove redundant code in the future as well.
`input_blob` and `output_blob` both have multiple `paddle::PaddleTensor`, and the Paddle Inference library can be called through `InferManager::instance().infer(engine_name().c_str(), in, out, batch_size)`. Most of the other code in this function is about profiling; we may remove redundant code in the future as well.
Basically, the above code can implement a new operator. If you want to access a dictionary resource, you can refer to `core/predictor/framework/resource.cpp` to add globally visible resources. Resources are initialized when the server starts.
## Define Python API
After you have defined a C++ operator on the server side for Paddle Serving, the last step is to register it in the Python API of the PaddleServing server; `python/paddle_serving_server/__init__.py` in the repo contains the relevant code piece.
After you have defined a C++ operator on the server side for Paddle Serving, the last step is to register it in the Python API of the PaddleServing server; `python/paddle_serving_server/dag.py` in the repo contains the relevant code piece.
``` c++
``` python
self.op_dict = {
    "general_infer": "GeneralInferOp",
    "general_reader": "GeneralReaderOp",
...
...
@@ -145,3 +146,15 @@ self.op_dict = {
"general_dist_kv":"GeneralDistKVOp"
}
```
In the `python/paddle_serving_server/server.py` file, only the class names of the C++ OP classes that need to load a model and execute prediction are added.
For example, `general_reader` needs to be added in the 👆 code but not in the 👇 code, because it only does some simple data processing without loading a model or calling prediction.
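To make the distinction concrete, here is a hedged sketch of what such a model-loading OP list in `server.py` could look like; the variable name and the exact OP set are illustrative assumptions, not the repo's actual code:

``` python
# Illustrative assumption: only OPs that load a model and run prediction
# are registered as engine types here; pure data-processing OPs such as
# GeneralReaderOp are not listed.
default_engine_types = [
    "GeneralInferOp",
    "GeneralDistKVInferOp",
]
```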