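# Advanced C++ Serving Guide
## Overview
This document covers the advanced features and performance tuning of C++ Serving beyond its basic functionality. It is intended for users who:
- want a thorough understanding of the C++ Serving source code
- want to learn about advanced features such as model hot loading, A/B testing, and encrypted model inference services
- want to tune performance by adjusting C++ Serving parameters
## Protocols
If you need to assemble the data in a Request yourself, or plan to do secondary development, see the [related documentation]().
## Model Hot Loading
If you need to update a model without stopping the Server, see the [related documentation]().
## A/B Test
If you need to route user requests to different Servers according to a given traffic ratio, see the [related documentation]().
## Encrypted Model Inference Service
If you need to deploy an encrypted model on the Server, see the [related documentation]().
## Multi-Model Pipelines
If you need to chain several models within a single Server (for example, OCR chains Det and Rec), refer to this section.
## Performance Optimization Guide
If you want to tune the performance of the C++ Serving server, see the [related documentation]().
## Performance Metrics
If you want to see performance data comparing C++ Serving with competing products, see the [related documentation]().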
# Inference Protocols
C++ Serving is built on BRPC and accepts BRPC, GRPC, and RESTful requests. Request data uses the protobuf format defined in `core/general-server/proto/general_model_service.proto`. This document describes how to build requests and parse results.
## Tensor
**I. Tensor Definition**
A Tensor can carry several types of data and is the basic unit of both Request and Response. It is defined as follows:
message Tensor {
// VarType: INT64
repeated int64 int64_data = 1;
// VarType: FP32
repeated float float_data = 2;
// VarType: INT32
repeated int32 int_data = 3;
// VarType: FP64
repeated double float64_data = 4;
// VarType: UINT32
repeated uint32 uint32_data = 5;
// VarType: BOOL
repeated bool bool_data = 6;
// (No support)VarType: COMPLEX64, 2x represents the real part, 2x+1
// represents the imaginary part
repeated float complex64_data = 7;
// (No support)VarType: COMPLEX128, 2x represents the real part, 2x+1
// represents the imaginary part
repeated double complex128_data = 8;
// VarType: STRING
repeated string data = 9;
// Element types:
// 0 => INT64
// 1 => FP32
// 2 => INT32
// 3 => FP64
// 4 => INT16
// 5 => FP16
// 6 => BF16
// 7 => UINT8
// 8 => INT8
// 9 => BOOL
// 10 => COMPLEX64
// 11 => COMPLEX128
// 20 => STRING
int32 elem_type = 10;
// Shape of the tensor, including batch dimensions.
repeated int32 shape = 11;
// Level of data(LOD), support variable length data, only for fetch tensor
// currently.
repeated int32 lod = 12;
// Correspond to the variable 'name' in the model description prototxt.
string name = 13;
// Correspond to the variable 'alias_name' in the model description prototxt.
string alias_name = 14; // get from the Model prototxt
// VarType: FP16, INT16, INT8, BF16, UINT8
bytes tensor_content = 15;
}
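- elem_type: data type; FLOAT32, INT64, INT32, UINT8, INT8, and FLOAT16 are currently supported
- shape: dimensions of the data
- lod: LoD information. The LoD (Level-of-Detail) Tensor is an advanced Paddle feature that extends Tensor to support more flexible input data. For the principles behind LoD, see the [related documentation](../LOD_CN.md)
- name/alias_name: name and alias, matching the model configuration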
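To make the roles of `shape` and `lod` more concrete, here is a small Python sketch that groups a flat data buffer into rows using `shape` and then splits those rows into variable-length sequences using `lod`. It assumes offset-style LoD (for example, `[0, 2, 5]` marks two sequences covering rows 0 to 1 and rows 2 to 4). The helper names `rows_from_flat` and `split_by_lod` are hypothetical and are not part of Paddle Serving.

```python
from typing import List, Sequence


def rows_from_flat(flat: Sequence[float], shape: Sequence[int]) -> List[list]:
    """Group a flat buffer into rows; shape[0] is the batch dimension."""
    row_len = 1
    for dim in shape[1:]:
        row_len *= dim
    if row_len * shape[0] != len(flat):
        raise ValueError("shape does not match the number of elements")
    return [list(flat[i * row_len:(i + 1) * row_len]) for i in range(shape[0])]


def split_by_lod(rows: Sequence, lod: Sequence[int]) -> List[list]:
    """Split rows into sequences; lod is assumed to hold offsets."""
    if not lod:
        return [list(rows)]  # no LoD: treat everything as one sequence
    return [list(rows[start:end]) for start, end in zip(lod[:-1], lod[1:])]


# Made-up data: 5 rows of 2 values, forming sequences of 2 and 3 rows.
flat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
rows = rows_from_flat(flat, shape=[5, 2])
sequences = split_by_lod(rows, lod=[0, 2, 5])
print([len(seq) for seq in sequences])
```

The data itself always travels as a flat array (`float_data`, `int64_data`, and so on); `shape` and `lod` only tell the receiver how to interpret it.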
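**II. Building Tensor Data**
1. FLOAT32 Tensor
// Raw data
std::vector<float> float_data;
Tensor *tensor = new Tensor;
// Set the shape; multiple dimensions are allowed
for (uint32_t j = 0; j < float_shape.size(); ++j) {
  tensor->add_shape(float_shape[j]);
}
// Set the LoD information
for (uint32_t j = 0; j < float_lod.size(); ++j) {
  tensor->add_lod(float_lod[j]);
}
// Set the element type (1 => FP32), name, and alias
tensor->set_elem_type(1);
tensor->set_name(name);
tensor->set_alias_name(alias_name);
// Copy the data
int total_number = float_data.size();
tensor->mutable_float_data()->Resize(total_number, 0);
memcpy(tensor->mutable_float_data()->mutable_data(), float_data.data(), total_number * sizeof(float));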
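2. INT8 Tensor
// Raw data: INT8 values packed into a byte string
std::string string_data;
Tensor *tensor = new Tensor;
for (uint32_t j = 0; j < string_shape.size(); ++j) tensor->add_shape(string_shape[j]);
for (uint32_t j = 0; j < string_lod.size(); ++j) tensor->add_lod(string_lod[j]);
// 8 => INT8; INT8 data is carried in tensor_content
tensor->set_elem_type(8);
tensor->set_tensor_content(string_data);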
## Request
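**I. Request Definition**
A Request is the data sent by the client. It uses Tensor as its basic data unit and carries additional request information. It is defined as follows: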
message Request {
repeated Tensor tensor = 1;
repeated string fetch_var_names = 2;
bool profile_server = 3;
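uint64 log_id = 4;
}
- fetch_var_names: names of the outputs to fetch; GeneralResponseOP filters the results by this list. See the `alias_name` under the `fetch_var` field in the model file serving_client_conf.prototxt
- profile_server: debugging flag; when enabled, performance information is output
- log_id: request ID
**II. Building a Request**
1. Protobuf form
When sending requests over BRPC or GRPC, use protobuf data, built as follows:
Request req;
for (auto &name : fetch_name) {
  req.add_fetch_var_names(name);
}
// Add a Tensor
Tensor *tensor = req.add_tensor();
2. JSON form
When sending RESTful requests, the data can be supplied as JSON.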
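As a hedged illustration of the JSON form, the sketch below builds a request body in Python. It assumes that the JSON keys mirror the protobuf field names of `Request` and `Tensor` shown above; the feed name `x` and fetch name `price` are borrowed from the uci_housing example later in this document, and the feature values are made up. The HTTP endpoint is deliberately left out because it is not specified here.

```python
import json


def build_request(features, fetch_names, log_id=0):
    """Build a JSON request body holding a single FP32 input tensor."""
    tensor = {
        "float_data": [float(value) for value in features],
        "elem_type": 1,               # 1 => FP32 in the Tensor definition
        "name": "x",                  # assumed feed variable name
        "alias_name": "x",
        "shape": [1, len(features)],  # shape includes the batch dimension
    }
    return json.dumps({
        "tensor": [tensor],
        "fetch_var_names": list(fetch_names),
        "log_id": log_id,
    })


# Hypothetical input: 13 placeholder features for the house price model.
body = build_request([0.0] * 13, ["price"])
print(body)
```

The resulting string can be sent as the body of an HTTP POST request by any HTTP client.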
## Response
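**I. Response Definition**
A Response is the result returned by the server to the client. It contains Tensor data, an error code, an error message, and related fields. It is defined as follows: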
message Response {
repeated ModelOutput outputs = 1;
repeated int64 profile_time = 2;
// Error code
int32 err_no = 3;
// Error messages
string err_msg = 4;
}
message ModelOutput {
repeated Tensor tensor = 1;
string engine_name = 2;
}
- profile_time: performance information, returned when request->set_profile_server(true) is set
- err_no: error code; see `core/predictor/common/constant.h`
- err_msg: error message; see `core/predictor/common/constant.h`
- engine_name: name of the output node

| err_no | err_msg |
| :---: | :---: |
|-5000|"Paddle Serving Framework Internal Error."|
|-5001|"Paddle Serving Memory Alloc Error."|
|-5002|"Paddle Serving Array Overflow Error."|
|-5100|"Paddle Serving Op Inference Error."|
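On the client side, the error codes listed above can be turned into exceptions. The following Python sketch is one possible way to do that; it assumes that an `err_no` of 0 means success, and the names `ServingError` and `check_response` are hypothetical helpers rather than part of any Paddle Serving client.

```python
# Error codes and messages copied from the table above.
ERROR_MESSAGES = {
    -5000: "Paddle Serving Framework Internal Error.",
    -5001: "Paddle Serving Memory Alloc Error.",
    -5002: "Paddle Serving Array Overflow Error.",
    -5100: "Paddle Serving Op Inference Error.",
}


class ServingError(RuntimeError):
    """Raised when a response carries a non-zero err_no."""


def check_response(err_no, err_msg=""):
    """Raise ServingError unless err_no signals success."""
    if err_no == 0:  # assumption: 0 means the request succeeded
        return
    detail = err_msg or ERROR_MESSAGES.get(err_no, "Unknown error.")
    raise ServingError("err_no={}: {}".format(err_no, detail))


check_response(0)  # passes silently
try:
    check_response(-5001)
except ServingError as exc:
    print(exc)
```

Preferring the server-provided `err_msg` and falling back to the table keeps the message useful even when the server sends an empty string.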
**II. Reading Response Data**
uint32_t model_num = res.outputs_size();
for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
  const auto &output = res.outputs(m_idx);
  std::string engine_name = output.engine_name();
  int idx = 0;
  // Read the tensor shape (shape and lod are std::vector<int> declared by the caller)
  int shape_size = output.tensor(idx).shape_size();
  shape.resize(shape_size);
  for (int i = 0; i < shape_size; ++i) {
    shape[i] = output.tensor(idx).shape(i);
  }
  // Read the LoD information
  int lod_size = output.tensor(idx).lod_size();
  if (lod_size > 0) {
    lod.resize(lod_size);
    for (int i = 0; i < lod_size; ++i) {
      lod[i] = output.tensor(idx).lod(i);
    }
  }
  // Read float data
  int size = output.tensor(idx).float_data_size();
  float_data = std::vector<float>(
      output.tensor(idx).float_data().begin(),
      output.tensor(idx).float_data().begin() + size);
  // Read int8 data
  string_data = output.tensor(idx).tensor_content();
}
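For clients that receive the Response as JSON, the same traversal can be sketched in Python. The example assumes the JSON keys mirror the protobuf field names of `Response`, `ModelOutput`, and `Tensor`; the sample response, including the engine name `general_infer_0` and the predicted value, is made up for illustration.

```python
def collect_outputs(response):
    """Map engine_name -> {alias_name: (shape, float_data)}."""
    result = {}
    for output in response.get("outputs", []):
        tensors = {}
        for tensor in output.get("tensor", []):
            key = tensor.get("alias_name") or tensor.get("name", "")
            tensors[key] = (tensor.get("shape", []),
                            tensor.get("float_data", []))
        result[output.get("engine_name", "")] = tensors
    return result


# Hypothetical response body, already decoded from JSON.
sample = {
    "outputs": [{
        "engine_name": "general_infer_0",
        "tensor": [{"alias_name": "price", "elem_type": 1,
                    "shape": [1, 1], "float_data": [18.9]}],
    }],
    "err_no": 0,
    "err_msg": "",
}
outputs = collect_outputs(sample)
shape, values = outputs["general_infer_0"]["price"]
print(shape, values)
```

Only `float_data` is read here; other element types would be read from their matching fields such as `int64_data` or `tensor_content`.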
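# Model Hot Loading in Paddle Serving
## Background
## Server Monitor
Paddle Serving provides an automatic monitoring script. After the model at the remote address is updated, the script pulls the new model to update the local one and also updates the timestamp file `fluid_time_file` in the local model folder, which triggers hot loading.
The following types of remote Monitor are currently supported: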
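| Monitor type | Description | Special options |
| :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| general | The remote requires no authentication and files can be downloaded directly with `wget` (for example, FTP or BOS without authentication) | `general_host`: general remote host |
| hdfs/afs(HadoopMonitor) | The remote is HDFS or AFS; commands are executed through Hadoop-Client | `hadoop_bin`: path to the Hadoop binary <br/>`fs_name`: Hadoop fs_name, empty by default<br/>`fs_ugi`: Hadoop fs_ugi, empty by default |
| ftp | The remote is FTP, accessed through `ftplib` (part of the Python standard library, so no separate installation is needed) | `ftp_host`: FTP host<br>`ftp_port`: FTP port<br>`ftp_username`: FTP username, empty by default<br>`ftp_password`: FTP password, empty by default |

| General Monitor option | Description | Default |
| :--------------------: | :----------------------------------------------------------: | :--------------------: |
| `type` | Monitor type | No default |
| `remote_path` | Base path on the remote | No default |
| `remote_model_name` | Name of the model to pull from the remote | No default |
| `remote_donefile_name` | Name of the remote donefile that signals the model update is complete | No default |
| `local_path` | Local working path | No default |
| `local_model_name` | Local model name | No default |
| `local_timestamp_file` | Local timestamp file used for hot loading; it is assumed to be located under `local_path/local_model_name`. | `fluid_time_file` |
| `local_tmp_path` | Local folder for temporary files; created automatically if it does not exist. | `_serving_monitor_tmp` |
| `interval` | Polling interval in seconds. | `10` |
| `unpacked_filename` | The Monitor supports remote models packed with tarfile. If the remote model is packed, set this option to tell the Monitor the file name after unpacking. | `None` |
| `debug` | If the `--debug` option is added, more detailed intermediate information is output. | Not added by default |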
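The options above are passed to `python -m paddle_serving_server.monitor` as `--key=value` flags. As a convenience, the Python sketch below assembles such a command line from a dictionary. It treats the options listed without a default as required, which is an assumption of this sketch, and `build_monitor_args` is a hypothetical helper. The option values are taken from the HadoopMonitor example in this document.

```python
import shlex

# Options listed without a default in the table; treated as required here.
REQUIRED = ("type", "remote_path", "remote_model_name",
            "remote_donefile_name", "local_path", "local_model_name")


def build_monitor_args(options, debug=False):
    """Build the argument list for `python -m paddle_serving_server.monitor`."""
    missing = [key for key in REQUIRED if key not in options]
    if missing:
        raise ValueError("missing options: " + ", ".join(missing))
    args = ["python", "-m", "paddle_serving_server.monitor"]
    args += ["--{}={}".format(key, value) for key, value in options.items()]
    if debug:
        args.append("--debug")
    return args


args = build_monitor_args({
    "type": "hdfs",
    "hadoop_bin": "/hadoop-3.1.2/bin/hadoop",
    "remote_path": "/",
    "remote_model_name": "uci_housing.tar.gz",
    "remote_donefile_name": "donefile",
    "local_path": ".",
    "local_model_name": "uci_housing_model",
    "interval": 10,
}, debug=True)
print(" ".join(shlex.quote(arg) for arg in args))
```

The returned list can be handed to `subprocess.run(args)` directly, which avoids shell quoting problems.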
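The HadoopMonitor example below demonstrates the model hot loading feature of Paddle Serving.
## HadoopMonitor Example
In this example, models are produced in `product_path` and uploaded to HDFS, while `server_path` simulates hot loading of the model on the server side:
├── product_path
└── server_path
Run the following Python code under `product_path` to produce models (modify the Hadoop-related parameters before running). Every 60 seconds it produces a packed file `uci_housing.tar.gz` of the Boston house price prediction model and uploads it to the `/` path on HDFS. Once the upload completes, it updates the timestamp file `donefile` and uploads it to the `/` path on HDFS as well.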
import os
import sys
import time
import tarfile
import paddle
import paddle.fluid as fluid
import paddle_serving_client.io as serving_io

# batch_size=16 is an assumed value; adjust as needed.
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.train(), buf_size=500),
    batch_size=16)

test_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.test(), buf_size=500),
    batch_size=16)

x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')

y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)

place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***' # User needs to change
    ugi = '***,***' # User needs to change
    hadoop_bin = '/path/to/hadoop/bin' # User needs to change
    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, ugi)
    # Remove the old remote file, then upload the new one
    os.system('{} -rmr {}/{}'.format(
        prefix, remote_path, local_file_path))
    os.system('{} -put {} {}'.format(
        prefix, local_file_path, remote_path))

name = "uci_housing"
for pass_id in range(30):
    for data_train in train_reader():
        avg_loss_value, = exe.run(fluid.default_main_program(),
                                  feed=feeder.feed(data_train),
                                  fetch_list=[avg_loss])
    # Simulate the production model every other period of time
    time.sleep(60)
    model_name = "{}_model".format(name)
    client_name = "{}_client".format(name)
    serving_io.save_model(model_name, client_name,
                          {"x": x}, {"price": y_predict},
                          fluid.default_main_program())
    # Packing model
    tar_name = "{}.tar.gz".format(name)
    tar = tarfile.open(tar_name, 'w:gz')
    tar.add(model_name)
    tar.close()

    # Push packaged model file to hdfs
    push_to_hdfs(tar_name, '/')

    # Generate donefile
    donefile_name = 'donefile'
    os.system('touch {}'.format(donefile_name))

    # Push donefile to hdfs
    push_to_hdfs(donefile_name, '/')
The files on HDFS are listed below:
# hadoop fs -ls /
Found 2 items
-rw-r--r-- 1 root supergroup 0 2020-04-02 02:54 /donefile
-rw-r--r-- 1 root supergroup 2101 2020-04-02 02:54 /uci_housing.tar.gz
Enter the `server_path` folder.
1. Start the Server with the initial model
The pretrained Boston house price prediction model is used as the initial model here:
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
Start the Server:
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
2. Run the monitor
Run the HDFS monitor with the following command:
python -m paddle_serving_server.monitor \
--type='hdfs' --hadoop_bin='/hadoop-3.1.2/bin/hadoop' \
--remote_path='/' --remote_model_name='uci_housing.tar.gz' \
--remote_donefile_name='donefile' --local_path='.' \
--local_model_name='uci_housing_model' --local_timestamp_file='fluid_time_file' \
--local_tmp_path='_tmp' --unpacked_filename='uci_housing_model' --debug
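The command above polls the timestamp file `/donefile` at the remote HDFS path `/`. When the timestamp changes, the remote model is considered updated: the packed model `/uci_housing.tar.gz` is pulled to the local temporary path `./_tmp/uci_housing.tar.gz`, unpacked into `./_tmp/uci_housing_model`, and then used to update the local model `./uci_housing_model` together with the Paddle Serving timestamp file `./uci_housing_model/fluid_time_file`. The monitor output looks like this: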
2020-04-02 10:12 INFO [monitor.py:85] _hadoop_bin: /hadoop-3.1.2/bin/hadoop
2020-04-02 10:12 INFO [monitor.py:85] _fs_name:
2020-04-02 10:12 INFO [monitor.py:85] _fs_ugi:
2020-04-02 10:12 INFO [monitor.py:209] AFS prefix cmd: /hadoop-3.1.2/bin/hadoop fs
2020-04-02 10:12 INFO [monitor.py:85] _remote_path: /
2020-04-02 10:12 INFO [monitor.py:85] _remote_model_name: uci_housing.tar.gz
2020-04-02 10:12 INFO [monitor.py:85] _remote_donefile_name: donefile
2020-04-02 10:12 INFO [monitor.py:85] _local_model_name: uci_housing_model
2020-04-02 10:12 INFO [monitor.py:85] _local_path: .
2020-04-02 10:12 INFO [monitor.py:85] _local_timestamp_file: fluid_time_file
2020-04-02 10:12 INFO [monitor.py:85] _local_tmp_path: _tmp
2020-04-02 10:12 INFO [monitor.py:85] _interval: 10
2020-04-02 10:12 DEBUG [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs -ls /donefile 2>/dev/null
2020-04-02 10:12 DEBUG [monitor.py:216] resp: -rw-r--r-- 1 root supergroup 0 2020-04-02 10:11 /donefile
2020-04-02 10:12 INFO [monitor.py:138] doneilfe(donefile) changed.
2020-04-02 10:12 DEBUG [monitor.py:233] pull cmd: /hadoop-3.1.2/bin/hadoop fs -get /uci_housing.tar.gz _tmp/uci_housing.tar.gz 2>/dev/null
2020-04-02 10:12 INFO [monitor.py:144] pull remote model(uci_housing.tar.gz).
2020-04-02 10:12 INFO [monitor.py:98] unpack remote file(uci_housing.tar.gz).
2020-04-02 10:12 DEBUG [monitor.py:108] remove packed file(uci_housing.tar.gz).
2020-04-02 10:12 INFO [monitor.py:110] using unpacked filename: uci_housing_model.
2020-04-02 10:12 DEBUG [monitor.py:175] update model cmd: cp -r _tmp/uci_housing_model/* ./uci_housing_model
2020-04-02 10:12 INFO [monitor.py:152] update local model(uci_housing_model).
2020-04-02 10:12 DEBUG [monitor.py:184] update timestamp cmd: touch ./uci_housing_model/fluid_time_file
2020-04-02 10:12 INFO [monitor.py:157] update model timestamp(fluid_time_file).
2020-04-02 10:12 INFO [monitor.py:161] sleep 10s.
2020-04-02 10:12 DEBUG [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs -ls /donefile 2>/dev/null
2020-04-02 10:12 DEBUG [monitor.py:216] resp: -rw-r--r-- 1 root supergroup 0 2020-04-02 10:11 /donefile
2020-04-02 10:12 INFO [monitor.py:161] sleep 10s.
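The polling cycle visible in the log (check the donefile, pull, unpack, update the local model, touch the timestamp file) can be sketched in Python as follows. This is not the actual `monitor.py` implementation: a local directory stands in for HDFS, a plain file copy stands in for `hadoop fs -get`, and `poll_once` together with the placeholder `__model__` file are hypothetical names chosen for the sketch. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import os
import shutil
import tarfile
import tempfile


def poll_once(remote_dir, local_path, model_name, tmp_path, last_seen,
              packed_name="uci_housing.tar.gz", donefile="donefile",
              timestamp_file="fluid_time_file"):
    """Run one polling step and return (donefile_mtime, updated)."""
    seen = os.path.getmtime(os.path.join(remote_dir, donefile))
    if seen == last_seen:
        return seen, False  # donefile unchanged: keep the current model
    os.makedirs(tmp_path, exist_ok=True)
    # A plain copy stands in for `hadoop fs -get`.
    packed = shutil.copy(os.path.join(remote_dir, packed_name), tmp_path)
    with tarfile.open(packed) as tar:
        tar.extractall(tmp_path)
    os.remove(packed)
    # Overwrite the local model, then touch the timestamp file.
    target = os.path.join(local_path, model_name)
    shutil.copytree(os.path.join(tmp_path, model_name), target,
                    dirs_exist_ok=True)
    stamp_path = os.path.join(target, timestamp_file)
    with open(stamp_path, "a"):
        os.utime(stamp_path, None)
    return seen, True


with tempfile.TemporaryDirectory() as root:
    # Fake "remote": a packed model plus a donefile.
    remote = os.path.join(root, "remote")
    build = os.path.join(root, "build", "uci_housing_model")
    os.makedirs(remote)
    os.makedirs(build)
    with open(os.path.join(build, "__model__"), "w") as f:
        f.write("placeholder model file")
    with tarfile.open(os.path.join(remote, "uci_housing.tar.gz"), "w:gz") as tar:
        tar.add(build, arcname="uci_housing_model")
    open(os.path.join(remote, "donefile"), "w").close()

    local = os.path.join(root, "server")
    tmp = os.path.join(root, "_tmp")
    seen, first_updated = poll_once(remote, local, "uci_housing_model", tmp, None)
    seen, second_updated = poll_once(remote, local, "uci_housing_model", tmp, seen)
    stamp_exists = os.path.exists(
        os.path.join(local, "uci_housing_model", "fluid_time_file"))

print(first_updated, second_updated, stamp_exists)
```

A real monitor would call such a step in a loop and sleep for `interval` seconds between iterations, as the `sleep 10s` log lines suggest.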
3. Check the Server log
View the Server's runtime log with the following command:
tail -f log/serving.INFO
I0330 09:38:40.087316 7361 server.cpp:150] Begin reload framework...
W0330 09:38:40.087399 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:38:40.087414 7361 manager.h:131] Finish reload 1 workflow(s)
I0330 09:38:50.087535 7361 server.cpp:150] Begin reload framework...
W0330 09:38:50.087641 7361 infer.h:250] begin reload model[uci_housing_model].
I0330 09:38:50.087972 7361 infer.h:66] InferEngineCreationParams: model_path = uci_housing_model, enable_memory_optimization = 0, static_optimization = 0, force_update_static_cache = 0
I0330 09:38:50.088027 7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
I0330 09:38:50.088393 7361 analysis_predictor.cc:841] MODEL VERSION: 1.7.1
I0330 09:38:50.088413 7361 analysis_predictor.cc:843] PREDICTOR VERSION: 1.6.3
I0330 09:38:50.089519 7361 graph_pattern_detector.cc:96] --- detected 1 subgraphs
I0330 09:38:50.090925 7361 analysis_predictor.cc:470] ======= optimize end =======
W0330 09:38:50.090986 7361 infer.h:472] Succ load common model[0x7fc83c06abd0], path[uci_housing_model].
I0330 09:38:50.091022 7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
W0330 09:38:50.091050 7361 infer.h:509] td_core[0x7fc83c0ad770] clone model from pd_core[0x7fc83c06abd0] succ, cur_idx[0].
W0330 09:38:50.091784 7361 infer.h:489] Succ load clone model, path[uci_housing_model]
W0330 09:38:50.091794 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:38:50.091820 7361 manager.h:131] Finish reload 1 workflow(s)
I0330 09:39:00.091987 7361 server.cpp:150] Begin reload framework...
W0330 09:39:00.092161 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:39:00.092177 7361 manager.h:131] Finish reload 1 workflow(s)
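In the log above, the reload check runs every 10 seconds, but only the cycle at 09:38:50 contains a `begin reload model[...]` line, which is where the model is actually reloaded. A small Python sketch for picking out those lines from a log is shown below; the regular expression is based solely on the log lines shown here, and `reloaded_models` is a hypothetical helper.

```python
import re

# Matches lines such as "... infer.h:250] begin reload model[uci_housing_model]."
RELOAD_RE = re.compile(r"begin reload model\[(?P<model>[^\]]+)\]")


def reloaded_models(log_lines):
    """Yield (timestamp, model) for each line that reports a model reload."""
    for line in log_lines:
        match = RELOAD_RE.search(line)
        if match:
            parts = line.split()
            # parts[0] is the level plus date (e.g. "W0330"), parts[1] the time.
            yield "{} {}".format(parts[0][1:], parts[1]), match.group("model")


log = [
    "I0330 09:38:40.087316 7361 server.cpp:150] Begin reload framework...",
    "W0330 09:38:40.087399 7361 infer.h:656] Succ reload version engine: 18446744073709551615",
    "I0330 09:38:50.087535 7361 server.cpp:150] Begin reload framework...",
    "W0330 09:38:50.087641 7361 infer.h:250] begin reload model[uci_housing_model].",
]
events = list(reloaded_models(log))
print(events)
```

The same function can be fed the lines of `log/serving.INFO` to confirm that a pushed model was picked up by the Server.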
