未验证 提交 09a353d0 编写于 作者: T TeslaZhao 提交者: GitHub

Merge branch 'PaddlePaddle:develop' into develop

# 进阶 C++ Serving 介绍
## 概述
本文将对 C++ Serving 除基本功能之外的高级特性、性能调优等问题进行介绍和说明,本文适合以下用户:
- 想要全面了解 C++ Serving 源码
- 想要了解模型热加载、A/B Test、加密模型推理服务等高级特性
- 通过修改 C++ Serving 参数进行性能调优
## 协议
当您需要自行组装 Request 请求中的数据或者需要二次开发时,您可以参考[相关文档]()。
## 模型热加载
当您需要在 Server 端不停止的情况下更新模型时,您可以参考[相关文档]()。
## A/B Test
当您需要将用户的请求按照一定的流量比例发送到不同的 Server 端时,您可以参考[相关文档]()。
## 加密模型推理服务
当您需要将模型加密部署到 Server 端时,您可以参考[相关文档]()。
## 多模型串联
当您需要将多个模型串联在同一个 Server 中部署时(例如 OCR 需要串联 Det 和 Rec),您可以参考该部分内容。
## 性能优化指南
当您想要对 C++ Serving 服务端进行性能调优时,您可以参考[相关文档]()。
## 性能指标
当您想要了解 C++ Serving 与竞品的性能对比数据时,您可以参考[相关文档]()。
# Inference Protocols
C++ Serving 基于 BRPC 进行服务构建,支持 BRPC、GRPC、RESTful 请求。请求数据为 protobuf 格式,详见 `core/general-server/proto/general_model_service.proto`。本文介绍构建请求以及解析结果的方法。
## Tensor
**一.Tensor 定义**
Tensor 可以装载多种类型的数据,是 Request 和 Response 的基础单元。Tensor 的具体定义如下:
```protobuf
message Tensor {
// VarType: INT64
repeated int64 int64_data = 1;
// VarType: FP32
repeated float float_data = 2;
// VarType: INT32
repeated int32 int_data = 3;
// VarType: FP64
repeated double float64_data = 4;
// VarType: UINT32
repeated uint32 uint32_data = 5;
// VarType: BOOL
repeated bool bool_data = 6;
// (No support)VarType: COMPLEX64, 2x represents the real part, 2x+1
// represents the imaginary part
repeated float complex64_data = 7;
// (No support)VarType: COMPLEX128, 2x represents the real part, 2x+1
// represents the imaginary part
repeated double complex128_data = 8;
// VarType: STRING
repeated string data = 9;
// Element types:
// 0 => INT64
// 1 => FP32
// 2 => INT32
// 3 => FP64
// 4 => INT16
// 5 => FP16
// 6 => BF16
// 7 => UINT8
// 8 => INT8
// 9 => BOOL
// 10 => COMPLEX64
// 11 => COMPLEX128
// 20 => STRING
int32 elem_type = 10;
// Shape of the tensor, including batch dimensions.
repeated int32 shape = 11;
// Level of data(LOD), support variable length data, only for fetch tensor
// currently.
repeated int32 lod = 12;
// Correspond to the variable 'name' in the model description prototxt.
string name = 13;
// Correspond to the variable 'alias_name' in the model description prototxt.
string alias_name = 14; // get from the Model prototxt
// VarType: FP16, INT16, INT8, BF16, UINT8
bytes tensor_content = 15;
};
```
- elem_type:数据类型,当前支持 FLOAT32, INT64, INT32, UINT8, INT8, FLOAT16
|elem_type|类型|
|---------|----|
|0|INT64|
|1|FLOAT32|
|2|INT32|
|3|FP64|
|4|INT16|
|5|FP16|
|6|BF16|
|7|UINT8|
|8|INT8|
- shape:数据维度
- lod:lod 信息,LoD(Level-of-Detail) Tensor 是 Paddle 的高级特性,是对 Tensor 的一种扩充,用于支持更自由的数据输入。Lod 相关原理介绍,请参考[相关文档](../LOD_CN.md)
- name/alias_name: 名称及别名,与模型配置对应
**二.构建 Tensor 数据**
1. FLOAT32 类型 Tensor
```C
// 原始数据
std::vector<float> float_data;
Tensor *tensor = new Tensor;
// 设置维度,可以设置多维
for (uint32_t j = 0; j < float_shape.size(); ++j) {
tensor->add_shape(float_shape[j]);
}
// 设置 LOD 信息
for (uint32_t j = 0; j < float_lod.size(); ++j) {
tensor->add_lod(float_lod[j]);
}
// 设置类型、名称及别名
tensor->set_elem_type(1);
tensor->set_name(name);
tensor->set_alias_name(alias_name);
// 拷贝数据
int total_number = float_data.size();
tensor->mutable_float_data()->Resize(total_number, 0);
memcpy(tensor->mutable_float_data()->mutable_data(), float_datadata(), total_number * sizeof(float));
```
2. INT8 类型 Tensor
```C
// 原始数据
std::string string_data;
Tensor *tensor = new Tensor;
for (uint32_t j = 0; j < string_shape.size(); ++j) {
tensor->add_shape(string_shape[j]);
}
for (uint32_t j = 0; j < string_lod.size(); ++j) {
tensor->add_lod(string_lod[j]);
}
tensor->set_elem_type(8);
tensor->set_name(name);
tensor->set_alias_name(alias_name);
tensor->set_tensor_content(string_data);
```
## Request
**一.Request 定义**
Request 为客户端需要发送的请求数据,其以 Tensor 为基础数据单元,并包含了额外的请求信息。定义如下:
```protobuf
message Request {
repeated Tensor tensor = 1;
repeated string fetch_var_names = 2;
bool profile_server = 3;
uint64 log_id = 4;
};
```
- fetch_vat_names: 需要获取的输出数据名称,在GeneralResponseOP会根据该列表进行过滤.请参考模型文件serving_client_conf.prototxt中的`fetch_var`字段下的`alias_name`
- profile_server: 调试参数,打开时会输出性能信息
- log_id: 请求ID
**二.构建 Request**
1. Protobuf 形式
当使用 BRPC 或 GRPC 进行请求时,使用 protobuf 形式数据,构建方式如下:
```C
Request req;
req.set_log_id(log_id);
for (auto &name : fetch_name) {
req.add_fetch_var_names(name);
}
// 添加Tensor
Tensor *tensor = req.add_tensor();
...
```
2. Json 形式
当使用 RESTful 请求时,可以使用 Json 形式数据,具体格式如下:
```Json
{"tensor":[{"float_data":[0.0137,-0.1136,0.2553,-0.0692,0.0582,-0.0727,-0.1583,-0.0584,0.6283,0.4919,0.1856,0.0795,-0.0332],"elem_type":1,"name":"x","alias_name":"x","shape":[1,13]}],"fetch_var_names":["price"],"log_id":0}
```
## Response
**一.Response 定义**
Response 为服务端返回给客户端的结果,包含了 Tensor 数据、错误码、错误信息等。定义如下:
```protobuf
message Response {
repeated ModelOutput outputs = 1;
repeated int64 profile_time = 2;
// Error code
int32 err_no = 3;
// Error messages
string err_msg = 4;
};
message ModelOutput {
repeated Tensor tensor = 1;
string engine_name = 2;
}
```
- profile_time:当设置 request->set_profile_server(true) 时,会返回性能信息
- err_no:错误码,详见`core/predictor/common/constant.h`
- err_msg:错误信息,详见`core/predictor/common/constant.h`
- engine_name:输出节点名称
|err_no|err_msg|
|---------|----|
|0|OK|
|-5000|"Paddle Serving Framework Internal Error."|
|-5001|"Paddle Serving Memory Alloc Error."|
|-5002|"Paddle Serving Array Overflow Error."|
|-5100|"Paddle Serving Op Inference Error."|
**二.读取 Response 数据**
```C
uint32_t model_num = res.outputs_size();
for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
std::string engine_name = output.engine_name();
int idx = 0;
// 读取 tensor 维度
int shape_size = output.tensor(idx).shape_size();
for (int i = 0; i < shape_size; ++i) {
shape[i] = output.tensor(idx).shape(i);
}
// 读取 LOD 信息
int lod_size = output.tensor(idx).lod_size();
if (lod_size > 0) {
lod.resize(lod_size);
for (int i = 0; i < lod_size; ++i) {
lod[i] = output.tensor(idx).lod(i);
}
}
// 读取 float 数据
int size = output.tensor(idx).float_data_size();
float_data = std::vector<float>(
output.tensor(idx).float_data().begin(),
output.tensor(idx).float_data().begin() + size);
// 读取 int8 数据
string_data = output.tensor(idx).tensor_content();
}
```
# Paddle Serving 中的模型热加载
## 背景
在实际的工业场景下,通常是远端定期不间断产出模型,线上服务端需要在服务不中断的情况下拉取新模型对旧模型进行更新迭代。
## Server Monitor
Paddle Serving 提供了一个自动监控脚本,远端地址更新模型后会拉取新模型更新本地模型,同时更新本地模型文件夹中的时间戳文件 `fluid_time_stamp` 实现热加载。
目前支持下面几种类型的远端监控 Monitor:
| Monitor类型 | 描述 | 特殊选项 |
| :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| general | 远端无认证,可以通过 `wget` 直接访问下载文件(如无需认证的FTP,BOS等) | `general_host` 通用远端host |
| hdfs/afs(HadoopMonitor) | 远端为 HDFS 或 AFS,通过 Hadoop-Client 执行相关命令 | `hadoop_bin` Hadoop 二进制的路径 <br/>`fs_name` Hadoop fs_name,默认为空<br/>`fs_ugi` Hadoop fs_ugi,默认为空 |
| ftp | 远端为 FTP,通过 `ftplib` 进行相关访问(使用该 Monitor,您可能需要执行 `pip install ftplib` 下载 `ftplib`) | `ftp_host` FTP host<br>`ftp_port` FTP port<br>`ftp_username` FTP username,默认为空<br>`ftp_password` FTP password,默认为空 |
| Monitor通用选项 | 描述 | 默认值 |
| :--------------------: | :----------------------------------------------------------: | :--------------------: |
| `type` | 指定 Monitor 类型 | 无 |
| `remote_path` | 指定远端的基础路径 | 无 |
| `remote_model_name` | 指定远端需要拉取的模型名 | 无 |
| `remote_donefile_name` | 指定远端标志模型更新完毕的 donefile 文件名 | 无 |
| `local_path` | 指定本地工作路径 | 无 |
| `local_model_name` | 指定本地模型名 | 无 |
| `local_timestamp_file` | 指定本地用于热加载的时间戳文件,该文件被认为在 `local_path/local_model_name` 下。 | `fluid_time_file` |
| `local_tmp_path` | 指定本地存放临时文件的文件夹路径,若不存在则自动创建。 | `_serving_monitor_tmp` |
| `interval` | 指定轮询间隔时间,单位为秒。 | `10` |
| `unpacked_filename` | Monitor 支持 tarfile 打包的远程模型。如果远程模型是打包格式,则需要设置该选项来告知 Monitor 解压后的文件名。 | `None` |
| `debug` | 如果添加 `--debug` 选项,则输出更详细的中间信息。 | 默认不添加该选项 |
下面通过 HadoopMonitor 示例来展示 Paddle Serving 的模型热加载功能。
## HadoopMonitor 示例
示例中在 `product_path` 中生产模型上传至 hdfs,在 `server_path` 中模拟服务端模型热加载:
```shell
.
├── product_path
└── server_path
```
**一.生产模型**
`product_path` 下运行下面的 Python 代码生产模型(运行前需要修改 hadoop 相关的参数),每隔 60 秒会产出 Boston 房价预测模型的打包文件 `uci_housing.tar.gz` 并上传至 hdfs 的`/`路径下,上传完毕后更新时间戳文件 `donefile` 并上传至 hdfs 的`/`路径下。
```python
import os
import sys
import time
import tarfile
import paddle
import paddle.fluid as fluid
import paddle_serving_client.io as serving_io
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=16)
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500),
batch_size=16)
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
def push_to_hdfs(local_file_path, remote_path):
afs = 'afs://***.***.***.***:***' # User needs to change
uci = '***,***' # User needs to change
hadoop_bin = '/path/to/haddop/bin' # User needs to change
prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
os.system('{} -rmr {}/{}'.format(
prefix, remote_path, local_file_path))
os.system('{} -put {} {}'.format(
prefix, local_file_path, remote_path))
name = "uci_housing"
for pass_id in range(30):
for data_train in train_reader():
avg_loss_value, = exe.run(fluid.default_main_program(),
feed=feeder.feed(data_train),
fetch_list=[avg_loss])
# Simulate the production model every other period of time
time.sleep(60)
model_name = "{}_model".format(name)
client_name = "{}_client".format(name)
serving_io.save_model(model_name, client_name,
{"x": x}, {"price": y_predict},
fluid.default_main_program())
# Packing model
tar_name = "{}.tar.gz".format(name)
tar = tarfile.open(tar_name, 'w:gz')
tar.add(model_name)
tar.close()
# Push packaged model file to hdfs
push_to_hdfs(tar_name, '/')
# Generate donefile
donefile_name = 'donefile'
os.system('touch {}'.format(donefile_name))
# Push donefile to hdfs
push_to_hdfs(donefile_name, '/')
```
hdfs 上的文件如下列所示:
```bash
# hadoop fs -ls /
Found 2 items
-rw-r--r-- 1 root supergroup 0 2020-04-02 02:54 /donefile
-rw-r--r-- 1 root supergroup 2101 2020-04-02 02:54 /uci_housing.tar.gz
```
**二.服务端加载模型**
进入 `server_path` 文件夹。
1. 用初始模型启动 Server 端
这里使用预训练的 Boston 房价预测模型作为初始模型:
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
启动 Server 端:
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
2. 执行监控程序
用下面的命令来执行 HDFS 监控程序:
```shell
python -m paddle_serving_server.monitor \
--type='hdfs' --hadoop_bin='/hadoop-3.1.2/bin/hadoop' \
--remote_path='/' --remote_model_name='uci_housing.tar.gz' \
--remote_donefile_name='donefile' --local_path='.' \
--local_model_name='uci_housing_model' --local_timestamp_file='fluid_time_file' \
--local_tmp_path='_tmp' --unpacked_filename='uci_housing_model' --debug
```
上面代码通过轮询方式监控远程 HDFS 地址`/`的时间戳文件`/donefile`,当时间戳变更则认为远程模型已经更新,将远程打包模型`/uci_housing.tar.gz`拉取到本地临时路径`./_tmp/uci_housing.tar.gz`下,解包出模型文件`./_tmp/uci_housing_model`后,更新本地模型`./uci_housing_model`以及Paddle Serving的时间戳文件`./uci_housing_model/fluid_time_file`
预计输出如下:
```shell
2020-04-02 10:12 INFO [monitor.py:85] _hadoop_bin: /hadoop-3.1.2/bin/hadoop
2020-04-02 10:12 INFO [monitor.py:85] _fs_name:
2020-04-02 10:12 INFO [monitor.py:85] _fs_ugi:
2020-04-02 10:12 INFO [monitor.py:209] AFS prefix cmd: /hadoop-3.1.2/bin/hadoop fs
2020-04-02 10:12 INFO [monitor.py:85] _remote_path: /
2020-04-02 10:12 INFO [monitor.py:85] _remote_model_name: uci_housing.tar.gz
2020-04-02 10:12 INFO [monitor.py:85] _remote_donefile_name: donefile
2020-04-02 10:12 INFO [monitor.py:85] _local_model_name: uci_housing_model
2020-04-02 10:12 INFO [monitor.py:85] _local_path: .
2020-04-02 10:12 INFO [monitor.py:85] _local_timestamp_file: fluid_time_file
2020-04-02 10:12 INFO [monitor.py:85] _local_tmp_path: _tmp
2020-04-02 10:12 INFO [monitor.py:85] _interval: 10
2020-04-02 10:12 DEBUG [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs -ls /donefile 2>/dev/null
2020-04-02 10:12 DEBUG [monitor.py:216] resp: -rw-r--r-- 1 root supergroup 0 2020-04-02 10:11 /donefile
2020-04-02 10:12 INFO [monitor.py:138] doneilfe(donefile) changed.
2020-04-02 10:12 DEBUG [monitor.py:233] pull cmd: /hadoop-3.1.2/bin/hadoop fs -get /uci_housing.tar.gz _tmp/uci_housing.tar.gz 2>/dev/null
2020-04-02 10:12 INFO [monitor.py:144] pull remote model(uci_housing.tar.gz).
2020-04-02 10:12 INFO [monitor.py:98] unpack remote file(uci_housing.tar.gz).
2020-04-02 10:12 DEBUG [monitor.py:108] remove packed file(uci_housing.tar.gz).
2020-04-02 10:12 INFO [monitor.py:110] using unpacked filename: uci_housing_model.
2020-04-02 10:12 DEBUG [monitor.py:175] update model cmd: cp -r _tmp/uci_housing_model/* ./uci_housing_model
2020-04-02 10:12 INFO [monitor.py:152] update local model(uci_housing_model).
2020-04-02 10:12 DEBUG [monitor.py:184] update timestamp cmd: touch ./uci_housing_model/fluid_time_file
2020-04-02 10:12 INFO [monitor.py:157] update model timestamp(fluid_time_file).
2020-04-02 10:12 INFO [monitor.py:161] sleep 10s.
2020-04-02 10:12 DEBUG [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs -ls /donefile 2>/dev/null
2020-04-02 10:12 DEBUG [monitor.py:216] resp: -rw-r--r-- 1 root supergroup 0 2020-04-02 10:11 /donefile
2020-04-02 10:12 INFO [monitor.py:161] sleep 10s.
```
3. 查看 Server 日志
通过下面命令查看 Server 的运行日志:
```shell
tail -f log/serving.INFO
```
日志中显示模型已经被热加载:
```shell
I0330 09:38:40.087316 7361 server.cpp:150] Begin reload framework...
W0330 09:38:40.087399 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:38:40.087414 7361 manager.h:131] Finish reload 1 workflow(s)
I0330 09:38:50.087535 7361 server.cpp:150] Begin reload framework...
W0330 09:38:50.087641 7361 infer.h:250] begin reload model[uci_housing_model].
I0330 09:38:50.087972 7361 infer.h:66] InferEngineCreationParams: model_path = uci_housing_model, enable_memory_optimization = 0, static_optimization = 0, force_update_static_cache = 0
I0330 09:38:50.088027 7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
I0330 09:38:50.088393 7361 analysis_predictor.cc:841] MODEL VERSION: 1.7.1
I0330 09:38:50.088413 7361 analysis_predictor.cc:843] PREDICTOR VERSION: 1.6.3
I0330 09:38:50.089519 7361 graph_pattern_detector.cc:96] --- detected 1 subgraphs
I0330 09:38:50.090925 7361 analysis_predictor.cc:470] ======= optimize end =======
W0330 09:38:50.090986 7361 infer.h:472] Succ load common model[0x7fc83c06abd0], path[uci_housing_model].
I0330 09:38:50.091022 7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
W0330 09:38:50.091050 7361 infer.h:509] td_core[0x7fc83c0ad770] clone model from pd_core[0x7fc83c06abd0] succ, cur_idx[0].
...
W0330 09:38:50.091784 7361 infer.h:489] Succ load clone model, path[uci_housing_model]
W0330 09:38:50.091794 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:38:50.091820 7361 manager.h:131] Finish reload 1 workflow(s)
I0330 09:39:00.091987 7361 server.cpp:150] Begin reload framework...
W0330 09:39:00.092161 7361 infer.h:656] Succ reload version engine: 18446744073709551615
I0330 09:39:00.092177 7361 manager.h:131] Finish reload 1 workflow(s)
```
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册