Merge branch 'PaddlePaddle:develop' into develop

09a353d0 · TeslaZhao · GitHub · 03907aa8 · 86dcf947 · 09a353d0
3 changed files
--- a/doc/Offical_Docs/6-0_C++_Serving_Advanced_Introduction_CN.md
+++ b/doc/Offical_Docs/6-0_C++_Serving_Advanced_Introduction_CN.md
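+# Advanced C++ Serving Introduction
+
+## Overview
+
+This document covers the advanced features and performance tuning of C++ Serving beyond its basic functionality. It is intended for users who:
+- want a thorough understanding of the C++ Serving source code
+- want to learn about advanced features such as model hot loading, A/B Test, and encrypted model inference services
+- want to tune performance by adjusting C++ Serving parameters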
+
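+## Protocol
+
+When you need to assemble the data in a Request yourself, or need to do secondary development, refer to the [related documentation]().
+
+## Model Hot Loading
+
+When you need to update a model without stopping the Server, refer to the [related documentation]().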
+
+## A/B Test
+
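+When you need to route user requests to different Servers according to a given traffic ratio, refer to the [related documentation]().
+
+## Encrypted Model Inference Service
+
+When you need to deploy an encrypted model on the Server, refer to the [related documentation]().
+
+## Chaining Multiple Models
+
+When you need to chain several models inside a single Server (for example, OCR chains Det and Rec), refer to this section.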
+
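+## Performance Optimization Guide
+
+When you want to tune the performance of the C++ Serving server, refer to the [related documentation]().
+
+## Performance Metrics
+
+When you want to see performance data comparing C++ Serving with competing products, refer to the [related documentation]().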
--- a/doc/Offical_Docs/6-1_Inference_Protocols_CN.md
+++ b/doc/Offical_Docs/6-1_Inference_Protocols_CN.md
+# Inference Protocols
+
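+C++ Serving is built on BRPC and supports BRPC, GRPC, and RESTful requests. Request data is in protobuf format; see `core/general-server/proto/general_model_service.proto` for details. This document describes how to build requests and parse results.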
+
+## Tensor
+
+**I. Tensor definition**
+
+A Tensor can hold many types of data and is the basic unit of both Request and Response. It is defined as follows:
+
+```protobuf
+message Tensor {
+  // VarType: INT64
+  repeated int64 int64_data = 1;
+
+  // VarType: FP32
+  repeated float float_data = 2;
+
+  // VarType: INT32
+  repeated int32 int_data = 3;
+
+  // VarType: FP64
+  repeated double float64_data = 4;
+
+  // VarType: UINT32
+  repeated uint32 uint32_data = 5;
+
+  // VarType: BOOL
+  repeated bool bool_data = 6;
+
+  // (No support)VarType: COMPLEX64, 2x represents the real part, 2x+1
+  // represents the imaginary part
+  repeated float complex64_data = 7;
+
+  // (No support)VarType: COMPLEX128, 2x represents the real part, 2x+1
+  // represents the imaginary part
+  repeated double complex128_data = 8;
+
+  // VarType: STRING
+  repeated string data = 9;
+
+  // Element types:
+  //   0 => INT64
+  //   1 => FP32
+  //   2 => INT32
+  //   3 => FP64
+  //   4 => INT16
+  //   5 => FP16
+  //   6 => BF16
+  //   7 => UINT8
+  //   8 => INT8
+  //   9 => BOOL
+  //  10 => COMPLEX64
+  //  11 => COMPLEX128
+  //  20 => STRING
+  int32 elem_type = 10;
+
+  // Shape of the tensor, including batch dimensions.
+  repeated int32 shape = 11;
+
+  // Level of data(LOD), support variable length data, only for fetch tensor
+  // currently.
+  repeated int32 lod = 12;
+
+  // Correspond to the variable 'name' in the model description prototxt.
+  string name = 13;
+
+  // Correspond to the variable 'alias_name' in the model description prototxt.
+  string alias_name = 14; // get from the Model prototxt
+
+  // VarType: FP16, INT16, INT8, BF16, UINT8
+  bytes tensor_content = 15;
+};
+```
+
+- elem_type: the data type. FLOAT32, INT64, INT32, UINT8, INT8, and FLOAT16 are currently supported.
+
+|elem_type|Type|
+|---------|----|
+|0|INT64|
+|1|FLOAT32|
+|2|INT32|
+|3|FP64|
+|4|INT16|
+|5|FP16|
+|6|BF16|
+|7|UINT8|
+|8|INT8|
+
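+- shape: the dimensions of the data
+- lod: LoD information. LoD (Level-of-Detail) Tensor is an advanced Paddle feature that extends Tensor to support more flexible input data. For an introduction to how LoD works, see the [related documentation](../LOD_CN.md)
+- name/alias_name: the name and alias, matching the model configuration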
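
To make the `lod` field more concrete, here is a minimal Python sketch. It assumes offset-style LoD, where `lod = [0, 2, 5]` marks two sequences covering elements 0..2 and 2..5 of the flattened data. The helper name `split_by_lod` is hypothetical and is not part of Paddle Serving.

```python
from typing import List, Sequence


def split_by_lod(flat_data: Sequence[float], lod: Sequence[int]) -> List[List[float]]:
    """Split flattened tensor data into variable-length sequences.

    Assumes `lod` holds cumulative offsets starting at 0. This is an
    assumption of the sketch, not something the proto file states.
    """
    if not lod:
        # No LoD information: treat the data as one sequence.
        return [list(flat_data)]
    if lod[0] != 0 or lod[-1] != len(flat_data):
        raise ValueError("lod offsets must start at 0 and end at len(flat_data)")
    # Each adjacent pair of offsets delimits one sequence.
    return [list(flat_data[start:end]) for start, end in zip(lod[:-1], lod[1:])]


sequences = split_by_lod([0.1, 0.2, 0.3, 0.4, 0.5], [0, 2, 5])
print(sequences)
```

Under that assumption, the five values above split into one sequence of length 2 and one of length 3.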
+
+**II. Building Tensor data**
+
+1. FLOAT32 Tensor
+
+```cpp
+// Raw data
+std::vector<float> float_data;
+Tensor *tensor = new Tensor;
+// Set the shape; multiple dimensions are allowed
+for (uint32_t j = 0; j < float_shape.size(); ++j) {
+  tensor->add_shape(float_shape[j]);
+}
+// Set the LoD information
+for (uint32_t j = 0; j < float_lod.size(); ++j) {
+  tensor->add_lod(float_lod[j]);
+}
+// Set the element type (1 => FP32), name, and alias
+tensor->set_elem_type(1);
+tensor->set_name(name);
+tensor->set_alias_name(alias_name);
+// Copy the data
+int total_number = float_data.size();
+tensor->mutable_float_data()->Resize(total_number, 0);
+memcpy(tensor->mutable_float_data()->mutable_data(), float_data.data(), total_number * sizeof(float));
+```
+
+2. INT8 Tensor
+
+```cpp
+// Raw data
+std::string string_data;
+Tensor *tensor = new Tensor;
+for (uint32_t j = 0; j < string_shape.size(); ++j) {
+  tensor->add_shape(string_shape[j]);
+}
+for (uint32_t j = 0; j < string_lod.size(); ++j) {
+  tensor->add_lod(string_lod[j]);
+}
+// elem_type 8 => INT8; the raw bytes are stored in tensor_content
+tensor->set_elem_type(8);
+tensor->set_name(name);
+tensor->set_alias_name(alias_name);
+tensor->set_tensor_content(string_data);
+```
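
For the types carried in `tensor_content` (such as INT8), the field holds the elements as raw bytes. As a rough illustration of what the `string_data` buffer above might contain, the Python sketch below packs signed 8-bit values into bytes. The helper `pack_int8` is hypothetical, and the layout (one byte per element, flattened in row-major order) is an assumption of this sketch.

```python
from array import array
from math import prod


def pack_int8(values, shape):
    """Pack int8 values into raw bytes, one byte per element."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    # Typecode 'b' is a signed 8-bit integer; out-of-range values raise OverflowError.
    return array("b", values).tobytes()


content = pack_int8([-128, -1, 0, 1, 127, 5], [2, 3])
print(len(content), content.hex())
```

The shape check mirrors the requirement that the byte buffer and the `shape` field describe the same number of elements.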
+
+## Request
+
+**I. Request definition**
+
+A Request is the data the client sends. It uses Tensor as its basic data unit and carries additional request information. It is defined as follows:
+
+```protobuf
+message Request {
+  repeated Tensor tensor = 1;
+  repeated string fetch_var_names = 2;
+  bool profile_server = 3;
+  uint64 log_id = 4;
+};
+```
+
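+- fetch_var_names: the names of the outputs to fetch; GeneralResponseOP filters the outputs according to this list. Use the `alias_name` values under the `fetch_var` field in the model file serving_client_conf.prototxt.
+- profile_server: a debugging flag; when enabled, performance information is output
+- log_id: the request ID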
+
+**II. Building a Request**
+
+1. Protobuf form
+
+When sending requests over BRPC or GRPC, use protobuf data, built as follows:
+
+```cpp
+Request req;
+req.set_log_id(log_id);
+for (auto &name : fetch_name) {
+  req.add_fetch_var_names(name);
+}
+// Add a Tensor
+Tensor *tensor = req.add_tensor();
+...
+```
+2. JSON form
+
+When sending RESTful requests, you can use JSON data in the following format:
+
+```Json
+{"tensor":[{"float_data":[0.0137,-0.1136,0.2553,-0.0692,0.0582,-0.0727,-0.1583,-0.0584,0.6283,0.4919,0.1856,0.0795,-0.0332],"elem_type":1,"name":"x","alias_name":"x","shape":[1,13]}],"fetch_var_names":["price"],"log_id":0}
+```
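
The JSON body above can also be assembled programmatically. The following Python sketch builds the same structure for a single FLOAT32 input. The helper `build_request` is hypothetical, and setting `alias_name` equal to `name` simply follows the example above rather than any stated rule.

```python
import json
from math import prod


def build_request(name, values, shape, fetch_var_names, log_id=0):
    """Build the JSON body for a single FLOAT32 input tensor."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    tensor = {
        "float_data": list(values),
        "elem_type": 1,  # 1 => FP32 in the elem_type table
        "name": name,
        "alias_name": name,  # assumption: alias equals name, as in the example
        "shape": list(shape),
    }
    return json.dumps(
        {"tensor": [tensor], "fetch_var_names": list(fetch_var_names), "log_id": log_id}
    )


body = build_request("x", [0.0] * 13, [1, 13], ["price"])
print(body)
```

Checking the element count against `shape` before serializing catches a mismatched request on the client side.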
+
+## Response
+
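+**I. Response definition**
+
+A Response is the result the server returns to the client. It contains Tensor data, an error code, an error message, and so on. It is defined as follows: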
+
+```protobuf
+message Response {
+  repeated ModelOutput outputs = 1;
+  repeated int64 profile_time = 2;
+  // Error code
+  int32 err_no = 3;
+
+  // Error messages
+  string err_msg = 4;
+};
+
+message ModelOutput {
+  repeated Tensor tensor = 1;
+  string engine_name = 2;
+}
+```
+
+- profile_time: performance information, returned when request->set_profile_server(true) is set
+- err_no: the error code; see `core/predictor/common/constant.h`
+- err_msg: the error message; see `core/predictor/common/constant.h`
+- engine_name: the name of the output node
+
+|err_no|err_msg|
+|---------|----|
+|0|OK|
+|-5000|"Paddle Serving Framework Internal Error."|
+|-5001|"Paddle Serving Memory Alloc Error."|
+|-5002|"Paddle Serving Array Overflow Error."|
+|-5100|"Paddle Serving Op Inference Error."|
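
A client will usually want to check `err_no` before reading any tensors. The Python sketch below shows one way to do that. It assumes the Response has already been decoded into a dict whose keys mirror the proto field names (for example, from a JSON reply), and that a missing `err_no` means 0. The helper `check_response` is hypothetical.

```python
# Messages copied from the err_no table above.
ERR_MESSAGES = {
    0: "OK",
    -5000: "Paddle Serving Framework Internal Error.",
    -5001: "Paddle Serving Memory Alloc Error.",
    -5002: "Paddle Serving Array Overflow Error.",
    -5100: "Paddle Serving Op Inference Error.",
}


def check_response(response: dict) -> list:
    """Return the model outputs, or raise if the server reported an error."""
    err_no = response.get("err_no", 0)  # assumption: absent field means 0
    if err_no != 0:
        msg = response.get("err_msg") or ERR_MESSAGES.get(err_no, "unknown error")
        raise RuntimeError("err_no={}: {}".format(err_no, msg))
    return response.get("outputs", [])


outputs = check_response({"outputs": [{"engine_name": "general_infer_0", "tensor": []}]})
print(len(outputs))
```

The `engine_name` value in the usage line is a made-up placeholder for illustration.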
+
+**II. Reading Response data**
+
+```cpp
+// shape, lod, float_data, and string_data are assumed to be declared by the caller.
+uint32_t model_num = res.outputs_size();
+for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
+  const auto &output = res.outputs(m_idx);
+  std::string engine_name = output.engine_name();
+  int idx = 0;
+  // Read the tensor shape
+  int shape_size = output.tensor(idx).shape_size();
+  shape.resize(shape_size);
+  for (int i = 0; i < shape_size; ++i) {
+    shape[i] = output.tensor(idx).shape(i);
+  }
+  // Read the LoD information
+  int lod_size = output.tensor(idx).lod_size();
+  if (lod_size > 0) {
+    lod.resize(lod_size);
+    for (int i = 0; i < lod_size; ++i) {
+      lod[i] = output.tensor(idx).lod(i);
+    }
+  }
+  // Read float data
+  int size = output.tensor(idx).float_data_size();
+  float_data = std::vector<float>(
+      output.tensor(idx).float_data().begin(),
+      output.tensor(idx).float_data().begin() + size);
+  // Read int8 data
+  string_data = output.tensor(idx).tensor_content();
+}
+```
--- a/doc/Offical_Docs/6-2_Hot_Loading_CN.md
+++ b/doc/Offical_Docs/6-2_Hot_Loading_CN.md
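+# Model Hot Loading in Paddle Serving
+
+## Background
+
+In real industrial settings, models are typically produced remotely on a regular, continuous basis, and the online server needs to pull each new model and replace the old one without interrupting service.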
+
+## Server Monitor
+
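+Paddle Serving provides an automatic monitoring script. After a model is updated at the remote address, the script pulls the new model to update the local model, and also updates the timestamp file in the local model folder (`fluid_time_file` by default), which triggers the hot load.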
+
+The following types of remote Monitor are currently supported:
+
+| Monitor type |                         Description                          |                       Special options                        |
+| :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+|   general   | The remote requires no authentication and files can be downloaded directly with `wget` (for example, FTP or BOS without authentication) |              `general_host` general remote host              |
+|  hdfs/afs(HadoopMonitor)   |  The remote is HDFS or AFS; commands are run through Hadoop-Client  | `hadoop_bin` path to the Hadoop binary <br/>`fs_name` Hadoop fs_name, empty by default<br/>`fs_ugi` Hadoop fs_ugi, empty by default |
+|     ftp     | The remote is FTP, accessed through `ftplib` (to use this Monitor, you may need to run `pip install ftplib` to install `ftplib`) | `ftp_host` FTP host<br>`ftp_port` FTP port<br>`ftp_username` FTP username, empty by default<br>`ftp_password` FTP password, empty by default |
+
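+| Common Monitor options |                         Description                          |        Default         |
+| :--------------------: | :----------------------------------------------------------: | :--------------------: |
+|         `type`         |                         Monitor type                         |          none          |
+|     `remote_path`      |                   Base path on the remote                    |          none          |
+|  `remote_model_name`   |          Name of the model to pull from the remote           |          none          |
+| `remote_donefile_name` | Name of the remote donefile that signals the model update is complete |          none          |
+|      `local_path`      |                      Local working path                      |          none          |
+|   `local_model_name`   |                       Local model name                       |          none          |
+| `local_timestamp_file` | Local timestamp file used for hot loading; it is assumed to be under `local_path/local_model_name`. |   `fluid_time_file`    |
+|    `local_tmp_path`    | Local folder for temporary files; created automatically if it does not exist. | `_serving_monitor_tmp` |
+|       `interval`       |                 Polling interval, in seconds.                |          `10`          |
+|  `unpacked_filename`   | Monitor supports remote models packed with tarfile. If the remote model is packed, set this option to tell Monitor the name of the unpacked file. |         `None`         |
+|        `debug`         | If the `--debug` option is added, more detailed intermediate information is output. |  Not added by default  |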
+
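+The HadoopMonitor example below demonstrates Paddle Serving's model hot loading.
+
+## HadoopMonitor Example
+
+In this example, the model is produced in `product_path` and uploaded to hdfs, and server-side hot loading is simulated in `server_path`: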
+
+```shell
+.
+├── product_path
+└── server_path
+```
+
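+**I. Producing the model**
+
+Run the Python code below in `product_path` to produce the model (change the hadoop-related parameters before running). Every 60 seconds it produces the packed Boston house price prediction model `uci_housing.tar.gz` and uploads it to the `/` path on hdfs. Once the upload finishes, it updates the timestamp file `donefile` and uploads it to the `/` path on hdfs as well.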
+
+```python
+import os
+import sys
+import time
+import tarfile
+import paddle
+import paddle.fluid as fluid
+import paddle_serving_client.io as serving_io
+
+train_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.train(), buf_size=500),
+    batch_size=16)
+
+test_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.test(), buf_size=500),
+    batch_size=16)
+
+x = fluid.data(name='x', shape=[None, 13], dtype='float32')
+y = fluid.data(name='y', shape=[None, 1], dtype='float32')
+
+y_predict = fluid.layers.fc(input=x, size=1, act=None)
+cost = fluid.layers.square_error_cost(input=y_predict, label=y)
+avg_loss = fluid.layers.mean(cost)
+sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
+sgd_optimizer.minimize(avg_loss)
+
+place = fluid.CPUPlace()
+feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+
+def push_to_hdfs(local_file_path, remote_path):
+    afs = 'afs://***.***.***.***:***' # User needs to change
+    ugi = '***,***' # User needs to change
+    hadoop_bin = '/path/to/hadoop/bin' # User needs to change
+    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, ugi)
+    os.system('{} -rmr {}/{}'.format(
+      prefix, remote_path, local_file_path))
+    os.system('{} -put {} {}'.format(
+      prefix, local_file_path, remote_path))
+
+name = "uci_housing"
+for pass_id in range(30):
+    for data_train in train_reader():
+        avg_loss_value, = exe.run(fluid.default_main_program(),
+                                  feed=feeder.feed(data_train),
+                                  fetch_list=[avg_loss])
+    # Simulate producing a new model once per period
+    time.sleep(60)
+    model_name = "{}_model".format(name)
+    client_name = "{}_client".format(name)
+    serving_io.save_model(model_name, client_name,
+                          {"x": x}, {"price": y_predict},
+                          fluid.default_main_program())
+    # Packing model
+    tar_name = "{}.tar.gz".format(name)
+    tar = tarfile.open(tar_name, 'w:gz')
+    tar.add(model_name)
+    tar.close()
+
+    # Push packaged model file to hdfs
+    push_to_hdfs(tar_name, '/')
+
+    # Generate donefile
+    donefile_name = 'donefile'
+    os.system('touch {}'.format(donefile_name))
+
+    # Push donefile to hdfs
+    push_to_hdfs(donefile_name, '/')
+```
+
+The files on hdfs look like this:
+
+```bash
+# hadoop fs -ls /
+Found 2 items
+-rw-r--r--   1 root supergroup          0 2020-04-02 02:54 /donefile
+-rw-r--r--   1 root supergroup       2101 2020-04-02 02:54 /uci_housing.tar.gz
+```
+
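+**II. Loading the model on the server**
+
+Enter the `server_path` folder.
+
+1. Start the Server with the initial model
+
+The pretrained Boston house price prediction model is used as the initial model here: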
+
+```shell
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
+```
+
+Start the Server:
+
+```shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
+```
+
+2. Run the monitor
+
+Run the HDFS monitor with the following command:
+
+```shell
+python -m paddle_serving_server.monitor \
+	--type='hdfs' --hadoop_bin='/hadoop-3.1.2/bin/hadoop' \
+	--remote_path='/' --remote_model_name='uci_housing.tar.gz' \
+	--remote_donefile_name='donefile' --local_path='.' \
+	--local_model_name='uci_housing_model' --local_timestamp_file='fluid_time_file' \
+	--local_tmp_path='_tmp' --unpacked_filename='uci_housing_model' --debug
+```
+
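+The command above polls the timestamp file `/donefile` at the remote HDFS address `/`. When the timestamp changes, the remote model is considered updated: the remote packed model `/uci_housing.tar.gz` is pulled to the local temporary path `./_tmp/uci_housing.tar.gz`, the model files are unpacked to `./_tmp/uci_housing_model`, and then the local model `./uci_housing_model` and Paddle Serving's timestamp file `./uci_housing_model/fluid_time_file` are updated.
+
+The expected output is as follows: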
+
+```shell
+2020-04-02 10:12 INFO     [monitor.py:85] _hadoop_bin: /hadoop-3.1.2/bin/hadoop
+2020-04-02 10:12 INFO     [monitor.py:85] _fs_name:
+2020-04-02 10:12 INFO     [monitor.py:85] _fs_ugi:
+2020-04-02 10:12 INFO     [monitor.py:209] AFS prefix cmd: /hadoop-3.1.2/bin/hadoop fs
+2020-04-02 10:12 INFO     [monitor.py:85] _remote_path: /
+2020-04-02 10:12 INFO     [monitor.py:85] _remote_model_name: uci_housing.tar.gz
+2020-04-02 10:12 INFO     [monitor.py:85] _remote_donefile_name: donefile
+2020-04-02 10:12 INFO     [monitor.py:85] _local_model_name: uci_housing_model
+2020-04-02 10:12 INFO     [monitor.py:85] _local_path: .
+2020-04-02 10:12 INFO     [monitor.py:85] _local_timestamp_file: fluid_time_file
+2020-04-02 10:12 INFO     [monitor.py:85] _local_tmp_path: _tmp
+2020-04-02 10:12 INFO     [monitor.py:85] _interval: 10
+2020-04-02 10:12 DEBUG    [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs  -ls /donefile 2>/dev/null
+2020-04-02 10:12 DEBUG    [monitor.py:216] resp: -rw-r--r--   1 root supergroup          0 2020-04-02 10:11 /donefile
+2020-04-02 10:12 INFO     [monitor.py:138] doneilfe(donefile) changed.
+2020-04-02 10:12 DEBUG    [monitor.py:233] pull cmd: /hadoop-3.1.2/bin/hadoop fs  -get /uci_housing.tar.gz _tmp/uci_housing.tar.gz 2>/dev/null
+2020-04-02 10:12 INFO     [monitor.py:144] pull remote model(uci_housing.tar.gz).
+2020-04-02 10:12 INFO     [monitor.py:98] unpack remote file(uci_housing.tar.gz).
+2020-04-02 10:12 DEBUG    [monitor.py:108] remove packed file(uci_housing.tar.gz).
+2020-04-02 10:12 INFO     [monitor.py:110] using unpacked filename: uci_housing_model.
+2020-04-02 10:12 DEBUG    [monitor.py:175] update model cmd: cp -r _tmp/uci_housing_model/* ./uci_housing_model
+2020-04-02 10:12 INFO     [monitor.py:152] update local model(uci_housing_model).
+2020-04-02 10:12 DEBUG    [monitor.py:184] update timestamp cmd: touch ./uci_housing_model/fluid_time_file
+2020-04-02 10:12 INFO     [monitor.py:157] update model timestamp(fluid_time_file).
+2020-04-02 10:12 INFO     [monitor.py:161] sleep 10s.
+2020-04-02 10:12 DEBUG    [monitor.py:214] check cmd: /hadoop-3.1.2/bin/hadoop fs  -ls /donefile 2>/dev/null
+2020-04-02 10:12 DEBUG    [monitor.py:216] resp: -rw-r--r--   1 root supergroup          0 2020-04-02 10:11 /donefile
+2020-04-02 10:12 INFO     [monitor.py:161] sleep 10s.
+```
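
The polling behaviour described above, and visible in the log, can be sketched in Python as follows. This is not the code of `paddle_serving_server.monitor`: a local directory stands in for the remote HDFS path, `shutil.copy` stands in for `hadoop fs -get`, and the helper `poll_once` is hypothetical.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def poll_once(remote_dir, local_path, model_name, tmp_path, last_mtime,
              packed_name="uci_housing.tar.gz", donefile="donefile",
              timestamp_file="fluid_time_file"):
    """Run one polling round and return the donefile mtime that was observed."""
    done = Path(remote_dir) / donefile
    if not done.exists():
        return last_mtime
    mtime = done.stat().st_mtime
    if mtime == last_mtime:
        return last_mtime  # donefile unchanged: nothing to do
    tmp = Path(tmp_path)
    tmp.mkdir(parents=True, exist_ok=True)
    packed = tmp / packed_name
    shutil.copy(Path(remote_dir) / packed_name, packed)  # stand-in for the pull
    with tarfile.open(packed) as tar:
        tar.extractall(tmp)  # unpack into the temporary folder
    packed.unlink()  # remove the packed file
    target = Path(local_path) / model_name
    shutil.copytree(tmp / model_name, target, dirs_exist_ok=True)  # update local model
    (target / timestamp_file).touch()  # update the timestamp file
    return mtime


# Small self-contained demo using temporary directories.
with tempfile.TemporaryDirectory() as root:
    remote, local, tmp = (Path(root) / n for n in ("remote", "local", "_tmp"))
    model_src = Path(root) / "uci_housing_model"
    for d in (remote, local, model_src):
        d.mkdir()
    (model_src / "params").write_text("v1")
    with tarfile.open(remote / "uci_housing.tar.gz", "w:gz") as tar:
        tar.add(model_src, arcname="uci_housing_model")
    (remote / "donefile").touch()
    seen = poll_once(remote, local, "uci_housing_model", tmp, None)
    updated = (local / "uci_housing_model" / "fluid_time_file").exists()
    seen_again = poll_once(remote, local, "uci_housing_model", tmp, seen)
    print(updated, seen_again == seen)
```

A long-running monitor would call a function like this in a loop and sleep for `interval` seconds between rounds, which matches the `sleep 10s.` lines in the log.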
+
+3. Check the Server log
+
+View the Server's run log with the following command:
+
+```shell
+tail -f log/serving.INFO
+```
+
+The log shows that the model has been hot loaded:
+
+```shell
+I0330 09:38:40.087316  7361 server.cpp:150] Begin reload framework...
+W0330 09:38:40.087399  7361 infer.h:656] Succ reload version engine: 18446744073709551615
+I0330 09:38:40.087414  7361 manager.h:131] Finish reload 1 workflow(s)
+I0330 09:38:50.087535  7361 server.cpp:150] Begin reload framework...
+W0330 09:38:50.087641  7361 infer.h:250] begin reload model[uci_housing_model].
+I0330 09:38:50.087972  7361 infer.h:66] InferEngineCreationParams: model_path = uci_housing_model, enable_memory_optimization = 0, static_optimization = 0, force_update_static_cache = 0
+I0330 09:38:50.088027  7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
+I0330 09:38:50.088393  7361 analysis_predictor.cc:841] MODEL VERSION: 1.7.1
+I0330 09:38:50.088413  7361 analysis_predictor.cc:843] PREDICTOR VERSION: 1.6.3
+I0330 09:38:50.089519  7361 graph_pattern_detector.cc:96] ---  detected 1 subgraphs
+I0330 09:38:50.090925  7361 analysis_predictor.cc:470] ======= optimize end =======
+W0330 09:38:50.090986  7361 infer.h:472] Succ load common model[0x7fc83c06abd0], path[uci_housing_model].
+I0330 09:38:50.091022  7361 analysis_predictor.cc:88] Profiler is deactivated, and no profiling report will be generated.
+W0330 09:38:50.091050  7361 infer.h:509] td_core[0x7fc83c0ad770] clone model from pd_core[0x7fc83c06abd0] succ, cur_idx[0].
+...
+W0330 09:38:50.091784  7361 infer.h:489] Succ load clone model, path[uci_housing_model]
+W0330 09:38:50.091794  7361 infer.h:656] Succ reload version engine: 18446744073709551615
+I0330 09:38:50.091820  7361 manager.h:131] Finish reload 1 workflow(s)
+I0330 09:39:00.091987  7361 server.cpp:150] Begin reload framework...
+W0330 09:39:00.092161  7361 infer.h:656] Succ reload version engine: 18446744073709551615
+I0330 09:39:00.092177  7361 manager.h:131] Finish reload 1 workflow(s)
+```