Merge branch 'develop' of https://github.com/TeslaZhao/Serving into develop

aff56c53 · TeslaZhao · 4b6cc8d1 · ad3e820d · aff56c53 · aff56c53

Showing 88 additions and 41 deletions

doc/Check_Env_CN.md +22 -0

doc/TensorRT_Dynamic_Shape_CN.md +66 -41

--- a/doc/Check_Env_CN.md
+++ b/doc/Check_Env_CN.md
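+# Paddle Serving Environment Check
+## Overview
+Paddle Serving provides a one-step check that runs examples to verify that the Paddle Serving environment is installed correctly.
+## How to start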
+```
+python3 -m paddle_serving_server.serve check
+```
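+|Command|Description|
+|---------|----|
+|check_all|Checks Paddle Inference, Pipeline Serving, and C++ Serving. Only prints the check results; does not write logs|
+|check_pipeline|Checks Pipeline Serving. Only prints the check results; does not write logs|
+|check_cpp|Checks C++ Serving. Only prints the check results; does not write logs|
+|check_inference|Checks whether Paddle Inference is installed correctly. Only prints the check results; does not write logs|
+|debug|After an error occurs, this command prints hint logs to the screen and writes a detailed log file|
+|exit|Exit|
+>> **Note**:<br>
+>> 1. If C++ Serving fails to start and `paddle_serving_server` was installed with pip from your own build, confirm that you have run `export SERVING_BIN` to set `SERVING_BIN` to its actual path.<br>
+>> 2. Use `export SERVING_LOG_PATH` to specify where the `debug` command writes its log. By default, the log is written to the current directory.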
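The two notes above can be sketched in Python as follows. This is a minimal sketch, not part of the commit: it only assumes the `python3 -m paddle_serving_server.serve check` command and the `SERVING_BIN` and `SERVING_LOG_PATH` variables named in the added document. The helper names `build_check_env` and `run_check` are hypothetical, and any paths passed in are placeholders.

```python
import os
import subprocess

# Launch command copied from the added document.
CHECK_COMMAND = ["python3", "-m", "paddle_serving_server.serve", "check"]


def build_check_env(serving_bin=None, log_path=None, base_env=None):
    """Return a copy of the environment with the optional variables set.

    serving_bin and log_path are placeholders supplied by the caller;
    the original os.environ is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if serving_bin is not None:
        env["SERVING_BIN"] = serving_bin        # needed for self-built C++ Serving
    if log_path is not None:
        env["SERVING_LOG_PATH"] = log_path      # where `debug` writes its log
    return env


def run_check(serving_bin=None, log_path=None):
    """Start the interactive check; stdin and stdout are inherited from the caller."""
    env = build_check_env(serving_bin, log_path)
    return subprocess.run(CHECK_COMMAND, env=env).returncode
```

Calling `run_check(log_path="/path/to/logs")` would then open the check prompt, where commands such as `check_all` or `debug` from the table can be typed. The path shown is illustrative only.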
--- a/doc/TensorRT_Dynamic_Shape_CN.md
+++ b/doc/TensorRT_Dynamic_Shape_CN.md
-# How to configure TensorRT dynamic shape
+# How to enable TensorRT and configure dynamic shape
 (简体中文|[English](./TensorRT_Dynamic_Shape_EN.md))
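-## Introduction
+## Overview
-After TensorRT is enabled with `--use_trt` in Pipeline/C++, dynamic shape still has to be configured; examples for Pipeline Serving and C++ Serving are given below
+TensorRT is a high-performance deep learning inference optimizer that provides low-latency, high-throughput inference for deployed deep learning applications.
+The following describes how to enable TensorRT and configure dynamic shape, first for C++ Serving and then for Pipeline Serving.
-The dynamic shape API is as follows
+## Paddle Inference Dynamic Shape API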
 ```
  void SetTRTDynamicShapeInfo(
      std::map<std::string, std::vector<int>> min_input_shape,
@@ -15,7 +16,23 @@
 ```
 For details on the API, see [C++](https://paddleinference.paddlepaddle.org.cn/api_reference/cxx_api_doc/Config/GPUConfig.html#tensorrt)/[Python](https://paddleinference.paddlepaddle.org.cn/api_reference/python_api_doc/Config/GPUConfig.html#tensorrt)
-### C++ Serving
+## C++ Serving
+**1. Enabling TensorRT in C++ Serving**
+Add `--use_trt` to the C++ Serving launch command
+```
+python -m paddle_serving_server.serve \
+--model serving_server \
+--thread 2 --port 9000 \
+--gpu_ids 0 \
+--use_trt \
+--precision FP16
+```
+**2. Setting dynamic shape in C++ Serving**
 Modify the following code in `**/paddle_inference/paddle/include/paddle_engine.h`
 ```
@@ -111,44 +128,52 @@
 ```
-### Pipeline Serving
+## Pipeline Serving
+**1. Enabling TensorRT in Pipeline Serving**
-Modify the following code in `**/python/paddle_serving_app/local_predict.py`
+In the config.yml file in the example directory, set `device_type: 2` and configure the GPUs to use with `devices: "0,1,2,3"`
+>> **Note**: TensorRT must be used together with a GPU
+**2. Setting dynamic shape in Pipeline Serving**
+In web_service.py in the example directory, each op can add its dynamic shape configuration through `def set_dynamic_shape_info(self):`
+For example
 ```
-if use_trt:
-    config.enable_tensorrt_engine(
-        precision_mode=precision_type,
-        workspace_size=1 << 20,
-        max_batch_size=32,
-        min_subgraph_size=3,
-        use_static=False,
-        use_calib_mode=False)
-    head_number = 12
-    names = [
-        "placeholder_0", "placeholder_1", "placeholder_2", "stack_0.tmp_0"
-    ]
-    min_input_shape = [1, 1, 1]
-    max_input_shape = [100, 128, 1]
-    opt_input_shape = [10, 60, 1]
-    config.set_trt_dynamic_shape_info(
-        {
-            names[0]: min_input_shape,
-            names[1]: min_input_shape,
-            names[2]: min_input_shape,
-            names[3]: [1, head_number, 1, 1]
-        }, {
-            names[0]: max_input_shape,
-            names[1]: max_input_shape,
-            names[2]: max_input_shape,
-            names[3]: [100, head_number, 128, 128]
-        }, {
-            names[0]: opt_input_shape,
-            names[1]: opt_input_shape,
-            names[2]: opt_input_shape,
-            names[3]: [10, head_number, 60, 60]
-        })
+def set_dynamic_shape_info(self):
+    min_input_shape = {
+        "x": [1, 3, 50, 50],
+        "conv2d_182.tmp_0": [1, 1, 20, 20],
+        "nearest_interp_v2_2.tmp_0": [1, 1, 20, 20],
+        "nearest_interp_v2_3.tmp_0": [1, 1, 20, 20],
+        "nearest_interp_v2_4.tmp_0": [1, 1, 20, 20],
+        "nearest_interp_v2_5.tmp_0": [1, 1, 20, 20]
+    }
+    max_input_shape = {
+        "x": [1, 3, 1536, 1536],
+        "conv2d_182.tmp_0": [20, 200, 960, 960],
+        "nearest_interp_v2_2.tmp_0": [20, 200, 960, 960],
+        "nearest_interp_v2_3.tmp_0": [20, 200, 960, 960],
+        "nearest_interp_v2_4.tmp_0": [20, 200, 960, 960],
+        "nearest_interp_v2_5.tmp_0": [20, 200, 960, 960],
+    }
+    opt_input_shape = {
+        "x": [1, 3, 960, 960],
+        "conv2d_182.tmp_0": [3, 96, 240, 240],
+        "nearest_interp_v2_2.tmp_0": [3, 96, 240, 240],
+        "nearest_interp_v2_3.tmp_0": [3, 24, 240, 240],
+        "nearest_interp_v2_4.tmp_0": [3, 24, 240, 240],
+        "nearest_interp_v2_5.tmp_0": [3, 24, 240, 240],
+    }
+    self.dynamic_shape_info = {
+        "min_input_shape": min_input_shape,
+        "max_input_shape": max_input_shape,
+        "opt_input_shape": opt_input_shape,
+    }
 ```
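+For details, see [Pipeline OCR](../examples/Pipeline/PaddleOCR/ocr/)
+>> **Note**: Because different models require different dynamic shape configurations, there is no universal way to configure dynamic shape. When Pipeline Serving
+>> reports an error, load the model with [netron](https://netron.app/) and inspect the input and output shapes of each op. Then, guided by the error message, add the corresponding
+>> dynamic shape configuration code to web_service.py.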
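Because the shapes have to be worked out per model, a quick consistency check before starting the service may catch simple mistakes. The sketch below is not part of the commit. It assumes the `dynamic_shape_info` dictionary layout shown in the diff, and it relies on the TensorRT requirement that each dimension satisfies min <= opt <= max. The function name `find_shape_problems` is hypothetical.

```python
def find_shape_problems(dynamic_shape_info):
    """Return a list of human-readable problems; an empty list means none were found."""
    mins = dynamic_shape_info["min_input_shape"]
    opts = dynamic_shape_info["opt_input_shape"]
    maxs = dynamic_shape_info["max_input_shape"]
    problems = []
    for name in sorted(set(mins) | set(opts) | set(maxs)):
        # Every tensor must appear in all three dictionaries.
        if not (name in mins and name in opts and name in maxs):
            problems.append(f"{name}: missing from min, opt, or max")
            continue
        lo, opt, hi = mins[name], opts[name], maxs[name]
        # The three shapes must have the same number of dimensions.
        if not (len(lo) == len(opt) == len(hi)):
            problems.append(f"{name}: rank differs between min, opt, and max")
            continue
        for axis, (a, b, c) in enumerate(zip(lo, opt, hi)):
            if not (a <= b <= c):
                problems.append(
                    f"{name}: axis {axis} violates min <= opt <= max ({a}, {b}, {c})"
                )
    return problems


# Two of the tensors from the OCR example in the diff.
example_info = {
    "min_input_shape": {"x": [1, 3, 50, 50], "conv2d_182.tmp_0": [1, 1, 20, 20]},
    "opt_input_shape": {"x": [1, 3, 960, 960], "conv2d_182.tmp_0": [3, 96, 240, 240]},
    "max_input_shape": {"x": [1, 3, 1536, 1536], "conv2d_182.tmp_0": [20, 200, 960, 960]},
}
print(find_shape_problems(example_info))
```

A check like this only covers internal consistency. Whether the ranges actually match the tensors produced at run time still has to be confirmed against the model, for example with netron as the note above suggests.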