diff --git a/README.md b/README.md
index 911b4a7c056f9f487f80f977a6ad3fe351828d04..cef9b1eb4f236b2ece8bc6ef1f31a16025f2353c 100755
--- a/README.md
+++ b/README.md
@@ -56,8 +56,8 @@ This chapter guides you through the installation and deployment steps. It is str
 - [Install Paddle Serving using docker](doc/Install_EN.md)
 - [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
-- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_EN.md)
-- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker.md)
+- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
+- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker_CN.md)
 - [Deploy Paddle Serving on more hardwares](doc/Run_On_XPU_EN.md)
 - [Latest Wheel packages](doc/Latest_Packages_CN.md)(Update everyday on branch develop)
@@ -68,10 +68,10 @@ The first step is to call the model save interface to generate a model parameter
 - [Quick Start](doc/Quick_Start_EN.md)
 - [Save a servable model](doc/Save_EN.md)
 - [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
-- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Http_Service_EN.md)
+- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Introduction_CN.md)
 - [Infer on quantizative models](doc/Low_Precision_CN.md)
 - [Data format of classic models](doc/Process_Data_CN.md)
-- [C++ Serving](doc/C++_Serving/Introduction_EN.md)
+- [C++ Serving](doc/C++_Serving/Introduction_CN.md)
 - [protocols](doc/C++_Serving/Inference_Protocols_CN.md)
 - [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
 - [A/B Test](doc/C++_Serving/ABTest_EN.md)
@@ -101,20 +101,20 @@ For Paddle Serving developers, we provide extended documents such as custom OP,
 Paddle Serving works closely with the Paddle model suite, and implements a large number of service deployment examples, including image classification, object detection, language and text recognition, Chinese part of speech, sentiment analysis, content recommendation and other types of examples, for a total of 42 models.
-
+

 | PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP |
 | :----: | :----: | :----: | :----: | :----: | :----: |
 | 8 | 12 | 13 | 2 | 3 | 4 |
-

+

 For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
-
-
-
-
+

+
+
+

Community

@@ -122,10 +122,19 @@ For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
 If you want to communicate with developers and other users? Welcome to join us, join the community through the following methods below.
 ### Wechat
-- 微信用户请扫码
+- Scan the QR code below with WeChat to join the group chat
+
+
+

+
+

 ### QQ
-- 飞桨推理部署交流群(Group No.:696965088)
+- QQ group for PaddlePaddle inference deployment (Group No.: 697765514)
+
+

+
+

 ### Slack
@@ -134,11 +143,12 @@ If you want to communicate with developers and other users? Welcome to join us,
 > Contribution
 If you want to contribute code to Paddle Serving, please reference [Contribution Guidelines](doc/Contribute_EN.md)
-
-- Special Thanks to [@BeyondYourself](https://github.com/BeyondYourself) in complementing the gRPC tutorial, updating the FAQ doc and modifying the mdkir command
-- Special Thanks to [@mcl-stone](https://github.com/mcl-stone) in updating faster_rcnn benchmark
-- Special Thanks to [@cg82616424](https://github.com/cg82616424) in updating the unet benchmark and modifying resize comment error
-- Special Thanks to [@cuicheng01](https://github.com/cuicheng01) for providing 11 PaddleClas models
+- Thanks to [@loveululu](https://github.com/loveululu) for providing the Cube Python API.
+- Thanks to [@EtachGu](https://github.com/EtachGu) for updating the docker run commands.
+- Thanks to [@BeyondYourself](https://github.com/BeyondYourself) for contributing the gRPC tutorial, updating the FAQ doc and fixing the mkdir command.
+- Thanks to [@mcl-stone](https://github.com/mcl-stone) for updating the faster_rcnn benchmark.
+- Thanks to [@cg82616424](https://github.com/cg82616424) for updating the unet benchmark and fixing the resize comment error.
+- Thanks to [@cuicheng01](https://github.com/cuicheng01) for providing 11 PaddleClas models.
 > Feedback
diff --git a/README_CN.md b/README_CN.md
index f80e62436b1faea245580a3a7f7b244ef60f195f..397cb184fbf85545e7290aed2eca7a15c54ff624 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -64,9 +64,9 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 安装Paddle Serving后,使用快速开始将引导您运行Serving。第一步,调用模型保存接口,生成模型参数配置文件(.prototxt)用以在客户端和服务端使用;第二步,阅读配置和启动参数并启动服务;第三步,根据API和您的使用场景,基于SDK编写客户端请求,并测试推理服务。您想了解跟多特性的使用场景和方法,请详细阅读以下文档。
 - [快速开始](doc/Quick_Start_CN.md)
-- [保存用于Paddle Serving的模型和配置](doc/SAVE_CN.md)
+- [保存用于Paddle Serving的模型和配置](doc/Save_CN.md)
 - [配置和启动参数的说明](doc/Serving_Configure_CN.md)
-- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Http_Service_CN.md)
+- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#4.Client端特性)
 - [低精度推理](doc/Low_Precision_CN.md)
 - [常见模型数据处理](doc/Process_data_CN.md)
 - [C++ Serving简介](doc/C++_Serving/Introduction_CN.md)
@@ -95,21 +95,21 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发

模型库

 Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,包括图像分类、物体检测、语言文本识别、中文词性、情感分析、内容推荐等多种类型示例,以及Paddle全链条项目,共计42个模型。
-
+
+

 | PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP |
 | :----: | :----: | :----: | :----: | :----: | :----: |
 | 8 | 12 | 13 | 2 | 3 | 4 |
-

+

+
 更多模型示例参考Repo,可进入[模型库](doc/Model_Zoo_CN.md)
-
-
-
-
-
-
+

+
+
+

社区

@@ -119,8 +119,16 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,
 ### 微信
 - 微信用户请扫码
+

+
+

+
 ### QQ
-- 飞桨推理部署交流群(群号:696965088)
+- 飞桨推理部署交流群(群号:697765514)
+
+

+
+

 ### Slack
 - [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
@@ -129,11 +137,12 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,
 > 贡献代码
 如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines](doc/Contribute.md)
-
-- 特别感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程,更新FAQ教程,整理文件目录。
-- 特别感谢 [@mcl-stone](https://github.com/mcl-stone) 提供faster rcnn benchmark脚本
-- 特别感谢 [@cg82616424](https://github.com/cg82616424) 提供unet benchmark脚本和修改部分注释错误
-- 特别感谢 [@cuicheng01](https://github.com/cuicheng01) 提供PaddleClas的11个模型
+- 感谢 [@loveululu](https://github.com/loveululu) 提供 Cube python API
+- 感谢 [@EtachGu](https://github.com/EtachGu) 更新 docker 使用命令
+- 感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程,更新FAQ教程,整理文件目录。
+- 感谢 [@mcl-stone](https://github.com/mcl-stone) 提供faster rcnn benchmark脚本
+- 感谢 [@cg82616424](https://github.com/cg82616424) 提供unet benchmark脚本和修改部分注释错误
+- 感谢 [@cuicheng01](https://github.com/cuicheng01) 提供PaddleClas的11个模型
 > 反馈
diff --git a/doc/C++_Serving/Performance_Tuning_CN.md b/doc/C++_Serving/Performance_Tuning_CN.md
index 37d1baf6aafb2940e85960b568fd5d9edd45862e..b7ec523259eaa4ecfbad5c63d2c11666a6affdaa 100755
--- a/doc/C++_Serving/Performance_Tuning_CN.md
+++ b/doc/C++_Serving/Performance_Tuning_CN.md
@@ -1,6 +1,6 @@
 # C++ Serving性能分析与优化
 # 1.背景知识介绍
-1) 首先,应确保您知道C++ Serving常用的一些[功能特点](Introduction_CN.md)和[C++ Serving 参数配置和启动的详细说明](../SERVING_CONFIGURE_CN.md。
+1) 首先,应确保您知道C++ Serving常用的一些[功能特点](Introduction_CN.md)和[C++ Serving 参数配置和启动的详细说明](../Serving_Configure_CN.md)。
 2) 关于C++ Serving框架本身的性能分析和介绍,请参考[C++ Serving框架性能测试](Frame_Performance_CN.md)。
 3) 您需要对您使用的模型、机器环境、需要部署上线的业务有一些了解,例如,您使用CPU还是GPU进行预测;是否可以开启TRT进行加速;你的机器CPU是多少core的;您的业务包含几个模型;每个模型的输入和输出需要做些什么处理;您业务的最大线上流量是多少;您的模型支持的最大输入batch是多少等等.
diff --git a/doc/Model_Zoo_CN.md b/doc/Model_Zoo_CN.md
index d42e6601e033b1f364d7d5110eca6087851c3bcc..645523a2d6fbd01ee842dd146bb8bf290ba3ffd0 100644
--- a/doc/Model_Zoo_CN.md
+++ b/doc/Model_Zoo_CN.md
@@ -57,3 +57,13 @@
 - 更多Paddle Serving支持的部署模型请参考[wholechain](https://www.paddlepaddle.org.cn/wholechain)
+
+- 最新模型可参考
+  - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
+  - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+  - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
+  - [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)
+  - [PaddleRec](https://github.com/PaddlePaddle/PaddleRec)
+  - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
+  - [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN)
+
diff --git a/doc/Model_Zoo_EN.md b/doc/Model_Zoo_EN.md
index 26addb3e31011ac4d6d07ce8536af3da377e646a..67fcaf6dd5c535a7fc7c56336ac8ea9ccb996396 100644
--- a/doc/Model_Zoo_EN.md
+++ b/doc/Model_Zoo_EN.md
@@ -57,3 +57,11 @@ Special thanks to the [Padddle wholechain](https://www.paddlepaddle.org.cn/whole
 - Refer [wholechain](https://www.paddlepaddle.org.cn/wholechain) for more pre-trained models supported by PaddleServing
+- For the latest models, refer to:
+  - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
+  - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
+  - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
+  - [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP)
+  - [PaddleRec](https://github.com/PaddlePaddle/PaddleRec)
+  - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
+  - [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN)
diff --git a/doc/Python_Pipeline/Performance_Tuning_CN.md b/doc/Python_Pipeline/Performance_Tuning_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..4c547edea95db27f59a4d4a85a58a5adffd734a5
--- /dev/null
+++ b/doc/Python_Pipeline/Performance_Tuning_CN.md
@@ -0,0 +1,82 @@
+# Pipeline Serving 性能优化
+
+([English](./Performance_Tuning_EN.md)|简体中文)
+
+## 1. 性能分析与优化
+
+
+### 1.1 如何通过 Timeline 工具进行优化
+
+为了更好地对性能进行优化,PipelineServing 提供了 Timeline 工具,对整个服务的各个阶段时间进行打点。
+
+### 1.2 在 Server 端输出 Profile 信息
+
+Server 端用 yaml 中的 `use_profile` 字段进行控制:
+
+```yaml
+dag:
+    use_profile: true
+```
+
+开启该功能后,Server 端在预测的过程中会将对应的日志信息打印到标准输出,为了更直观地展现各阶段的耗时,提供 Analyst 模块对日志文件做进一步的分析处理。
+
+使用时先将 Server 的输出保存到文件,以 `profile.txt` 为例,脚本将日志中的时间打点信息转换成 json 格式保存到 `trace` 文件,`trace` 文件可以通过 chrome 浏览器的 tracing 功能进行可视化。
+
+```python
+from paddle_serving_server.pipeline import Analyst
+import json
+import sys
+
+if __name__ == "__main__":
+    log_filename = "profile.txt"
+    trace_filename = "trace"
+    analyst = Analyst(log_filename)
+    analyst.save_trace(trace_filename)
+```
+
+具体操作:打开 chrome 浏览器,在地址栏输入 `chrome://tracing/` ,跳转至 tracing 页面,点击 load 按钮,打开保存的 `trace` 文件,即可将预测服务的各阶段时间信息可视化。
+
+### 1.3 在 Client 端输出 Profile 信息
+
+Client 端在 `predict` 接口设置 `profile=True`,即可开启 Profile 功能。
+
+开启该功能后,Client 端在预测的过程中会将该次预测对应的日志信息打印到标准输出,后续分析处理同 Server。
+
+### 1.4 分析方法
+根据pipeline.tracer日志中的各个阶段耗时,按以下公式逐步分析出主要耗时在哪个阶段。
+```
+单OP耗时:
+op_cost = process(pre + mid + post)
+
+OP期望并发数:
+op_concurrency = 单OP耗时(s) * 期望QPS
+
+服务吞吐量:
+service_throughput = 1 / 最慢OP的耗时 * 并发数
+
+服务平响:
+service_avg_cost = ∑op_concurrency 【关键路径】
+
+Channel堆积:
+channel_acc_size = QPS(down - up) * time
+
+批量预测平均耗时:
+avg_batch_cost = (N * pre + mid + post) / N
+```
+
+### 1.5 优化思路
+根据长耗时在不同阶段,采用不同的优化方法.
+- OP推理阶段(mid-process):
+  - 增加OP并发度
+  - 开启auto-batching(前提是多个请求的shape一致)
+  - 若批量数据中某条数据的shape很大,padding很大导致推理很慢,可使用mini-batch
+  - 开启TensorRT/MKL-DNN优化
+  - 开启低精度推理
+- OP前处理阶段(pre-process):
+  - 增加OP并发度
+  - 优化前处理逻辑
+- in/out耗时长(channel堆积>5)
+  - 检查channel传递的数据大小和延迟
+  - 优化传入数据,不传递数据或压缩后再传入
+  - 增加OP并发度
+  - 减少上游OP并发度
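To make the analysis in section 1.4 of the file above concrete, here is a minimal Python sketch that plugs a set of hand-measured stage timings into those formulas. The timings, target QPS and batch size are invented placeholders, and the variable names simply mirror the formulas; it is an illustration, not part of the patch.

```python
# Invented per-request stage timings for one OP, as they would be read from pipeline.tracer.
pre, mid, post = 0.002, 0.010, 0.001   # preprocess / inference / postprocess, in seconds
expected_qps = 500                      # target load for the whole service
batch_size = 8                          # only relevant if auto-batching is enabled

op_cost = pre + mid + post                            # cost of a single OP
op_concurrency = op_cost * expected_qps               # expected concurrency for this OP
service_throughput = 1 / op_cost * op_concurrency     # assuming this OP is the slowest one
avg_batch_cost = (batch_size * pre + mid + post) / batch_size  # per-request cost with batching

print(f"op_cost={op_cost * 1000:.1f} ms, op_concurrency={op_concurrency:.1f}")
print(f"service_throughput={service_throughput:.0f} qps, avg_batch_cost={avg_batch_cost * 1000:.2f} ms")
```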
diff --git a/doc/Python_Pipeline/Performance_Tuning_EN.md b/doc/Python_Pipeline/Performance_Tuning_EN.md
new file mode 100644
index 0000000000000000000000000000000000000000..8ea9afb00b5e41ac95208c2437e0ea8abfc56426
--- /dev/null
+++ b/doc/Python_Pipeline/Performance_Tuning_EN.md
@@ -0,0 +1,85 @@
+# Pipeline Serving Performance Optimization
+
+(English|[简体中文](./Performance_Tuning_CN.md))
+
+
+## 1. Performance analysis and optimization
+
+
+### 1.1 How to optimize with the timeline tool
+
+In order to better optimize the performance, Pipeline Serving provides a timeline tool to monitor the time spent in each stage of the whole service.
+
+### 1.2 Output profile information on server side
+
+The server is controlled by the `use_profile` field in yaml:
+
+```yaml
+dag:
+    use_profile: true
+```
+
+After the function is enabled, the server will print the corresponding log information to the standard output in the process of prediction. In order to show the time consumption of each stage more intuitively, an Analyst module is provided for further analysis and processing of the log files.
+
+The output of the server is first saved to a file, taking `profile.txt` as an example. The script below converts the time monitoring information in the log into JSON format and saves it to the `trace` file. The `trace` file can be visualized through the tracing function of the Chrome browser.
+
+```python
+from paddle_serving_server.pipeline import Analyst
+import json
+import sys
+
+if __name__ == "__main__":
+    log_filename = "profile.txt"
+    trace_filename = "trace"
+    analyst = Analyst(log_filename)
+    analyst.save_trace(trace_filename)
+```
+
+Specific operation: open the Chrome browser, enter `chrome://tracing/` in the address bar to jump to the tracing page, click the load button, and open the saved `trace` file; the time information of each stage of the prediction service is then visualized.
+
+### 1.3 Output profile information on client side
+
+The profile function can be enabled by setting `profile=True` in the `predict` interface on the client side.
+
+After the function is enabled, the client will print the log information corresponding to the prediction to the standard output during the prediction process, and the subsequent analysis and processing are the same as those of the server.
+
+### 1.4 Analytical methods
+Based on the time consumed by each stage in the pipeline.tracer log, the following formulas can be used step by step to work out which stage is the main source of latency.
+
+```
+Cost of a single OP:
+op_cost = process(pre + mid + post)
+
+Expected OP concurrency:
+op_concurrency = op_cost(s) * qps_expected
+
+Service throughput:
+service_throughput = 1 / slowest_op_cost * op_concurrency
+
+Service average cost:
+service_avg_cost = ∑op_concurrency in critical path
+
+Channel accumulations:
+channel_acc_size = QPS(down - up) * time
+
+Average cost of batch predictor:
+avg_batch_cost = (N * pre + mid + post) / N
+```
+
+### 1.5 Optimization ideas
+Depending on which stage the long latency falls into, adopt a different optimization method.
+- OP inference stage (mid-process):
+  - Increase `concurrency`
+  - Turn on `auto-batching` (ensure that the shapes of multiple requests are consistent)
+  - Use `mini-batch` if the shape of some data in the batch is very large
+  - Turn on TensorRT for GPU
+  - Turn on MKLDNN for CPU
+  - Turn on low precision inference
+- OP preprocess or postprocess stage:
+  - Increase `concurrency`
+  - Optimize the processing logic
+- In/Out stage (channel accumulation > 5):
+  - Check the size and delay of the data passed through the channel
+  - Optimize the transmitted data: do not pass unnecessary data, or compress it before passing it in
+  - Increase `concurrency`
+  - Decrease the `concurrency` of upstream OPs
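As a companion to section 1.3 of the English document above, the sketch below shows where `profile=True` would be passed on the client side. It assumes a Pipeline service is already listening on an RPC port; the endpoint, feed/fetch names and input value are placeholders to be replaced with those of your own service, and the `PipelineClient` usage follows the usual pipeline client pattern rather than any example contained in this patch.

```python
from paddle_serving_server.pipeline import PipelineClient

client = PipelineClient()
client.connect(["127.0.0.1:18090"])  # placeholder RPC endpoint of your Pipeline service

# Placeholder feed/fetch names; use the variable names defined by your own service.
ret = client.predict(feed_dict={"words": "a test sentence"},
                     fetch=["prediction"],
                     profile=True)  # per-stage profile info is printed to standard output
print(ret)
```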
diff --git a/doc/Serving_Configure_CN.md b/doc/Serving_Configure_CN.md
index 6e28f6ef864eacd49785138311a37f4d34e6eba7..49f45ea1230e7c002efbd9c44b1c408d2f7e0f3d 100644
--- a/doc/Serving_Configure_CN.md
+++ b/doc/Serving_Configure_CN.md
@@ -12,7 +12,7 @@
 ## 模型配置文件
-在开始介绍Server配置之前,先来介绍一下模型配置文件。我们在将模型转换为PaddleServing模型时,会生成对应的serving_client_conf.prototxt以及serving_server_conf.prototxt,两者内容一致,为模型输入输出的参数信息,方便用户拼装参数。该配置文件用于Server以及Client,并不需要用户自行修改。转换方法参考文档《[怎样保存用于Paddle Serving的模型](SAVE_CN.md)》。protobuf格式可参考`core/configure/proto/general_model_config.proto`。
+在开始介绍Server配置之前,先来介绍一下模型配置文件。我们在将模型转换为PaddleServing模型时,会生成对应的serving_client_conf.prototxt以及serving_server_conf.prototxt,两者内容一致,为模型输入输出的参数信息,方便用户拼装参数。该配置文件用于Server以及Client,并不需要用户自行修改。转换方法参考文档《[怎样保存用于Paddle Serving的模型](Save_CN.md)》。protobuf格式可参考`core/configure/proto/general_model_config.proto`。
 样例如下:
 ```
diff --git a/doc/Serving_Configure_EN.md b/doc/Serving_Configure_EN.md
index 0a2bd4265dec6d6ca69d665382a611167f9b6b6c..a50db60ef155179a2fd83d05ab4f34947f69d50d 100644
--- a/doc/Serving_Configure_EN.md
+++ b/doc/Serving_Configure_EN.md
@@ -58,7 +58,7 @@ fetch_var {
 ## C++ Serving
-### 1. Quick start
+### 1. Quick start and stop
 The easiest way to start c++ serving is to provide the `--model` and `--port` flags.
@@ -107,6 +107,11 @@ python3 -m paddle_serving_server.serve --model serving_model --thread 10 --port
 ```BASH
 python3 -m paddle_serving_server.serve --model serving_model_1 serving_model_2 --thread 10 --port 9292
 ```
+#### Stop Serving
+```BASH
+python3 -m paddle_serving_server.serve stop
+```
+`stop` sends SIGINT to C++ Serving; setting `kill` instead sends SIGKILL to C++ Serving.
 ### 2. Starting with user-defined Configuration
@@ -316,6 +321,19 @@ fetch_var {
 ## Python Pipeline
+### Quick start and stop
+
+Example of starting Pipeline Serving:
+```BASH
+python3 -m paddle_serving_server.serve --model serving_model --port 9393
+```
+#### Stop Serving
+```BASH
+python3 -m paddle_serving_server.serve stop
+```
+`stop` sends SIGINT to Pipeline Serving; setting `kill` instead sends SIGKILL to Pipeline Serving.
+
+### yml Configuration
 Python Pipeline provides a user-friendly programming framework for multi-model composite services.
 Example of config.yaml:
@@ -460,4 +478,4 @@ Python Pipeline supports low-precision inference. The precision types supported
 #GPU support: "fp32"(default), "fp16(TensorRT)", "int8";
 #CPU support: "fp32"(default), "fp16", "bf16"(mkldnn); not support: "int8"
 precision: "fp32"
-```
\ No newline at end of file
+```
diff --git a/doc/images/qq_group_1.png b/doc/images/qq_group_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..7e1fff13b8ef3b81cec84fe3721dcc6ce01bc316
Binary files /dev/null and b/doc/images/qq_group_1.png differ
diff --git a/doc/images/wechat_group_1.jpeg b/doc/images/wechat_group_1.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..dd5c55e04d60f271c0d9d7e3bc9ee12ae92ea149
Binary files /dev/null and b/doc/images/wechat_group_1.jpeg differ
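To illustrate the `stop`/`kill` semantics described in the Serving_Configure_EN.md changes above, the following minimal sketch maps the two options to plain process signals. The model directory and port are placeholders, the snippet assumes a POSIX system, and it only mirrors the documented behaviour (`stop` corresponds to SIGINT, `kill` to SIGKILL) rather than reimplementing the serve tool.

```python
import signal
import subprocess

# Start a server the same way the documentation above does ("serving_model" is a placeholder).
proc = subprocess.Popen([
    "python3", "-m", "paddle_serving_server.serve",
    "--model", "serving_model", "--port", "9393",
])

# The `stop` option described above corresponds to SIGINT, i.e. a graceful shutdown.
proc.send_signal(signal.SIGINT)
try:
    proc.wait(timeout=30)
except subprocess.TimeoutExpired:
    # The `kill` option corresponds to SIGKILL, i.e. a forced termination.
    proc.send_signal(signal.SIGKILL)
    proc.wait()
```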