Merge pull request #1788 from TeslaZhao/develop

Update cmake & doc

99297767 · TeslaZhao · GitHub · 50b24542 · 16f66f09 · 99297767
45 changed files
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@
 The goal of Paddle Serving is to provide high-performance, flexible and easy-to-use industrial-grade online inference services for machine learning developers and enterprises. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, and provides inference solutions for a variety of hardware and operating system environments, along with many well-known pre-trained model examples. The core features are as follows:


- Integrate high-performance server-side inference engine paddle Inference and mobile-side engine paddle Lite. Models of other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
+- Integrate high-performance server-side inference engine [Paddle Inference](https://paddleinference.paddlepaddle.org.cn/product_introduction/inference_intro.html) and mobile-side engine [Paddle Lite](https://paddlelite.paddlepaddle.org.cn/introduction/tech_highlights.html). Models of other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
 - There are two frameworks: the high-performance C++ Serving and the easy-to-use Python pipeline. C++ Serving is built on the bRPC network framework to provide a high-throughput, low-latency inference service, and its performance indicators are ahead of competing products. The Python pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to provide an easy-to-use, high-throughput inference service. To choose between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
 - Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, bRPC, and provide C++, Python, Java language SDK.
 - Design and implement a high-performance inference service framework for asynchronous pipelines based on directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batch, multi-card multi-stream inference, request cache, etc.
@@ -40,13 +40,17 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
 - Support service monitoring, provide prometheus-based performance statistics and port access


-<h2 align="center">Tutorial and Papers</h2>
-
+<h2 align="center">Tutorial and Solutions</h2>

 - AIStudio tutorial(Chinese) : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013)
 - AIStudio OCR practice(Chinese) : [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
 - Video tutorial(Chinese) : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
 - Edge AI solution(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- GOVT Q&A Solution(Chinese) : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
+- Smart Q&A Solution(Chinese) : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
+- Semantic Indexing Solution(Chinese) : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
+
+<h2 align="center">Papers</h2>

 - Paper : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
 Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf)
@@ -67,6 +71,7 @@ This chapter guides you through the installation and deployment steps. It is str

 - [Install Paddle Serving using docker](doc/Install_EN.md)
 - [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
+- [Install Paddle Serving on a native Linux system(Chinese)](doc/Install_Linux_Env_CN.md)
 - [Deploy Paddle Serving on Kubernetes(Chinese)](doc/Run_On_Kubernetes_CN.md)
 - [Deploy Paddle Serving with Security gateway(Chinese)](doc/Serving_Auth_Docker_CN.md)
 - Deploy on more hardwares[[ARM CPU、百度昆仑](doc/Run_On_XPU_EN.md)、[华为昇腾](doc/Run_On_NPU_CN.md)、[海光DCU](doc/Run_On_DCU_CN.md)、[Jetson](doc/Run_On_JETSON_CN.md)]
@@ -93,10 +98,11 @@ The first step is to call the model save interface to generate a model parameter
  - [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
  - [Multiple models in series(Chinese)](doc/C++_Serving/2+_model.md)
  - [Request Cache(Chinese)](doc/C++_Serving/Request_Cache_CN.md)
- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
-  - [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
-  - [TensorRT dynamic Shape](doc/TensorRT_Dynamic_Shape_EN.md)
-  - [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
+- [Python Pipeline Overview(Chinese)](doc/Python_Pipeline/Pipeline_Int_CN.md)
+  - [Architecture Design(Chinese)](doc/Python_Pipeline/Pipeline_Design_CN.md)
+  - [Core Features(Chinese)](doc/Python_Pipeline/Pipeline_Features_CN.md)
+  - [Performance Optimization(Chinese)](doc/Python_Pipeline/Pipeline_Optimize_CN.md)
+  - [Benchmark(Chinese)](doc/Python_Pipeline/Pipeline_Benchmark_CN.md)
 - Client SDK
  - [Python SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
  - [JAVA SDK](doc/Java_SDK_EN.md)

--- a/README_CN.md
+++ b/README_CN.md
@@ -24,27 +24,32 @@

 ***

-Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving支持RESTful、gRPC、bRPC等多种协议，提供多种异构硬件和多种操作系统环境下推理解决方案，和多种经典预训练模型示例。核心特性如下：
-
- 集成高性能服务端推理引擎paddle Inference和移动端引擎paddle Lite，其他机器学习平台（Caffe/TensorFlow/ONNX/PyTorch）可通过[x2paddle](https://github.com/PaddlePaddle/X2Paddle)工具迁移模型
- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务，性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
- 支持HTTP、gRPC、bRPC等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md)；提供C++、Python、Java语言SDK
- 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架，具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理、请求缓存等特性
- 适配x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑XPU、华为昇腾310/910、海光DCU、Nvidia Jetson等多种硬件
- 集成Intel MKLDNN、Nvidia TensorRT加速库，以及低精度和量化推理
- 提供一套模型安全部署解决方案，包括加密模型部署、鉴权校验、HTTPs安全网关，并在实际项目中应用
- 支持云端部署，提供百度云智能云kubernetes集群部署Paddle Serving案例
- 提供丰富的经典模型部署示例，如PaddleOCR、PaddleClas、PaddleDetection、PaddleSeg、PaddleNLP、PaddleRec等套件，共计40+个预训练精品模型
- 支持大规模稀疏参数索引模型分布式部署，具有多表、多分片、多副本、本地高频cache等特性、可单机或云端部署
+Paddle Serving 依托深度学习框架 PaddlePaddle 旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议，提供多种异构硬件和多种操作系统环境下推理解决方案，和多种经典预训练模型示例。核心特性如下：
+
+- 集成高性能服务端推理引擎 [Paddle Inference](https://paddleinference.paddlepaddle.org.cn/product_introduction/inference_intro.html) 和端侧引擎 [Paddle Lite](https://paddlelite.paddlepaddle.org.cn/introduction/tech_highlights.html)，其他机器学习平台（Caffe/TensorFlow/ONNX/PyTorch）可通过 [x2paddle](https://github.com/PaddlePaddle/X2Paddle) 工具迁移模型
+- 具有高性能 C++ Serving 和高易用 Python Pipeline 2套框架。C++ Serving 基于高性能 bRPC 网络框架打造高吞吐、低延迟的推理服务，性能领先竞品。Python Pipeline 基于 gRPC/gRPC-Gateway 网络框架和 Python 语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
+- 支持 HTTP、gRPC、bRPC 等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md)；提供 C++、Python、Java 语言 SDK
+- 设计并实现基于有向无环图(DAG) 的异步流水线高性能推理框架，具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理、请求缓存等特性
+- 适配 x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑 XPU、华为昇腾310/910、海光 DCU、Nvidia Jetson 等多种硬件
+- 集成 Intel MKLDNN、Nvidia TensorRT 加速库，以及低精度量化推理
+- 提供一套模型安全部署解决方案，包括加密模型部署、鉴权校验、HTTPs 安全网关，并在实际项目中应用
+- 支持云端部署，提供百度云智能云 kubernetes 集群部署 Paddle Serving 案例
+- 提供丰富的经典模型部署示例，如 PaddleOCR、PaddleClas、PaddleDetection、PaddleSeg、PaddleNLP、PaddleRec 等套件，共计40+个预训练精品模型
+- 支持大规模稀疏参数索引模型分布式部署，具有多表、多分片、多副本、本地高频 cache 等特性、可单机或云端部署
 - 支持服务监控，提供基于普罗米修斯的性能数据统计及端口访问


-<h2 align="center">教程与论文</h2>
+<h2 align="center">教程与案例</h2>

 - AIStudio 使用教程 : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013)
- AIStudio OCR实战 : [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
+- AIStudio OCR 实战 : [基于Paddle Serving的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
 - 视频教程 : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
- 边缘AI 解决方案 : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- 边缘 AI 解决方案 : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
+- 政务问答解决方案 : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
+- 智能问答解决方案 : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
+- 语义索引解决方案 : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
+
+<h2 align="center">论文</h2>

 - 论文 : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
 Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf)
@@ -61,13 +66,14 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
 > 部署

 此章节引导您完成安装和部署步骤，强烈推荐使用Docker部署Paddle Serving，如您不使用docker，省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可阅读以下文档。每天编译生成develop分支的最新开发包供开发者使用。
- [使用docker安装Paddle Serving](doc/Install_CN.md)
- [源码编译安装Paddle Serving](doc/Compile_CN.md)
- [在Kuberntes集群上部署Paddle Serving](doc/Run_On_Kubernetes_CN.md)
- [部署Paddle Serving安全网关](doc/Serving_Auth_Docker_CN.md)
+- [使用 Docker 安装 Paddle Serving](doc/Install_CN.md)
+- [Linux 原生系统安装 Paddle Serving](doc/Install_Linux_Env_CN.md)
+- [源码编译安装 Paddle Serving](doc/Compile_CN.md)
+- [Kubernetes 集群部署 Paddle Serving](doc/Run_On_Kubernetes_CN.md)
+- [部署 Paddle Serving 安全网关](doc/Serving_Auth_Docker_CN.md)
 - 异构硬件部署[[ARM CPU、百度昆仑](doc/Run_On_XPU_CN.md)、[华为昇腾](doc/Run_On_NPU_CN.md)、[海光DCU](doc/Run_On_DCU_CN.md)、[Jetson](doc/Run_On_JETSON_CN.md)]
- [Docker镜像](doc/Docker_Images_CN.md)
- [下载Wheel包](doc/Latest_Packages_CN.md)
+- [Docker 镜像列表](doc/Docker_Images_CN.md)
+- [下载 Python Wheels](doc/Latest_Packages_CN.md)

 > 使用

@@ -79,7 +85,9 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
 - [低精度推理](doc/Low_Precision_CN.md)
 - [常见模型数据处理](doc/Process_data_CN.md)
 - [普罗米修斯](doc/Prometheus_CN.md)
- [C++ Serving简介](doc/C++_Serving/Introduction_CN.md) 
+- [设置 TensorRT 动态shape](doc/TensorRT_Dynamic_Shape_CN.md)
+- [C++ Serving 概述](doc/C++_Serving/Introduction_CN.md)
+  - [异步框架](doc/C++_Serving/Asynchronous_Framwork_CN.md) 
  - [协议](doc/C++_Serving/Inference_Protocols_CN.md)
  - [模型热加载](doc/C++_Serving/Hot_Loading_CN.md)
  - [A/B Test](doc/C++_Serving/ABTest_CN.md)
@@ -88,10 +96,11 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
  - [性能指标](doc/C++_Serving/Benchmark_CN.md)
  - [多模型串联](doc/C++_Serving/2+_model.md)
  - [请求缓存](doc/C++_Serving/Request_Cache_CN.md)
- [Python Pipeline设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
-  - [性能优化指南](doc/Python_Pipeline/Performance_Tuning_CN.md)
-  - [TensorRT动态shape](doc/TensorRT_Dynamic_Shape_CN.md)
-  - [性能指标](doc/Python_Pipeline/Benchmark_CN.md)
+- [Python Pipeline 概述](doc/Python_Pipeline/Pipeline_Int_CN.md)
+  - [框架设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
+  - [核心功能](doc/Python_Pipeline/Pipeline_Features_CN.md)
+  - [性能优化](doc/Python_Pipeline/Pipeline_Optimize_CN.md)
+  - [性能指标](doc/Python_Pipeline/Pipeline_Benchmark_CN.md)
 - 客户端SDK
  - [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
  - [JAVA SDK](doc/Java_SDK_CN.md)
@@ -107,13 +116,13 @@ AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)

 <h2 align="center">模型库</h2>

-Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，包括图像分类、物体检测、语言文本识别、中文词性、情感分析、内容推荐等多种类型示例，以及Paddle全链条项目，共计45个模型。
+Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，包括图像分类、物体检测、语言文本识别、中文词性、情感分析、内容推荐等多种类型示例，以及Paddle全链条项目，共计47个模型。

 <p align="center">

 | PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP | Paddle Video |
 | :----:  | :----: | :----: | :----: | :----: | :----: | :----: | 
-| 8 | 12 | 14 | 2 | 3 | 6 | 1 | 
+| 8 | 12 | 14 | 2 | 3 | 7 | 1 | 

 </p>

@@ -147,6 +156,7 @@ Paddle Serving与Paddle模型套件紧密配合，实现大量服务化部署，
 > 贡献代码

 如果您想为Paddle Serving贡献代码，请参考 [Contribution Guidelines(English)](doc/Contribute_EN.md)
+- 感谢 [@w5688414](https://github.com/w5688414) 提供 NLP Ernie Indexing 案例
 - 感谢 [@loveululu](https://github.com/loveululu) 提供 Cube python API
 - 感谢 [@EtachGu](https://github.com/EtachGu) 更新 docker 使用命令
 - 感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程，更新FAQ教程，整理文件目录。

--- a/cmake/paddlepaddle.cmake
+++ b/cmake/paddlepaddle.cmake
@@ -39,7 +39,7 @@ if (WITH_GPU)
        set(WITH_TRT ON)
    elseif(CUDA_VERSION EQUAL 10.2)
        if(CUDNN_MAJOR_VERSION EQUAL 7)
-            set(CUDA_SUFFIX "x86-64_gcc5.4_avx_mkl_cuda10.2_cudnn7.6.5_trt6.0.1.5")
+            set(CUDA_SUFFIX "x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn7.6.5_trt6.0.1.5")
            set(WITH_TRT ON)
        elseif(CUDNN_MAJOR_VERSION EQUAL 8)
            set(CUDA_SUFFIX "x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4")

--- a/doc/C++_Serving/ABTest_CN.md
+++ b/doc/C++_Serving/ABTest_CN.md
-# 如何使用Paddle Serving做ABTEST
+# C++ Serving ABTest

-(简体中文|[English](./ABTest_EN.md))
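+- [Functional Design](#1)
+- [Usage Example](#2)
+  - [2.1 Install the Paddle Serving Wheels](#2.1)
+  - [2.2 Download the models and save the model parameters](#2.2)
+  - [2.3 Start services A, B and C](#2.3)
+  - [2.4 Register the A, B and C server addresses on the client](#2.4)
+  - [2.5 Start the client and verify the results](#2.5)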

-该文档将会用一个基于IMDB数据集的文本分类任务的例子，介绍如何使用Paddle Serving搭建A/B Test框架，例中的Client端、Server端结构如下图所示。
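+ABTest is a functional testing approach: several candidate solutions are prepared for the same product goal, some users are served solution A while others are served solution B or C, and the solutions are compared on measured outcomes such as click-through rate or conversion rate.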

-<img src="../images/abtest.png" style="zoom:33%;" />
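+In a model serving framework, ABTest is an important basic capability that provides an experimental environment for iterating on and upgrading models. Paddle Serving implements ABTest in its Python SDK, giving users a simple, easy-to-use test environment.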

-需要注意的是：A/B Test只适用于RPC模式，不适用于WEB模式。
+<a name="1"></a>

-### 下载数据以及模型
+## Functional Design

-``` shell
-cd Serving/examples/C++/imdb
-sh get_data.sh
-```
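+Paddle Serving's ABTest feature is built from the Python SDK plus multiple servers. Each server loads a different model. The client registers the address and traffic weight of each server, and that registration determines which server each request is sent to.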
+
+<div align=center>
+<img src='../images/6-5_Cpp_ABTest_CN_1.png' height = "400" align="middle"/>
+</div>
+
+<a name="2"></a>

-### 处理数据
-由于处理数据需要用到相关库，请使用pip进行安装
-``` shell
-pip install paddlepaddle
-pip install paddle-serving-app
-pip install Shapely
-````
-您可以直接运行下面的命令来处理数据。
+## Usage Example

-[python abtest_get_data.py](../../examples/C++/imdb/abtest_get_data.py)
+This section uses the [imdb](https://github.com/PaddlePaddle/Serving/tree/develop/examples/C%2B%2B/imdb) example to show how to use ABTest. Deployment takes 5 steps:

-文件中的Python代码将处理`test_data/part-0`的数据，并将处理后的数据生成并写入`processed.data`文件中。
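+1. Install the Paddle Serving Wheels
+2. Download the models and save the model parameters
+3. Start services A, B and C
+4. Register the A, B and C server addresses on the client
+5. Start the client and verify the results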

-### 启动Server端
+<a name="2.1"></a>

-这里采用[Docker方式](../Install_CN.md)启动Server端服务。
+**1. Install the Paddle Serving Wheels**

-首先启动BOW Server，该服务启用`8000`端口：
+ABTest requires the Python SDK, so the `paddle_serving_client` wheel must be installed. The [installation method](../Docker_Images_CN.md) is as follows:

-```bash
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
-docker exec -it bow-server /bin/bash
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
-python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
-exit
 ```
+pip3 install paddle-serving-client==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+<a name="2.2"></a>
+
+**2. Download the models and save the model parameters**

-同理启动LSTM Server，该服务启用`9000`端口：
+This example provides a one-step download script, `sh get_data.sh`, which downloads three self-trained models built with different network types: `bow`, `cnn` and `lstm`.

-```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
-docker exec -it lstm-server /bin/bash
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
-python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
-exit
+```
+sh get_data.sh
 ```

-### 启动Client端
-为了模拟ABTEST工况，您可以在宿主机运行下面Python代码启动Client端，但需确保宿主机具备相关环境，您也可以在docker环境下运行.
+All files of the three models are listed below. The model parameters have already been saved, so no separate save step is needed.
+```
+├── imdb_bow_client_conf
+│   ├── serving_client_conf.prototxt
+│   └── serving_client_conf.stream.prototxt
+├── imdb_bow_model
+│   ├── embedding_0.w_0
+│   ├── fc_0.b_0
+│   ├── fc_0.w_0
+│   ├── fc_1.b_0
+│   ├── fc_1.w_0
+│   ├── fc_2.b_0
+│   ├── fc_2.w_0
+│   ├── fluid_time_file
+│   ├── __model__
+│   ├── serving_server_conf.prototxt
+│   └── serving_server_conf.stream.prototxt
+├── imdb_cnn_client_conf
+│   ├── serving_client_conf.prototxt
+│   └── serving_client_conf.stream.prototxt
+├── imdb_cnn_model
+│   ├── embedding_0.w_0
+│   ├── fc_0.b_0
+│   ├── fc_0.w_0
+│   ├── fc_1.b_0
+│   ├── fc_1.w_0
+│   ├── fluid_time_file
+│   ├── __model__
+│   ├── sequence_conv_0.b_0
+│   ├── sequence_conv_0.w_0
+│   ├── serving_server_conf.prototxt
+│   └── serving_server_conf.stream.prototxt
+├── imdb_lstm_client_conf
+│   ├── serving_client_conf.prototxt
+│   └── serving_client_conf.stream.prototxt
+├── imdb_lstm_model
+│   ├── embedding_0.w_0
+│   ├── fc_0.b_0
+│   ├── fc_0.w_0
+│   ├── fc_1.b_0
+│   ├── fc_1.w_0
+│   ├── fc_2.b_0
+│   ├── fc_2.w_0
+│   ├── lstm_0.b_0
+│   ├── lstm_0.w_0
+│   ├── __model__
+│   ├── serving_server_conf.prototxt
+│   └── serving_server_conf.stream.prototxt
+```

-运行前使用`pip install paddle-serving-client`安装paddle-serving-client包。
+Although the three models have different network structures, they share the same `feed_var` and `fetch_var`, which makes them convenient for ABTest.
+```
+feed_var {
+  name: "words"
+  alias_name: "words"
+  is_lod_tensor: true
+  feed_type: 0
+  shape: -1
+}
+fetch_var {
+  name: "fc_2.tmp_2"
+  alias_name: "prediction"
+  is_lod_tensor: false
+  fetch_type: 1
+  shape: 2
+}
+```

+<a name="2.3"></a>

-您可以直接使用下面的命令，进行ABTEST预测。
+**3. Start services A, B and C**

-[python abtest_client.py](../../examples/C++/imdb/abtest_client.py)
+Start the `bow`, `cnn` and `lstm` model services in the background:

 ```python
+## Start the bow model service
+python3 -m paddle_serving_server.serve --model imdb_bow_model/ --port 9297 >/dev/null 2>&1 &
+
+## Start the cnn model service
+python3 -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9298 >/dev/null 2>&1 &
+
+## Start the lstm model service
+python3 -m paddle_serving_server.serve --model imdb_lstm_model/ --port 9299 >/dev/null 2>&1 &
+```
+
+<a name="2.4"></a>
+
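+**4. Register the A, B and C server addresses on the client**
+
+Use the `Client::add_variant(self, tag, cluster, variant_weight)` interface in `paddle_serving_client` to register each service's tag, address and weight. The framework sums all weights and derives each service's share of traffic from its weight. In this example the bow service has weight 10, cnn has weight 30 and lstm has weight 60, so requests are routed to the three services in proportions of 10%, 30% and 60%.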
+
+```
 from paddle_serving_client import Client
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
+import sys
 import numpy as np

 client = Client()
-client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
-client.add_variant("bow", ["127.0.0.1:8000"], 10)
-client.add_variant("lstm", ["127.0.0.1:9000"], 90)
+client.load_client_config(sys.argv[1])
+client.add_variant("bow", ["127.0.0.1:9297"], 10)
+client.add_variant("cnn", ["127.0.0.1:9298"], 30)
+client.add_variant("lstm", ["127.0.0.1:9299"], 60)
 client.connect()
+```
+To have the result report which service handled a request, set `need_variant_tag=True` in `client.predict(feed, fetch, batch, need_variant_tag, logid)`.
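The weight-to-proportion arithmetic described for `add_variant` can be sketched in plain Python. This is an illustration only, not the Paddle Serving client's implementation: `variant_ratios` and `pick_variant` are hypothetical helper names, and using `random.choices` for the weighted pick is an assumption about one way such routing could work.

```python
import random
from typing import Dict, Optional


def variant_ratios(weights: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-variant weights into traffic proportions."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the sum of variant weights must be positive")
    return {tag: weight / total for tag, weight in weights.items()}


def pick_variant(weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> str:
    """Pick one variant tag with probability proportional to its weight."""
    rng = rng or random.Random()
    tags = list(weights)
    return rng.choices(tags, weights=[weights[t] for t in tags], k=1)[0]


# Weights taken from the example: bow=10, cnn=30, lstm=60.
WEIGHTS = {"bow": 10, "cnn": 30, "lstm": 60}

print(variant_ratios(WEIGHTS))
print(pick_variant(WEIGHTS, random.Random(0)))
```

Because each request is routed independently, the observed split over a small number of requests will only approximate the configured proportions.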
+
+<a name="2.5"></a>

-print('please wait for about 10s')
-with open('processed.data') as f:
-    cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
-    for line in f:
-        word_ids, label = line.split(';')
-        word_ids = [int(x) for x in word_ids.split(',')]
-        word_len = len(word_ids)
-        feed = {
-            "words": np.array(word_ids).reshape(word_len, 1),
-            "words.lod": [0, word_len]
-        }
-        fetch = ["acc", "cost", "prediction"]
-        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True,batch=True)
-        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
-            cnt[tag]['acc'] += 1
-        cnt[tag]['total'] += 1
-
-    for tag, data in cnt.items():
-        print('[{}]<total: {}> acc: {}'.format(tag, data['total'], float(data['acc'])/float(data['total']) ))
+**5. Start the client and verify the results**
+
+Run the command:
+```
+head test_data/part-0 | python3.7 abtest_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
 ```
-代码中，`client.add_variant(tag, clusters, variant_weight)`是为了添加一个标签为`tag`、流量权重为`variant_weight`的variant。在这个样例中，添加了一个标签为`bow`、流量权重为`10`的BOW variant，以及一个标签为`lstm`、流量权重为`90`的LSTM variant。Client端的流量会根据`10:90`的比例分发到两个variant。

-Client端做预测时，若指定参数`need_variant_tag=True`，返回值则包含分发流量对应的variant标签。
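+The output is shown below. Of the 10 requests, 2 went to the bow service, 3 to cnn and 5 to lstm, which is roughly in line with the configured proportions.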
+```
+I0506 04:02:46.720135 44567 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9297"): added 1
+I0506 04:02:46.722630 44567 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9298"): added 1
+I0506 04:02:46.723577 44567 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9299"): added 1
+I0506 04:02:46.814075 44567 general_model.cpp:490] [client]logid=0,client_cost=9.889ms,server_cost=6.283ms.
+server_tag=lstm prediction=[0.500398   0.49960205]
+I0506 04:02:46.826339 44567 general_model.cpp:490] [client]logid=0,client_cost=10.261ms,server_cost=9.503ms.
+server_tag=lstm prediction=[0.5007235  0.49927652]
+I0506 04:02:46.828992 44567 general_model.cpp:490] [client]logid=0,client_cost=1.667ms,server_cost=0.741ms.
+server_tag=bow prediction=[0.25859657 0.74140346]
+I0506 04:02:46.843299 44567 general_model.cpp:490] [client]logid=0,client_cost=13.402ms,server_cost=12.827ms.
+server_tag=lstm prediction=[0.50039905 0.4996009 ]
+I0506 04:02:46.850219 44567 general_model.cpp:490] [client]logid=0,client_cost=5.129ms,server_cost=4.332ms.
+server_tag=cnn prediction=[0.6369219  0.36307803]
+I0506 04:02:46.854203 44567 general_model.cpp:490] [client]logid=0,client_cost=2.804ms,server_cost=0.782ms.
+server_tag=bow prediction=[0.15088597 0.849114  ]
+I0506 04:02:46.858268 44567 general_model.cpp:490] [client]logid=0,client_cost=3.292ms,server_cost=2.677ms.
+server_tag=cnn prediction=[0.4608788 0.5391212]
+I0506 04:02:46.869217 44567 general_model.cpp:490] [client]logid=0,client_cost=10.13ms,server_cost=9.556ms.
+server_tag=lstm prediction=[0.5000269  0.49997318]
+I0506 04:02:46.883790 44567 general_model.cpp:490] [client]logid=0,client_cost=13.312ms,server_cost=12.822ms.
+server_tag=lstm prediction=[0.50083774 0.49916226]
+I0506 04:02:46.887256 44567 general_model.cpp:490] [client]logid=0,client_cost=2.432ms,server_cost=1.812ms.
+server_tag=cnn prediction=[0.47895813 0.52104187]

-### 预期结果
-由于网络情况的不同，可能每次预测的结果略有差异。
-``` bash
-[lstm]<total: 1867> acc: 0.490091055169
-[bow]<total: 217> acc: 0.73732718894
 ```
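To check the observed split on a longer run, the `server_tag=` fields in the client output can be tallied. The sketch below is a hypothetical helper, not part of the imdb example. It assumes the output keeps the `server_tag=<name> prediction=[...]` line shape shown in the ABTest log.

```python
import re
from collections import Counter


def tally_server_tags(log_text: str) -> Counter:
    """Count how many responses each variant served, using the server_tag= fields."""
    return Counter(re.findall(r"server_tag=(\w+)", log_text))


# Three lines shaped like the client output shown in the ABTest log.
SAMPLE = """\
server_tag=lstm prediction=[0.500398   0.49960205]
server_tag=bow prediction=[0.25859657 0.74140346]
server_tag=lstm prediction=[0.5007235  0.49927652]
"""

counts = tally_server_tags(SAMPLE)
total = sum(counts.values())
for tag, n in counts.most_common():
    print(f"{tag}: {n}/{total} = {n / total:.0%}")
```

Ten requests is a small sample, so a larger input than `head test_data/part-0` is needed before the tallies settle near 10/30/60.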
--- a/doc/C++_Serving/Asynchronous_Framwork_CN.md
+++ b/doc/C++_Serving/Asynchronous_Framwork_CN.md
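+# C++ Serving Asynchronous Mode
+
+- [Design](#1)
+    - [Synchronous network threads](#1.1)
+    - [Asynchronous scheduling threads](#1.2)
+    - [Dynamic batching](#1.3)
+- [Usage Examples](#2)
+    - [Enable synchronous mode](#2.1)
+    - [Enable asynchronous mode](#2.2)
+- [Performance Test](#3)
+    - [Test results](#3.1)
+    - [Test data](#3.2)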
+
+<a name="1"></a>
+
+## Design
+
+<a name="1.1"></a>
+
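+**1. Synchronous network threads**
+
+At the network-framework level, Paddle Serving processes requests synchronously: once a bRPC network thread has received the complete request data from the kernel (epoll mode), it completes the business processing in that same thread. C++ Serving uses synchronous mode by default. Synchronous mode is simple and direct, and suits cases where model inference time is short or a single Request carries a large batch.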
+
+<p align="center">
+<img src='../images/syn_mode.png' width = "350" height = "300">
+</p>
+
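+The number of server threads N equals the number of model inference engines N, which equals the number of Requests N processed concurrently. Requests beyond that limit must wait until a thread finishes its current work before they are handled.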
+
+<a name="1.2"></a>
+
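+**2. Asynchronous scheduling threads**
+
+To raise the throughput of the compute chip and the utilization of compute resources, C++ Serving merges requests concurrently with asynchronous multithreading at the scheduling layer, enabling dynamic batch inference. Asynchronous mode mainly suits cases where the model supports batching, a single Request carries no batch or only a small batch, and a single inference takes a long time.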
+
+<p align="center">
+<img src='../images/asyn_mode.png'>
+</p>
+
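+In asynchronous mode, the N server threads only receive Requests. The inference engine is actually invoked from the asynchronous framework's thread pool, whose thread count can be set through a configuration option. For ease of understanding, assume every Request has a batch size of 1. The asynchronous framework then takes as many Requests as possible from the request pool, n of them (n ≤ M), assembles them into 1 Request (batch = n), invokes the inference engine once to obtain 1 Response (batch = n), and splits that Response into n Responses to return.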
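The merge, single engine call and split sequence can be sketched as follows. This is a simplified, single-threaded illustration rather than the C++ Serving scheduler: `merge_and_predict` and `fake_engine` are hypothetical names, and `max_batch` is assumed to play the role of the upper bound M.

```python
from typing import Callable, List, Sequence


def merge_and_predict(
    requests: Sequence[float],
    predict_batch: Callable[[List[float]], List[float]],
    max_batch: int,
) -> List[float]:
    """Group batch-1 requests, run one engine call per group, split the results."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    responses: List[float] = []
    for start in range(0, len(requests), max_batch):
        merged = list(requests[start:start + max_batch])  # 1 Request, batch = n
        outputs = predict_batch(merged)                    # 1 engine call
        if len(outputs) != len(merged):
            raise RuntimeError("engine returned a different batch size")
        responses.extend(outputs)                          # split into n Responses
    return responses


engine_calls: List[int] = []


def fake_engine(batch: List[float]) -> List[float]:
    """Stand-in for the inference engine: records batch size, doubles each input."""
    engine_calls.append(len(batch))
    return [2 * x for x in batch]


results = merge_and_predict([1.0, 2.0, 3.0, 4.0, 5.0], fake_engine, max_batch=4)
print(results)
print(engine_calls)
```

Five batch-1 requests with `max_batch=4` result in two engine calls (batch sizes 4 and 1), while every caller still receives its own response in order.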
+
+<a name="1.3"></a>
+
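+**3. Dynamic batching**
+
+Typically, the asynchronous framework can merge multiple requests only if the `feed var` shapes of all requests are identical in every dimension except the batch dimension. Take the detection model in the OCR text recognition example: request A's variable `x` has shape [1, 3, 960, 960] and request B's variable `x` has shape [2, 3, 960, 960]. The first dimension differs, but it is the `batch` dimension, so requests A and B can be merged. Request C's variable `x` has shape [1, 3, 640, 480]. Two dimensions other than `batch` differ, so A and C cannot be merged directly.
+
+Empirically, when the same variable in two requests has the same number of shape dimensions, the shapes can be aligned to the largest value in each dimension by zero `padding`. Padding request C to [1, 3, 960, 960] lets it be merged with requests A and B. The Paddle Serving framework implements dynamic padding to align shapes.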
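The shape arithmetic for requests A, B and C can be sketched as below. It only computes shapes and padding volume and does not touch tensor data. `merged_shape` and `padding_elements` are hypothetical helper names, and the first dimension is assumed to be the batch dimension.

```python
from typing import List, Sequence


def merged_shape(shapes: Sequence[Sequence[int]]) -> List[int]:
    """Shape of the merged request: batch sizes add up, other dims pad to the max."""
    if not shapes:
        raise ValueError("need at least one shape")
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all shapes must have the same number of dimensions")
    batch = sum(s[0] for s in shapes)
    padded = [max(s[i] for s in shapes) for i in range(1, rank)]
    return [batch] + padded


def _volume(dims: Sequence[int]) -> int:
    """Product of the given dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n


def padding_elements(shape: Sequence[int], target: Sequence[int]) -> int:
    """Zero elements added when a request of `shape` is padded to target's non-batch dims."""
    return shape[0] * (_volume(target[1:]) - _volume(shape[1:]))


A = [1, 3, 960, 960]
B = [2, 3, 960, 960]
C = [1, 3, 640, 480]

target = merged_shape([A, B, C])
print(target)
print(padding_elements(C, target))
```

Request C needs 3*960*960 - 3*640*480 = 1843200 padded elements, which is why padding to the largest shape can become expensive.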
+
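+When one of the requests to be merged has a very large shape, every request must be padded to that maximum, which multiplies the amount of computation. Paddle Serving therefore uses a merge policy under which requests are merged if either condition is met:
+
+- Condition 1: the absolute difference is less than **1024** bytes. This evaluates the absolute amount of padding.
+- Condition 2: the product of the per-dimension similarities is greater than **50%**. This evaluates the padding relative to the overall data volume.
+  
+Scenario 1: `Shape-1 = [batch, 500, 500], Shape-2 = [batch, 400, 400]`. Here `absolute difference = 500*500 - 400*400 = 90000` bytes and `similarity = (400/500) * (400/500) = 0.8*0.8 = 0.64`. Condition 1 is not met (90000 ≥ 1024) but condition 2 is met (0.64 > 0.5), so dynamic padding is triggered.
+
+Scenario 2: `Shape-1 = [batch, 1, 1], Shape-2 = [batch, 2, 2]`. Here `absolute difference = 2*2 - 1*1 = 3` bytes and `similarity = (1/2) * (1/2) = 0.5*0.5 = 0.25`. Condition 1 is met (3 < 1024) but condition 2 is not (0.25 < 0.5), so dynamic padding is triggered.
+
+Scenario 3: `Shape-1 = [batch, 3, 320, 320], Shape-2 = [batch, 3, 960, 960]`. Here `absolute difference = 3*960*960 - 3*320*320 = 2457600` bytes and `similarity = (3/3) * (320/960) * (320/960) ≈ 0.33*0.33 ≈ 0.11`. Neither condition 1 nor condition 2 is met, so dynamic padding is not triggered.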
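Applying the two thresholds to the numbers of the three scenarios can be sketched as follows. `padding_decision` is a hypothetical helper, not the framework's code. It follows the arithmetic used in the scenarios, which counts elements of the non-batch dimensions even though the text labels the difference as bytes, and it assumes all dimensions are positive.

```python
from typing import Sequence, Tuple


def _volume(dims: Sequence[int]) -> int:
    """Product of the given dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n


def padding_decision(
    dims_a: Sequence[int],
    dims_b: Sequence[int],
    abs_limit: int = 1024,
    sim_limit: float = 0.5,
) -> Tuple[int, float, bool]:
    """Return (absolute difference, similarity product, merge allowed?).

    dims_a and dims_b are the non-batch dimensions of the two shapes.
    """
    if len(dims_a) != len(dims_b):
        raise ValueError("shapes must have the same number of dimensions")
    abs_diff = abs(_volume(dims_a) - _volume(dims_b))
    similarity = 1.0
    for a, b in zip(dims_a, dims_b):
        similarity *= min(a, b) / max(a, b)
    # Either condition is enough to allow merging with padding.
    allowed = abs_diff < abs_limit or similarity > sim_limit
    return abs_diff, similarity, allowed


scenario_1 = padding_decision([500, 500], [400, 400])
scenario_2 = padding_decision([1, 1], [2, 2])
scenario_3 = padding_decision([3, 320, 320], [3, 960, 960])

for name, result in [("1", scenario_1), ("2", scenario_2), ("3", scenario_3)]:
    print("scenario", name, result)
```

Scenario 1 passes on the similarity threshold only, scenario 2 passes on the absolute-difference threshold only, and scenario 3 fails both.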
+
+<a name="2"></a>
+
+## Usage Examples
+
+<a name="2.1"></a>
+
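+**1. Enable synchronous mode**
+
+When the start command uses neither `--runtime_thread_num` nor `--batch_infer_size`, the server runs in synchronous mode and asynchronous mode is off. `--thread 16` starts 16 synchronous network threads.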
+```
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 16 --port 9292 
+```
+
+<a name="2.2"></a>
+
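+**2. Enable asynchronous mode**
+
+Passing `--runtime_thread_num 4` and `--batch_infer_size 32` enables asynchronous mode: the Serving framework starts 4 asynchronous threads, merges at most 32 samples per batch, and turns on dynamic padding automatically.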
+```
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 16 --port 9292 --runtime_thread_num 4 --batch_infer_size 32 --ir_optim --gpu_multi_stream --gpu_ids 0
+```
+
+<a name="3"></a>
+
+## Performance Test
+
+
+- GPU: Tesla P4 7611 MiB
+- CUDA: cuda11.2-cudnn8-trt8
+- Python version: python3.7
+- Model: ResNet_v2_50
+- Test data: all-ones input, 100 requests per client, shape range (1, 224 ± 50, 224 ± 50)
+
+Start command for synchronous mode:
+```
+python3 -m paddle_serving_server.serve --model resnet_v2_50_imagenet_model --port 9393 --thread 8 --ir_optim --gpu_multi_stream --gpu_ids 1 --enable_prometheus --prometheus_port 1939
+```
+
+Start command for asynchronous mode:
+```
+python3 -m paddle_serving_server.serve --model resnet_v2_50_imagenet_model --port 9393 --thread 64 --runtime_thread_num 8 --ir_optim --gpu_multi_stream --gpu_ids 1 --enable_prometheus --prometheus_port 19393
+```
+
+<a name="3.1"></a>
+
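+**1. Test results**
+
+With asynchronous mode and dynamic batching enabled, throughput improves substantially when data of different shapes is sent concurrently.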
+<div align=center>
+<img src='../images/6-1_Cpp_Asynchronous_Framwork_CN_1.png' height = "600" align="middle"/>
+</div>
+
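+Dynamic batching lengthens the response time. In the tests, the throughput gain exceeds the response-time increase in most scenarios, especially under high concurrency: at client=70, throughput rises by 105.27% while response time grows by 33.89%. The percentages compare asynchronous mode with dynamic batching against asynchronous mode without it, and they match the `qps(samples/s)` and `infer_cost_avg(ms)` columns of the test data.
+
+|Client |1 |5 |10 | 20 |30 |40 |50 |70 |
+|---|---|---|---|---|---|---|---|---|
+|QPS |-2.08% |-7.23% |-1.89% |20.55% |23.02% |23.34% |46.41% |105.27% |
+|Response time | 2.70% |7.09% |5.24% |13.34% |10.80% |43.60% |8.72% |33.89% |
+
+Asynchronous mode effectively improves service throughput.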
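The percentages in the summary table can be reproduced from the raw test data. The sketch below copies the `qps(samples/s)` and `infer_cost_avg(ms)` values for 1 and 70 clients from the two asynchronous-mode tables. `percent_change` is a hypothetical helper, and treating `infer_cost_avg` as the response-time column is an inference from the numbers rather than something the document states.

```python
def percent_change(baseline: float, value: float) -> float:
    """Relative change from baseline to value, in percent, rounded to 2 places."""
    return round((value / baseline - 1.0) * 100.0, 2)


# client_num -> (qps, infer_cost_avg in ms), copied from the test-data tables.
ASYNC_WITHOUT_DYNAMIC_BATCH = {1: (22.894, 40.6655), 70: (65.069, 235.4626)}
ASYNC_WITH_DYNAMIC_BATCH = {1: (22.417, 41.7646), 70: (133.568, 315.2540)}

for clients in sorted(ASYNC_WITHOUT_DYNAMIC_BATCH):
    base_qps, base_cost = ASYNC_WITHOUT_DYNAMIC_BATCH[clients]
    new_qps, new_cost = ASYNC_WITH_DYNAMIC_BATCH[clients]
    print(
        f"client={clients}: "
        f"QPS {percent_change(base_qps, new_qps)}%, "
        f"response time {percent_change(base_cost, new_cost)}%"
    )
```

At a single client dynamic batching has nothing to merge, so QPS dips slightly. The gain appears once enough concurrent requests are available to form batches.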
+
+<a name="3.2"></a>
+
+**2. Test data**
+
+1. Synchronous mode
+
+| client_num | batch_size |CPU_util_pre(%) |CPU_util(%) |GPU_memory(mb) |GPU_util(%) |qps(samples/s) |total count |mean(ms) |median(ms) |80 percent(ms) |90 percent(ms) |99 percent(ms) |total cost(s) |each cost(s)|infer_count_total|infer_cost_total(ms)|infer_cost_avg(ms)|
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| 1 |1 |1.30 |18.90 |2066 |71.56 |22.938 |100 |43.594 |23.516 |78.118 |78.323 |133.544 |4.4262 |4.3596 |7100.0000 |1666392.70 | 41.1081 |
+| 5 |1 |2.00 |28.20 |3668 |92.57 |33.630 |500 |148.673 |39.531 |373.231 |396.306 |419.088 |15.0606 |14.8676 |7600.0000 |1739372.7480| 145.9601 |
+|10 |1 |1.90 |29.80 |4202 |91.98 |34.303 |1000 |291.512 |76.728 |613.963 |632.736 |1217.863 |29.8004 |29.1516 |8600.0000 |1974147.7420| 234.7750 |
+|20 |1 |4.70 |49.60 |4736 |92.63 |34.359 |2000 |582.089 |154.952 |1239.115 |1813.371 |1858.128 |59.7303 |58.2093 |12100.0000 |2798459.6330 |235.6248 |
+|30 |1 |5.70 |65.70 |4736 |92.60 |34.162 |3000 |878.164 |231.121 |2391.687 |2442.744 |2499.963 |89.6546 |87.8168 |17600.0000 |4100408.9560 |236.6877 |
+|40 |1 |5.40 |74.40 |5270 |92.44 |34.090 |4000 |1173.373 |306.244 |3037.038 |3070.198 |3134.894 |119.4162 |117.3377 |21600.0000 |5048139.2170 |236.9326|
+|50 |1 |1.40 |64.70 |5270 |92.37 |34.031 |5000 |1469.250 |384.327 |3676.812 |3784.330 |4366.862 |149.7041 |146.9254 |26600.0000 |6236269.4230 |237.6260|
+|70 |1 |3.70 |79.70 |5270 |91.89 |33.976 |7000 |2060.246 |533.439 |5429.255 |5552.704 |5661.492 |210.1008 |206.0250 |33600.0000 |7905005.9940 |238.3909|
+
+
+2. Asynchronous mode, dynamic batching disabled
+
+| client_num | batch_size |CPU_util_pre(%) |CPU_util(%) |GPU_memory(mb) |GPU_util(%) |qps(samples/s) |total count |mean(ms) |median(ms) |80 percent(ms) |90 percent(ms) |99 percent(ms) |total cost(s) |each cost(s)|infer_count_total|infer_cost_total(ms)|infer_cost_avg(ms)|
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| 1 |1 |6.20 |13.60 |5170 |71.11 |22.894 |100 |43.677 |23.992 |78.285 |78.788 |123.542 |4.4253 |4.3679 |3695.0000 |745061.9120 |40.6655 |
+| 5 |1 |6.10 |32.20 |7306 |89.54 |33.532 |500 |149.109 |43.906 |376.889 |401.999 |422.753 |15.1623 |14.9113 |4184.0000 |816834.2250 |146.7736|
+|10 |1 |4.90 |43.60 |7306 |91.55 |38.136 |1000 |262.216 |75.393 |575.788 |632.016 |1247.775 |27.1019 |26.2220 |5107.0000 |1026490.3950 |227.1464|
+|20 |1 |5.70 |39.60 |7306 |91.36 |58.601 |2000 |341.287 |145.774 |646.824 |994.748 |1132.979 |38.3915 |34.1291 |7461.0000 |1555234.6260 |229.9113|
+|30 |1 |1.30 |45.40 |7484 |91.10 |69.008 |3000 |434.728 |204.347 |959.184 |1092.181 |1661.289 |46.3822 |43.4732 |10289.0000 |2269499.9730 |249.4257|
+|40 |1 |3.10 |73.00 |7562 |91.83 |80.956 |4000 |494.091 |272.889 |966.072 |1310.011 |1851.887 |52.0609 |49.4095 |12102.0000 |2678878.2010 |225.8016|
+|50 |1 |0.80 |68.00 |7522 |91.10 |83.018 |5000 |602.276 |364.064 |1058.261 |1473.051 |1671.025 |72.9869 |60.2280 |14225.0000 |3256628.2820 |272.1385|
+|70 |1 |6.10 |78.40 |7584 |92.02 |65.069 |7000 |1075.777 |474.014 |2411.296 |2705.863 |3409.085 |111.6653 |107.5781 |17974.0000 |4139377.4050 |235.4626|
+
+
+
+3. Asynchronous mode, dynamic batching enabled
+
+
+| client_num | batch_size |CPU_util_pre(%) |CPU_util(%) |GPU_memory(mb) |GPU_util(%) |qps(samples/s) |total count |mean(ms) |median(ms) |80 percent(ms) |90 percent(ms) |99 percent(ms) |total cost(s) |each cost(s)|infer_count_total|infer_cost_total(ms)|infer_cost_avg(ms)|
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| 1 |1 |1.20 |13.30 |6048 |70.07 |22.417 |100 |44.606 |24.486 |78.365 |78.707 |139.349 |4.5201 |4.4608 |1569.0000 |462418.6390 |41.7646 |
+| 5 |1 |1.20 |50.80 |7116 |87.37 |31.106 |500 |160.740 |42.506 |414.903 |458.841 |481.112 |16.3525 |16.0743 |2059.0000 |539439.3300 |157.1851|
+|10 |1 |0.80 |26.20 |7264 |88.74 |37.417 |1000 |267.254 |79.452 |604.451 |686.477 |1345.528 |27.9848 |26.7258 |2950.0000 |752428.0570 |239.0446|
+|20 |1 |1.50 |32.80 |7264 |89.52 |70.641 |2000 |283.117 |133.441 |516.066 |652.089 |1274.957 |33.0280 |28.3121 |4805.0000 |1210814.5610 |260.5873|
+|30 |1 |0.90 |59.10 |7348 |89.57 |84.894 |3000 |353.380 |217.385 |613.587 |757.829 |1277.283 |40.7093 |35.3384 |6924.0000 |1817515.1710 |276.3695|
+|40 |1 |1.30 |57.30 |7356 |89.30 |99.853 |4000 |400.584 |204.425 |666.015 |1031.186 |1380.650 |49.4807 |40.0588 |8104.0000 |2200137.0060 |324.2558|
+|50 |1 |1.50 |50.60 |7578 |89.04 |121.545 |5000 |411.364 |331.118 |605.809 |874.543 |1285.650 |48.2343 |41.1369 |9350.0000 |2568777.6400 |295.8593|
+|70 |1 |3.80 |83.20 |7602 |89.59 |133.568 |7000 |524.073 |382.653 |799.463 |1202.179 |1576.809 |57.2885 |52.4077 |10761.0000 |3013600.9670 |315.2540|
+
+
+
--- a/doc/C++_Serving/Encryption_CN.md
+++ b/doc/C++_Serving/Encryption_CN.md
 # 加密模型预测

-(简体中文|[English](./Encryption_EN.md))
-
-Padle Serving提供了模型加密预测功能，本文档显示了详细信息。
+Paddle Serving 提供了模型加密预测功能，本文档显示了详细信息。

 ## 原理

 采用对称加密算法对模型进行加密。对称加密算法采用同一密钥进行加解密，它计算量小，速度快，是最常用的加密方法。

-### 获得加密模型
+**一. 获得加密模型：**

 普通的模型和参数可以理解为一个字符串，通过对其使用加密算法（参数是您的密钥），普通模型和参数就变成了一个加密的模型和参数。

 我们提供了一个简单的演示来加密模型。请参阅[examples/C++/encryption/encrypt.py](../../examples/C++/encryption/encrypt.py)。


-### 启动加密服务
+**二. 启动加密服务：**

 假设您已经有一个已经加密的模型（在`encrypt_server/`路径下）,您可以通过添加一个额外的命令行参数 `--use_encryption_model`来启动加密模型服务。

@@ -30,7 +28,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_

 此时，服务器不会真正启动，而是等待密钥。

-### Client Encryption Inference
+**三. Client Encryption Inference：**

 首先，您必须拥有模型加密过程中使用的密钥。

@@ -39,5 +37,6 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
 一旦服务器获得密钥，它就使用该密钥解析模型并启动模型预测服务。


-### 模型加密推理示例
+**四. 模型加密推理示例：**
+
 模型加密推理示例, 请参见[examples/C++/encryption/](../../examples/C++/encryption/)。
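The principle described in this document (one key both encrypts and decrypts the serialized model) can be illustrated with a toy sketch. The cipher below is a SHA-256 keystream XOR chosen only because it needs nothing beyond the standard library; it is an assumption made for illustration, it is not the algorithm Paddle Serving uses, and it is not suitable for protecting real models. The names `toy_encrypt` and `toy_decrypt` are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` using SHA-256 over a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR the data with a key-derived stream (illustration only, not secure)."""
    stream = _keystream(key, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


# Symmetric: applying the same operation with the same key restores the input.
toy_decrypt = toy_encrypt

model_bytes = b"serialized model and parameters"
key = b"my-secret-key"
encrypted = toy_encrypt(model_bytes, key)
restored = toy_decrypt(encrypted, key)
print(restored == model_bytes)
```

This mirrors the flow in the text: the server holds only the encrypted bytes and cannot start serving until the client supplies the key used during encryption.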
--- a/doc/C++_Serving/Hot_Loading_CN.md
+++ b/doc/C++_Serving/Hot_Loading_CN.md
-# Paddle Serving中的模型热加载
-
-(简体中文|[English](./Hot_Loading_EN.md))
+# Paddle Serving 中的模型热加载

 ## 背景

@@ -8,35 +6,35 @@

 ## Server Monitor

-Paddle Serving提供了一个自动监控脚本，远端地址更新模型后会拉取新模型更新本地模型，同时更新本地模型文件夹中的时间戳文件`fluid_time_stamp`实现热加载。
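+Paddle Serving provides an automatic monitoring script. When the model at the remote address is updated, the script pulls the new model over the local one and updates the timestamp file `fluid_time_file` in the local model folder, which triggers hot loading.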

-目前支持下面几种类型的远端监控Monitor：
+目前支持下面几种类型的远端监控 Monitor：

 | Monitor类型 |                             描述                             |                           特殊选项                           |
 | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-|   general   | 远端无认证，可以通过`wget`直接访问下载文件（如无需认证的FTP，BOS等） |                 `general_host` 通用远端host                  |
-|  hdfs/afs(HadoopMonitor)   |        远端为HDFS或AFS，通过Hadoop-Client执行相关命令        | `hadoop_bin` Hadoop二进制的路径<br/>`fs_name` Hadoop fs_name，默认为空<br/>`fs_ugi` Hadoop fs_ugi，默认为空 |
-|     ftp     | 远端为FTP，通过`ftplib`进行相关访问（使用该Monitor，您可能需要执行`pip install ftplib`下载`ftplib`） | `ftp_host` FTP host<br>`ftp_port` FTP port<br>`ftp_username` FTP username，默认为空<br>`ftp_password` FTP password，默认为空 |
+|   general   | 远端无认证，可以通过 `wget` 直接访问下载文件（如无需认证的FTP，BOS等） |                 `general_host` 通用远端host                  |
+|  hdfs/afs(HadoopMonitor)   |        远端为 HDFS 或 AFS，通过 Hadoop-Client 执行相关命令        | `hadoop_bin` Hadoop 二进制的路径 <br/>`fs_name` Hadoop fs_name，默认为空<br/>`fs_ugi` Hadoop fs_ugi，默认为空 |
+|     ftp     | The remote is FTP, accessed through `ftplib` (`ftplib` is part of the Python standard library, so no separate installation is needed) | `ftp_host` FTP host<br>`ftp_port` FTP port<br>`ftp_username` FTP username, empty by default<br>`ftp_password` FTP password, empty by default |

 |    Monitor通用选项     |                             描述                             |         默认值         |
 | :--------------------: | :----------------------------------------------------------: | :--------------------: |
-|         `type`         |                       指定Monitor类型                        |           无           |
+|         `type`         |                       指定 Monitor 类型                        |           无           |
 |     `remote_path`      |                      指定远端的基础路径                      |           无           |
 |  `remote_model_name`   |                   指定远端需要拉取的模型名                   |           无           |
-| `remote_donefile_name` |           指定远端标志模型更新完毕的donefile文件名           |           无           |
+| `remote_donefile_name` |           指定远端标志模型更新完毕的 donefile 文件名           |           无           |
 |      `local_path`      |                       指定本地工作路径                       |           无           |
 |   `local_model_name`   |                        指定本地模型名                        |           无           |
-| `local_timestamp_file` | 指定本地用于热加载的时间戳文件，该文件被认为在`local_path/local_model_name`下。 |   `fluid_time_file`    |
+| `local_timestamp_file` | 指定本地用于热加载的时间戳文件，该文件被认为在 `local_path/local_model_name` 下。 |   `fluid_time_file`    |
 |    `local_tmp_path`    |    指定本地存放临时文件的文件夹路径，若不存在则自动创建。    | `_serving_monitor_tmp` |
 |       `interval`       |                 指定轮询间隔时间，单位为秒。                 |          `10`          |
-|  `unpacked_filename`   | Monitor支持tarfile打包的远程模型。如果远程模型是打包格式，则需要设置该选项来告知Monitor解压后的文件名。 |         `None`         |
-|        `debug`         |       如果添加`--debug`选项，则输出更详细的中间信息。        |    默认不添加该选项    |
+|  `unpacked_filename`   | Monitor 支持 tarfile 打包的远程模型。如果远程模型是打包格式，则需要设置该选项来告知 Monitor 解压后的文件名。 |         `None`         |
+|        `debug`         |       如果添加 `--debug` 选项，则输出更详细的中间信息。        |    默认不添加该选项    |

-下面通过HadoopMonitor示例来展示Paddle Serving的模型热加载功能。
+下面通过 HadoopMonitor 示例来展示 Paddle Serving 的模型热加载功能。

-## HadoopMonitor示例
+## HadoopMonitor 示例

-示例中在`product_path`中生产模型上传至hdfs，在`server_path`中模拟服务端模型热加载：
+示例中在 `product_path` 中生产模型上传至 hdfs，在 `server_path` 中模拟服务端模型热加载：

 ```shell
 .
@@ -44,9 +42,9 @@ Paddle Serving提供了一个自动监控脚本，远端地址更新模型后会
 └── server_path
 ```

-### 生产模型
+**一.生产模型**

-在`product_path`下运行下面的Python代码生产模型（运行前需要修改hadoop相关的参数），每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下，上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。
+在 `product_path` 下运行下面的 Python 代码生产模型（运行前需要修改 hadoop 相关的参数），每隔 60 秒会产出 Boston 房价预测模型的打包文件 `uci_housing.tar.gz` 并上传至 hdfs 的`/`路径下，上传完毕后更新时间戳文件 `donefile` 并上传至 hdfs 的`/`路径下。

 ```python
 import os
@@ -121,7 +119,7 @@ for pass_id in range(30):
    push_to_hdfs(donefile_name, '/')
 ```

-hdfs上的文件如下列所示：
+hdfs 上的文件如下列所示：

 ```bash
 # hadoop fs -ls /
@@ -130,11 +128,11 @@ Found 2 items
 -rw-r--r--   1 root supergroup       2101 2020-04-02 02:54 /uci_housing.tar.gz
 ```

-### 服务端加载模型
+**二.服务端加载模型**

-进入`server_path`文件夹。
+进入 `server_path` 文件夹。

-#### 用初始模型启动Server端
+1. 用初始模型启动 Server 端

 这里使用预训练的 Boston 房价预测模型作为初始模型：

@@ -143,15 +141,15 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar
 tar -xzf uci_housing.tar.gz
 ```

-启动Server端：
+启动 Server 端：

 ```shell
 python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
 ```

-#### 执行监控程序
+2. 执行监控程序

-用下面的命令来执行HDFS监控程序：
+用下面的命令来执行 HDFS 监控程序：

 ```shell
 python -m paddle_serving_server.monitor \
@@ -162,7 +160,7 @@ python -m paddle_serving_server.monitor \
 	--local_tmp_path='_tmp' --unpacked_filename='uci_housing_model' --debug
 ```

-上面代码通过轮询方式监控远程HDFS地址`/`的时间戳文件`/donefile`，当时间戳变更则认为远程模型已经更新，将远程打包模型`/uci_housing.tar.gz`拉取到本地临时路径`./_tmp/uci_housing.tar.gz`下，解包出模型文件`./_tmp/uci_housing_model`后，更新本地模型`./uci_housing_model`以及Paddle Serving的时间戳文件`./uci_housing_model/fluid_time_file`。
+上面代码通过轮询方式监控远程 HDFS 地址`/`的时间戳文件`/donefile`，当时间戳变更则认为远程模型已经更新，将远程打包模型`/uci_housing.tar.gz`拉取到本地临时路径`./_tmp/uci_housing.tar.gz`下，解包出模型文件`./_tmp/uci_housing_model`后，更新本地模型`./uci_housing_model`以及Paddle Serving的时间戳文件`./uci_housing_model/fluid_time_file`。
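The polling step described above can be sketched as follows. This is a minimal illustration, not the code of `paddle_serving_server.monitor`: local directories stand in for the remote HDFS path, the donefile's modification time stands in for its timestamp, and the function name `poll_once` and its parameters are hypothetical.

```python
import os
import shutil
import tarfile
import time


def poll_once(remote_dir, donefile, archive, unpacked_name,
              local_model, tmp_dir, last_seen):
    """Check the donefile once; pull and unpack the model if its timestamp changed.

    Returns the donefile timestamp that is now considered current.
    """
    stamp = os.path.getmtime(os.path.join(remote_dir, donefile))
    if stamp == last_seen:
        return last_seen  # remote model unchanged, nothing to do

    # Pull the packed model into the temporary folder (stand-in for a remote fetch).
    os.makedirs(tmp_dir, exist_ok=True)
    pulled = os.path.join(tmp_dir, archive)
    shutil.copy(os.path.join(remote_dir, archive), pulled)

    # Unpack, then overwrite the local model with the unpacked files.
    with tarfile.open(pulled) as tar:
        tar.extractall(tmp_dir)
    shutil.copytree(os.path.join(tmp_dir, unpacked_name), local_model,
                    dirs_exist_ok=True)

    # Refresh the timestamp file so the server notices the new model.
    with open(os.path.join(local_model, "fluid_time_file"), "w") as f:
        f.write(str(time.time()))
    return stamp
```

A real monitor would call such a function in a loop and sleep for the configured `interval` between calls. `dirs_exist_ok` requires Python 3.8 or later.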

 预计输出如下：

@@ -197,9 +195,9 @@ python -m paddle_serving_server.monitor \
 2020-04-02 10:12 INFO     [monitor.py:161] sleep 10s.
 ```

-#### 查看Server日志
+3. 查看 Server 日志

-通过下面命令查看Server的运行日志：
+通过下面命令查看 Server 的运行日志：

 ```shell
 tail -f log/serving.INFO

--- a/doc/C++_Serving/Inference_Protocols_CN.md
+++ b/doc/C++_Serving/Inference_Protocols_CN.md
 # Inference Protocols

-C++ Serving基于BRPC进行服务构建，支持BRPC、GRPC、RESTful请求。请求数据为protobuf格式，详见`core/general-server/proto/general_model_service.proto`。本文介绍构建请求以及解析结果的方法。
+C++ Serving 基于 BRPC 进行服务构建，支持 BRPC、GRPC、RESTful 请求。请求数据为 protobuf 格式，详见 `core/general-server/proto/general_model_service.proto`。本文介绍构建请求以及解析结果的方法。

 ## Tensor

-Tensor可以装载多种类型的数据，是Request和Response的基础单元。Tensor的具体定义如下：
+**一.Tensor 定义**
+
+Tensor 可以装载多种类型的数据，是 Request 和 Response 的基础单元。Tensor 的具体定义如下：

 ```protobuf
 message Tensor {
@@ -71,7 +73,7 @@ message Tensor {
 };
 ```

- elem_type：数据类型，当前支持FLOAT32, INT64, INT32, UINT8, INT8, FLOAT16
+- elem_type：数据类型，当前支持 FLOAT32, INT64, INT32, UINT8, INT8, FLOAT16

 |elem_type|类型|
 |---------|----|
@@ -86,10 +88,12 @@ message Tensor {
 |8|INT8|

 - shape：数据维度
- lod：lod信息，LoD(Level-of-Detail) Tensor是Paddle的高级特性，是对Tensor的一种扩充，用于支持更自由的数据输入。详见[LOD](../LOD_CN.md)
+- lod：lod 信息，LoD(Level-of-Detail) Tensor 是 Paddle 的高级特性，是对 Tensor 的一种扩充，用于支持更自由的数据输入。Lod 相关原理介绍，请参考[相关文档](../LOD_CN.md)
 - name/alias_name: 名称及别名，与模型配置对应

-### 构建FLOAT32数据Tensor
+**二.构建 Tensor 数据**
+
+1. FLOAT32 类型 Tensor

 ```C
 // 原始数据
@@ -99,7 +103,7 @@ Tensor *tensor = new Tensor;
 for (uint32_t j = 0; j < float_shape.size(); ++j) {
  tensor->add_shape(float_shape[j]);
 }
-// 设置LOD信息
+// 设置 LOD 信息
 for (uint32_t j = 0; j < float_lod.size(); ++j) {
  tensor->add_lod(float_lod[j]);
 }
@@ -113,7 +117,7 @@ tensor->mutable_float_data()->Resize(total_number, 0);
 memcpy(tensor->mutable_float_data()->mutable_data(), float_data.data(), total_number * sizeof(float));
 ```

-### 构建INT8数据Tensor
+2. INT8 类型 Tensor

 ```C
 // 原始数据
@@ -133,7 +137,9 @@ tensor->set_tensor_content(string_data);

 ## Request

-Request为客户端需要发送的请求数据，其以Tensor为基础数据单元，并包含了额外的请求信息。定义如下：
+**一.Request 定义**
+
+Request 为客户端需要发送的请求数据，其以 Tensor 为基础数据单元，并包含了额外的请求信息。定义如下：

 ```protobuf
 message Request {
@@ -148,9 +154,11 @@ message Request {
 - profile_server: 调试参数，打开时会输出性能信息
 - log_id: 请求ID

-### 构建Request
+**二.构建 Request**

-当使用BRPC或GRPC进行请求时，使用protobuf形式数据，构建方式如下：
+1. Protobuf 形式
+
+当使用 BRPC 或 GRPC 进行请求时，使用 protobuf 形式数据，构建方式如下：

 ```C
 Request req;
@@ -162,16 +170,19 @@ for (auto &name : fetch_name) {
 Tensor *tensor = req.add_tensor();
 ...
 ```
+2. Json 形式

-当使用RESTful请求时，可以使用JSON形式数据，具体格式如下：
+当使用 RESTful 请求时，可以使用 Json 形式数据，具体格式如下：

-```JSON
+```Json
 {"tensor":[{"float_data":[0.0137,-0.1136,0.2553,-0.0692,0.0582,-0.0727,-0.1583,-0.0584,0.6283,0.4919,0.1856,0.0795,-0.0332],"elem_type":1,"name":"x","alias_name":"x","shape":[1,13]}],"fetch_var_names":["price"],"log_id":0}
 ```
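The JSON body above can also be assembled programmatically. The sketch below only builds and serializes the body; it makes no assumption about the service URL, which this document does not give. The helper name `build_request` is hypothetical, and the `elem_type` value `1` is taken from the example above rather than from the full type table.

```python
import json
from math import prod

FLOAT32 = 1  # elem_type value used alongside float_data in the example above


def build_request(name, values, shape, fetch_var_names, log_id=0):
    """Assemble a request body that carries a single float tensor."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    tensor = {
        "float_data": list(values),
        "elem_type": FLOAT32,
        "name": name,
        "alias_name": name,
        "shape": list(shape),
    }
    return {
        "tensor": [tensor],
        "fetch_var_names": list(fetch_var_names),
        "log_id": log_id,
    }


features = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
            -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
body = build_request("x", features, [1, 13], ["price"])
payload = json.dumps(body)
print(payload)
```

Checking the element count against `shape` before sending catches the most common mistake, a tensor whose data length does not match its declared dimensions.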

 ## Response

-Response为服务端返回给客户端的结果，包含了Tensor数据、错误码、错误信息等。定义如下：
+**一.Response 定义**
+
+Response 为服务端返回给客户端的结果，包含了 Tensor 数据、错误码、错误信息等。定义如下：

 ```protobuf
 message Response {
@@ -190,7 +201,7 @@ message ModelOutput {
 }
 ```

- profile_time：当设置request->set_profile_server(true)时，会返回性能信息
+- profile_time：当设置 request->set_profile_server(true) 时，会返回性能信息
 - err_no：错误码，详见`core/predictor/common/constant.h`
 - err_msg：错误信息，详见`core/predictor/common/constant.h`
 - engine_name：输出节点名称
@@ -203,19 +214,19 @@ message ModelOutput {
 |-5002|"Paddle Serving Array Overflow Error."|
 |-5100|"Paddle Serving Op Inference Error."|

-### 读取Response数据
+**二.读取 Response 数据**

 ```C
 uint32_t model_num = res.outputs_size();
 for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
  const auto &output = res.outputs(m_idx); std::string engine_name = output.engine_name();
  int idx = 0;
-  // 读取tensor维度
+  // 读取 tensor 维度
  int shape_size = output.tensor(idx).shape_size();
  for (int i = 0; i < shape_size; ++i) {
    shape[i] = output.tensor(idx).shape(i);
  }
-  // 读取LOD信息
+  // 读取 LOD 信息
  int lod_size = output.tensor(idx).lod_size();
  if (lod_size > 0) {
    lod.resize(lod_size);
@@ -223,12 +234,12 @@ for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
      lod[i] = output.tensor(idx).lod(i);
    }
  }
-  // 读取float数据
+  // 读取 float 数据
  int size = output.tensor(idx).float_data_size();
  float_data = std::vector<float>(
      output.tensor(idx).float_data().begin(),
      output.tensor(idx).float_data().begin() + size);
-  // 读取int8数据
+  // 读取 int8 数据
  string_data = output.tensor(idx).tensor_content();
 }
-```
\ No newline at end of file
+```
--- a/doc/C++_Serving/Model_Ensemble_CN.md
+++ b/doc/C++_Serving/Model_Ensemble_CN.md
-# Paddle Serving中的集成预测
-
-(简体中文|[English](./Model_Ensemble_EN.md))
-
-在一些场景中，可能使用多个相同输入的模型并行集成预测以获得更好的预测效果，Paddle Serving提供了这项功能。
-
-下面将以文本分类任务为例，来展示Paddle Serving的集成预测功能（暂时还是串行预测，我们会尽快支持并行化）。
-
-## 集成预测样例
-
-该样例中（见下图），Server端在一项服务中并行预测相同输入的BOW和CNN模型，Client端获取两个模型的预测结果并进行后处理，得到最终的预测结果。
-
-![simple example](../images/model_ensemble_example.png)
-
-需要注意的是，目前只支持在同一个服务中使用多个相同格式输入输出的模型。在该例子中，CNN模型和BOW模型的输入输出格式是相同的。
-
-样例中用到的代码保存在`examples/C++/imdb`路径下：
-
-```shell
-.
-├── get_data.sh
-├── imdb_reader.py
-├── test_ensemble_client.py
-└── test_ensemble_server.py
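+# How to Define a Model Combination in C++
+
+If your pipeline contains more than one model inference stage (OCR, for example, usually needs two stages, det and rec), there are two ways to meet the requirement.
+
+1. Start two Serving services (for example Serving-det and Serving-rec). Your Client then runs: read data -> det preprocessing -> call Serving-det -> det postprocessing -> rec preprocessing -> call Serving-rec -> rec postprocessing -> output the result.
+    - Advantage: no changes to the Paddle Serving code are needed.
+    - Disadvantage: two service requests are required, so efficiency drops as the amount of request data grows.
+2. Modify the code to customize the model inference behavior (custom OP) and the service processing flow (custom DAG), so that the combined processing of several models (the chain above: det preprocessing -> det inference -> det postprocessing -> rec preprocessing -> rec inference -> rec postprocessing) runs inside a single Serving service. Your Client then runs: read data -> call the combined Serving -> output the result.
+    - Advantage: only one service request is needed, so efficiency is high.
+    - Disadvantage: the code must be changed and recompiled.
+
+This document describes the second approach, customizing the service processing flow. The basic steps are:
+1. Define a custom OP (the preprocessing, model inference and postprocessing of a single model)
+2. Compile
+3. Start and call the service
+
+## Custom OP
+An OP defines the preprocessing, model inference and postprocessing of a single model. Defining an OP takes 2 steps:
+1. Define the C++ `.h` header file
+2. Define the C++ `.cpp` source file
+
+**1. Define the C++ `.h` header file**
+Copy the code below and replace the placeholder `/*自定义Class名称*/` (custom class name) with your own class name, such as `GeneralDetectionOp`.
+
+Place the file under `core/general-server/op/`. The file name is up to you, for example `general_detection_op.h`.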
+``` C++
+#pragma once
+#include <string>
+#include <vector>
+#include "core/general-server/general_model_service.pb.h"
+#include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h"  // NOLINT
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+class /*自定义Class名称*/
+    : public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
+ public:
+  typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+  DECLARE_OP(/*自定义Class名称*/);
+
+  int inference();
+};
+
+}  // namespace serving
+}  // namespace paddle_serving
+}  // namespace baidu
 ```
-
-### 数据准备
-
-通过下面命令获取预训练的CNN和BOW模型（您也可以直接运行`get_data.sh`脚本）：
-
-```shell
-wget --no-check-certificate https://fleet.bj.bcebos.com/text_classification_data.tar.gz
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz
-tar -zxvf text_classification_data.tar.gz
-tar -zxvf imdb_model.tar.gz
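+**2. Define the C++ `.cpp` source file**
+Copy the code below and replace the placeholder `/*自定义Class名称*/` (custom class name) with your own class name, such as `GeneralDetectionOp`.
+
+Add your preprocessing and postprocessing code at the positions marked by the preprocessing and postprocessing comments in the code below.
+
+Place the file under `core/general-server/op/`. The file name is up to you, for example `general_detection_op.cpp`.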
+
+``` C++
+#include "core/general-server/op/your_header_file_name.h"  // placeholder: use your own header file name
+#include <algorithm>
+#include <iostream>
+#include <memory>
+#include <sstream>
+#include "core/predictor/framework/infer.h"
+#include "core/predictor/framework/memory.h"
+#include "core/predictor/framework/resource.h"
+#include "core/util/include/timer.h"
+
+namespace baidu {
+namespace paddle_serving {
+namespace serving {
+
+using baidu::paddle_serving::Timer;
+using baidu::paddle_serving::predictor::MempoolWrapper;
+using baidu::paddle_serving::predictor::general_model::Tensor;
+using baidu::paddle_serving::predictor::general_model::Response;
+using baidu::paddle_serving::predictor::general_model::Request;
+using baidu::paddle_serving::predictor::InferManager;
+using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
+
+int /*自定义Class名称*/::inference() {
+  // Get the predecessor OP node.
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  // Use the predecessor OP's output as this OP's input.
+  GeneralBlob *input_blob = mutable_depend_argument<GeneralBlob>(pre_name);
+  if (!input_blob) {
+    LOG(ERROR) << "input_blob is nullptr,error";
+    return -1;
+  }
+  TensorVector *in = &input_blob->tensor_vector;
+  uint64_t log_id = input_blob->GetLogId();
+  int batch_size = input_blob->_batch_size;
+
+  // Initialize this OP's output.
+  GeneralBlob *output_blob = mutable_data<GeneralBlob>();
+  output_blob->SetLogId(log_id);
+  output_blob->_batch_size = batch_size;
+  VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
+  TensorVector *out = &output_blob->tensor_vector;
+
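+  // Add preprocessing code here. Preprocessing modifies TensorVector* in directly.
+  // Note: the data in `in` is the predecessor node's `out` after its postprocessing.
+
+  Timer timeline;
+  int64_t start = timeline.TimeStampUS();
+  timeline.Start();
+  // Pass the preprocessed `in` and the initialized `out` to run model inference.
+  // The inference output is written directly into the memory that `out` points to.
+  // To define an OP that only processes data and does not call a model, delete the block below.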
+  if (InferManager::instance().infer(
+          engine_name().c_str(), in, out, batch_size)) {
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name().c_str();
+    return -1;
+  }
+
+  // Add postprocessing code here. Postprocessing modifies TensorVector* out directly.
+  // The postprocessed `out` is passed on to the following nodes.
+
+  int64_t end = timeline.TimeStampUS();
+  CopyBlobInfo(input_blob, output_blob);
+  AddBlobInfo(output_blob, start);
+  AddBlobInfo(output_blob, end);
+  return 0;
+}
+DEFINE_OP(/*自定义Class名称*/);
+
+}  // namespace serving
+}  // namespace paddle_serving
+}  // namespace baidu
 ```

-### 启动Server
-
-通过下面的Python代码启动Server端（您也可以直接运行`test_ensemble_server.py`脚本）：
-
-```python
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpGraphMaker
-from paddle_serving_server import Server
-
-op_maker = OpMaker()
-read_op = op_maker.create('GeneralReaderOp')
-cnn_infer_op = op_maker.create(
-    'GeneralInferOp', engine_name='cnn', inputs=[read_op])
-bow_infer_op = op_maker.create(
-    'GeneralInferOp', engine_name='bow', inputs=[read_op])
-response_op = op_maker.create(
-    'GeneralResponseOp', inputs=[cnn_infer_op, bow_infer_op])
-
-op_graph_maker = OpGraphMaker()
-op_graph_maker.add_op(read_op)
-op_graph_maker.add_op(cnn_infer_op)
-op_graph_maker.add_op(bow_infer_op)
-op_graph_maker.add_op(response_op)
-
-server = Server()
-server.set_op_graph(op_graph_maker.get_op_graph())
-model_config = {cnn_infer_op: 'imdb_cnn_model', bow_infer_op: 'imdb_bow_model'}
-server.load_model_config(model_config)
-server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
-server.run_server()
+1. TensorVector data structure
+
+Both `in` and `out` are pointers to a `TensorVector`. They are used in almost the same way as Tensor in the Paddle C++ API. The related data structures are shown below.
+
+``` C++
+//TensorVector
+typedef std::vector<paddle::PaddleTensor> TensorVector;
+
+//paddle::PaddleTensor
+struct PD_INFER_DECL PaddleTensor {
+  PaddleTensor() = default;
+  std::string name;  ///<  variable name.
+  std::vector<int> shape;
+  PaddleBuf data;  ///<  blob of data.
+  PaddleDType dtype;
+  std::vector<std::vector<size_t>> lod;  ///<  Tensor+LoD equals LoDTensor
+};
+
+//PaddleBuf
+class PD_INFER_DECL PaddleBuf {
+ public:
+
+ explicit PaddleBuf(size_t length)
+      : data_(new char[length]), length_(length), memory_owned_(true) {}
+
+  PaddleBuf(void* data, size_t length)
+      : data_(data), length_(length), memory_owned_{false} {}
+
+  explicit PaddleBuf(const PaddleBuf& other);
+
+  void Resize(size_t length);
+  void Reset(void* data, size_t length);
+  bool empty() const { return length_ == 0; }
+  void* data() const { return data_; }
+  size_t length() const { return length_; }
+  ~PaddleBuf() { Free(); }
+  PaddleBuf& operator=(const PaddleBuf&);
+  PaddleBuf& operator=(PaddleBuf&&);
+  PaddleBuf() = default;
+  PaddleBuf(PaddleBuf&& other);
+ private:
+  void Free();
+  void* data_{nullptr};  ///< pointer to the data memory.
+  size_t length_{0};     ///< number of memory bytes.
+  bool memory_owned_{true};
+};
 ```

-与普通预测服务不同的是，这里我们需要用DAG来描述Server端的运行逻辑。
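+2. TensorVector code example
+
+```C++
+/* For example, to access the first Tensor in the input data */
+paddle::PaddleTensor& tensor_1 = in->at(0);
+/* For example, to change the name of the first input Tensor */
+tensor_1.name = "new name";
+/* For example, to read the shape of the first input Tensor */
+std::vector<int> tensor_1_shape = tensor_1.shape;
+/* For example, to modify the data in the first input Tensor */
+void* data_1 = tensor_1.data.data();
+// After this, modify the memory that data_1 points to directly.
+// For example, if your data is of type int, cast the void* to int* and work on that.
+```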

-在创建Op的时候需要指定当前Op的前继（在该例子中，`cnn_infer_op`与`bow_infer_op`的前继均是`read_op`，`response_op`的前继是`cnn_infer_op`和`bow_infer_op`），对于预测Op`infer_op`还需要定义预测引擎名称`engine_name`（也可以使用默认值，建议设置该值方便Client端获取预测结果）。

-同时在配置模型路径时，需要以预测Op为key，对应的模型路径为value，创建模型配置字典，来告知Serving每个预测Op使用哪个模型。
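+## Compile After the Changes
+You now need to recompile Serving, set the `SERVING_BIN` environment variable with `export SERVING_BIN` so that it points at the binary you built, and install the related Python packages with `pip3 install`. For details, see [How to compile Serving](2-3_Compile_CN.md).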

-### 启动Client
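+## Start and Call the Service
 
-通过下面的Python代码运行Client端（您也可以直接运行`test_ensemble_client.py`脚本）：
+**1. Start the Server**
 
+Building on the two previous sections, chaining two models in one service only requires passing the relative paths of the model folders, in order, after `--model`, and the custom C++ OP class names, in order, after `--op`. The order of the models after `--model` must match the order of the class names after `--op`. Assuming two OPs named `GeneralDetectionOp` and `GeneralRecOp` have already been defined, the command is: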
 ```python
-from paddle_serving_client import Client
-from imdb_reader import IMDBDataset
-
-client = Client()
-# If you have more than one model, make sure that the input
-# and output of more than one model are the same.
-client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
-client.connect(["127.0.0.1:9393"])
-
-# you can define any english sentence or dataset here
-# This example reuses imdb reader in training, you
-# can define your own data preprocessing easily.
-imdb_dataset = IMDBDataset()
-imdb_dataset.load_resource('imdb.vocab')
-
-for i in range(3):
-    line = 'i am very sad | 0'
-    word_ids, label = imdb_dataset.get_words_and_label(line)
-    feed = {"words": word_ids}
-    fetch = ["acc", "cost", "prediction"]
-    fetch_maps = client.predict(feed=feed, fetch=fetch)
-    if len(fetch_maps) == 1:
-        print("step: {}, res: {}".format(i, fetch_maps['prediction'][0][1]))
-    else:
-        for model, fetch_map in fetch_maps.items():
-            print("step: {}, model: {}, res: {}".format(i, model, fetch_map[
-                'prediction'][0][1]))
+# Chain several models in one service
+python3 -m paddle_serving_server.serve --model ocr_det_model ocr_rec_model --op GeneralDetectionOp GeneralRecOp --port 9292
+# Chained models: ocr_det_model maps to GeneralDetectionOp, ocr_rec_model maps to GeneralRecOp
 ```
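Because the i-th model after `--model` must pair with the i-th class name after `--op`, a small helper can check the pairing before launching. This is a sketch with a hypothetical helper name, `build_serve_command`; it only assembles the command line shown above and does not start the server.

```python
import shlex


def build_serve_command(models, ops, port):
    """Build the serve command, refusing mismatched model/OP lists."""
    if len(models) != len(ops):
        raise ValueError("each model needs exactly one OP, listed in the same order")
    return [
        "python3", "-m", "paddle_serving_server.serve",
        "--model", *models,
        "--op", *ops,
        "--port", str(port),
    ]


cmd = build_serve_command(
    ["ocr_det_model", "ocr_rec_model"],
    ["GeneralDetectionOp", "GeneralRecOp"],
    9292,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as is. `shlex.join` requires Python 3.8 or later.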

-Client端与普通预测服务没有发生太大的变化。当使用多个模型预测时，预测服务将返回一个key为Server端定义的引擎名称`engine_name`，value为对应的模型预测结果的字典。
+**2. Call from the Client**

-### 预期结果
-
-```txt
-step: 0, model: cnn, res: 0.560272455215
-step: 0, model: bow, res: 0.633530199528
-step: 1, model: cnn, res: 0.560272455215
-step: 1, model: bow, res: 0.633530199528
-step: 2, model: cnn, res: 0.560272455215
-step: 2, model: bow, res: 0.633530199528
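+The Client call must also be given the paths of both Client-side proto files or folders. Taking OCR as an example, you can write your own script based on [ocr_cpp_client.py](../../examples/C++/PaddleOCR/ocr/ocr_cpp_client.py). The Client is then invoked as follows:
+```shell
+# Call the service that chains several models
+# your_client_script.py is a placeholder for the script you wrote
+python3 your_client_script.py ocr_det_client ocr_rec_client
+# ocr_det_client is the relative path of the first model's Client-side proto folder
+# ocr_rec_client is the relative path of the second model's Client-side proto folder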
 ```
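+On the Server side, the input data format follows the Client-side proto of the first model, and the output data format follows the Client-side proto of the last model. You normally do not need to care about this. If you need the detailed proto definitions, see [Serving configuration](5-3_Serving_Configure_CN.md).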
--- a/doc/C++_Serving/OP_CN.md
+++ b/doc/C++_Serving/OP_CN.md
 # 如何开发一个新的General Op?

-(简体中文|[English](./OP_EN.md))
+- [Define an Op](#1)
+- [Use `GeneralBlob` between Ops](#2)
+  - [2.1 Implement `int inference()`](#2.1)
+- [Define the Python API](#3)

-在本文档中，我们主要集中于如何为Paddle Serving开发新的服务器端运算符。 在开始编写新运算符之前，让我们看一些示例代码以获得为服务器编写新运算符的基本思想。 我们假设您已经知道Paddle Serving服务器端的基本计算逻辑。 下面的代码您可以在 Serving代码库下的 `core/general-server/op` 目录查阅。
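+This document focuses on how to develop a new server-side operator for Paddle Serving. Before writing a new operator, let us look at some sample code to get the basic idea. We assume you already know the basic computation logic of the Paddle Serving server side. The code below can be found in the `core/general-server/op` directory of the Serving repository.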


 ``` c++
-// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.

 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
 #include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"

@@ -54,14 +36,17 @@ class GeneralInferOp
 }  // namespace paddle_serving
 }  // namespace baidu
 ```
+<a name="1"></a>

 ## 定义一个Op

-上面的头文件声明了一个名为`GeneralInferOp`的PaddleServing运算符。 在运行时，将调用函数 `int inference（)`。 通常，我们将服务器端运算符定义为baidu::paddle_serving::predictor::OpWithChannel的子类，并使用 `GeneralBlob` 数据结构。
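+The header file above declares a Paddle Serving operator named `GeneralInferOp`. At runtime, the function `int inference()` is called. A server-side operator is usually defined as a subclass of `baidu::paddle_serving::predictor::OpWithChannel` and uses the `GeneralBlob` data structure.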
+
+<a name="2"></a>

 ## 在Op之间使用 `GeneralBlob` 

-`GeneralBlob` 是一种可以在服务器端运算符之间使用的数据结构。 `tensor_vector`是`GeneralBlob`中最重要的数据结构。 服务器端的操作员可以将多个`paddle::PaddleTensor`作为输入，并可以将多个`paddle::PaddleTensor`作为输出。 特别是，`tensor_vector`可以在没有内存拷贝的操作下输入到Paddle推理引擎中。
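+`GeneralBlob` is a data structure that can be passed between server-side operators, and `tensor_vector` is its most important member. A server-side operator can take multiple `paddle::PaddleTensor` objects as input and produce multiple `paddle::PaddleTensor` objects as output. In particular, `tensor_vector` can be fed to the Paddle inference engine without a memory copy.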

 ``` c++
 struct GeneralBlob {
@@ -86,7 +71,9 @@ struct GeneralBlob {
 };
 ```

-### 实现 `int Inference()`
+<a name="2.1"></a>
+
+**1. Implement `int inference()`**

 ``` c++
 int GeneralInferOp::inference() {
@@ -127,14 +114,13 @@ int GeneralInferOp::inference() {
 DEFINE_OP(GeneralInferOp);
 ```

-`input_blob` 和 `output_blob` 都有很多的 `paddle::PaddleTensor`, 且Paddle预测库会被 `InferManager::instance().infer(engine_name().c_str(), in, out, batch_size)`调用。此函数中的其他大多数代码都与性能分析有关，将来我们也可能会删除多余的代码。
-
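+`input_blob` and `output_blob` each hold multiple `paddle::PaddleTensor` objects, and the Paddle inference library is invoked through `InferManager::instance().infer(engine_name().c_str(), in, out, batch_size)`. Most of the remaining code in this function is for profiling, and redundant parts may be removed in the future.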

-基本上，以上代码可以实现一个新的运算符。如果您想访问字典资源，可以参考`core/predictor/framework/resource.cpp`来添加全局可见资源。资源的初始化在启动服务器的运行时执行。
+<a name="3"></a>

 ## 定义 Python API

-在服务器端为Paddle Serving定义C++运算符后，最后一步是在Python API中为Paddle Serving服务器API添加注册， `python/paddle_serving_server/dag.py`文件里有关于API注册的代码如下
+After defining the C++ operator for Paddle Serving on the server side, the last step is to register it in the Python API of the Paddle Serving server. The registration code in `python/paddle_serving_server/dag.py` is as follows:


 ``` python
@@ -152,7 +138,7 @@ self.op_list = [
        ]
 ```

-在`python/paddle_serving_server/server.py`文件中仅添加`需要加载模型，执行推理预测的自定义的C++OP类的类名`。例如`GeneralReaderOp`由于只是做一些简单的数据处理而不加载模型调用预测，故在👆的代码中需要添加，而不添加在👇的代码中。
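+In `python/paddle_serving_server/server.py`, add only the class names of custom C++ OPs that load a model and run inference. For example, `GeneralReaderOp` only does simple data processing and neither loads a model nor calls inference, so it is added to the code above but not to the code below.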
 ``` python
 default_engine_types = [
                'GeneralInferOp',

--- a/doc/C++_Serving/Performance_Tuning_CN.md
+++ b/doc/C++_Serving/Performance_Tuning_CN.md
 # C++ Serving性能分析与优化
-# 1.背景知识介绍
+
+## 背景知识介绍
 1) 首先，应确保您知道C++ Serving常用的一些[功能特点](./Introduction_CN.md)和[C++ Serving 参数配置和启动的详细说明](../Serving_Configure_CN.md)。
 2) 关于C++ Serving框架本身的性能分析和介绍，请参考[C++ Serving框架性能测试](./Frame_Performance_CN.md)。
 3) 您需要对您使用的模型、机器环境、需要部署上线的业务有一些了解，例如，您使用CPU还是GPU进行预测；是否可以开启TRT进行加速；你的机器CPU是多少core的；您的业务包含几个模型；每个模型的输入和输出需要做些什么处理；您业务的最大线上流量是多少；您的模型支持的最大输入batch是多少等等.

--- a/doc/C++_Serving/Request_Cache_CN.md
+++ b/doc/C++_Serving/Request_Cache_CN.md
-# Request Cache
+# 请求缓存

 本文主要介绍请求缓存功能及实现原理。

-服务中请求由张量tensor、结果名称fetch_var_names、调试开关profile_server、标识码log_id组成，预测结果包含输出张量等。这里缓存会保存请求与结果的键值对。当请求命中缓存时，服务不会执行模型预测，而是会直接从缓存中提取结果。对于某些特定场景而言，这能显著降低请求耗时。
+## How It Works

-缓存可以通过设置`--request_cache_size`来开启。该标志默认为0，即不开启缓存。当设置非零值时，服务会以设置大小为存储上限开启缓存。这里设置的内存单位为字节。注意，如果设置`--request_cache_size`为0是不能开启缓存的。
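+A request consists of tensors (`tensor`), result names (`fetch_var_names`), a debug switch (`profile_server`) and an identifier (`log_id`). An inference result contains the output tensors, among other fields. The cache stores key-value pairs of requests and results. When a request hits the cache, the service skips model inference and returns the result directly from the cache. In some scenarios this noticeably reduces request latency.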

-缓存中的键为64位整形数，是由请求中的tensor和fetch_var_names数据生成的128位哈希值。如果请求命中，那么对应的处理结果会提取出来用于构建响应数据。如果请求没有命中，服务则会执行模型预测，在返回结果的同时将处理结果放入缓存中。由于缓存设置了存储上限，因此需要淘汰机制来限制缓存容量。当前，服务采用了最近最少使用（LRU）机制用于淘汰缓存数据。
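+The cache is enabled with `--request_cache_size`. The flag defaults to 0, which leaves the cache disabled. With a non-zero value, the service enables the cache and uses that value, in bytes, as the storage limit.
+
+Each cache key is a 64-bit integer: a 64-bit hash computed from the `tensor` and `fetch_var_names` data of the request. On a hit, the stored result is used to build the response. On a miss, the service runs model inference and stores the result in the cache while returning it. Because the cache has a storage limit, an eviction policy is needed to bound its size. The service currently uses least recently used (LRU) eviction.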
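The mechanism described above (a 64-bit key derived from the request, a byte-size limit, LRU eviction) can be sketched as follows. This is an illustration only and not the C++ Serving implementation: the hash function (BLAKE2b truncated to 8 bytes) is an assumption, since this document does not name the one the service uses, and `request_key` and `RequestCache` are hypothetical names.

```python
import hashlib
from collections import OrderedDict


def request_key(tensor_bytes: bytes, fetch_var_names) -> int:
    """Derive a 64-bit integer key from tensor data and fetch names (assumed hash)."""
    h = hashlib.blake2b(digest_size=8)
    h.update(tensor_bytes)
    for name in fetch_var_names:
        h.update(b"\0" + name.encode("utf-8"))
    return int.from_bytes(h.digest(), "big")


class RequestCache:
    """Byte-bounded LRU cache; a capacity of 0 disables caching."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> response bytes, least recent first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, response: bytes):
        if self.capacity <= 0 or len(response) > self.capacity:
            return  # cache disabled, or the entry could never fit
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        while self.used + len(response) > self.capacity:
            _, evicted = self.entries.popitem(last=False)  # drop least recent
            self.used -= len(evicted)
        self.entries[key] = response
        self.used += len(response)
```

Only successful results would be passed to `put`, which matches the note below that failed requests are never cached.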

 ## 注意事项

 - 只有预测成功的请求会进行缓存。如果请求失败或者在预测过程中返回错误，则处理结果不会缓存。
- - 缓存是基于请求数据的哈希值实现。因此，可能会出现两个不同的请求生成了相同的哈希值即哈希碰撞，这时服务可能会返回错误的响应数据。哈希值为64位数据，发生哈希碰撞的可能性较小。
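+ - The cache is keyed on a hash of the request data, so two different requests may produce the same hash value (a hash collision), in which case the service may return the wrong response. The hash is 64 bits wide, so collisions are unlikely.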
 - 不论使用同步模式还是异步模式，均可以正常使用缓存功能。
--- a/doc/Compile_CN.md
+++ b/doc/Compile_CN.md
@@ -38,18 +38,17 @@

 推荐使用Docker编译，我们已经为您准备好了Paddle Serving编译环境并配置好了上述编译依赖，详见[该文档](Docker_Images_CN.md)。

-我们提供了五个环境的开发镜像，分别是CPU， CUDA10.1+CUDNN7， CUDA10.2+CUDNN7，CUDA10.2+CUDNN8， CUDA11.2+CUDNN8。我们提供了Serving开发镜像涵盖以上环境。与此同时，我们也支持Paddle开发镜像。
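+We provide development images for four environments: CPU, CUDA10.1+CUDNN7, CUDA10.2+CUDNN8 and CUDA11.2+CUDNN8. Serving development images cover all of these environments, and Paddle development images are supported as well.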

 Serving开发镜像是Serving套件为了支持各个预测环境提供的用于编译、调试预测服务的镜像，Paddle开发镜像是Paddle在官网发布的用于编译、开发、训练模型使用镜像。为了让Paddle开发者能够在同一个容器内直接使用Serving。对于上个版本就已经使用Serving用户的开发者来说，Serving开发镜像应该不会感到陌生。但对于熟悉Paddle训练框架生态的开发者，目前应该更熟悉已有的Paddle开发镜像。为了适应所有用户的不同习惯，我们对这两套镜像都做了充分的支持。


 |  环境                         |   Serving开发镜像Tag               |    操作系统      | Paddle开发镜像Tag       |  操作系统            |
 | :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
-|  CPU                         | 0.8.0-devel                       |  Ubuntu 16.04   | 2.2.2                 | Ubuntu 18.04.       |
-|  CUDA10.1 + CUDNN7             | 0.8.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | 无                     | 无                 |
-|  CUDA10.2 + CUDNN7             | 0.8.0-cuda10.2-cudnn7-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda10.2-cudnn7 | Ubuntu 16.04        |
-|  CUDA10.2 + CUDNN8             | 0.8.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | 无                    |  无                 |
-|  CUDA11.2 + CUDNN8             | 0.8.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 
+|  CPU                         | 0.9.0-devel                       |  Ubuntu 16.04   | 2.3.0                 | Ubuntu 18.04        |
+|  CUDA10.1 + CUDNN7             | 0.9.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | 无                     | 无                 |
+|  CUDA10.2 + CUDNN8             | 0.9.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | 无                    |  无                 |
+|  CUDA11.2 + CUDNN8             | 0.9.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.3.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 

 我们首先要针对自己所需的环境拉取相关镜像。上表**环境**一列下，除了CPU，其余（Cuda**+Cudnn**）都属于GPU环境。
 您可以使用Serving开发镜像。

--- a/doc/Compile_EN.md
+++ b/doc/Compile_EN.md
@@ -37,17 +37,16 @@ In addition, for some C++ secondary development scenarios, we also provide OPENC

 Docker compilation is recommended. We have prepared the Paddle Serving compilation environment for you and configured the above compilation dependencies. For details, please refer to [this document](DOCKER_IMAGES_CN.md).

-We provide five environment development images, namely CPU, CUDA10.1 + CUDNN7, CUDA10.2 + CUDNN7, CUDA10.2 + CUDNN8, CUDA11.2 + CUDNN8. We provide a Serving development image to cover the above environment. At the same time, we also support Paddle development mirroring.
+We provide development images for 4 environments, namely CPU, CUDA10.1 + CUDNN7, CUDA10.2 + CUDNN8 and CUDA11.2 + CUDNN8. Serving development images cover all of these environments, and Paddle development images are supported as well.

 The Serving development image is provided by the Serving suite for compiling and debugging inference services across the supported inference environments. The Paddle development image is released on the Paddle official website for compiling, developing and training models. Supporting both lets Paddle developers use Serving directly in the same container. Developers who used Serving in the previous version will already be familiar with the Serving development image, while developers coming from the Paddle training ecosystem are likely more familiar with the existing Paddle development images. To suit both habits, we fully support both sets of images.

 |  Environment           |   Serving Dev Image Tag               |    OS      | Paddle Dev Image Tag       |  OS            |
 | :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
-|  CPU                         | 0.8.0-devel                       |  Ubuntu 16.04   | 2.2.2                 | Ubuntu 18.04.       |
-|  CUDA10.1 + Cudnn7             | 0.8.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | Nan                     | Nan                 |
-|  CUDA10.2 + Cudnn7             | 0.8.0-cuda10.2-cudnn7-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda10.2-cudnn7 | Ubuntu 16.04        |
-|  CUDA10.2 + Cudnn8             | 0.8.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | Nan                    |  Nan                 |
-|  CUDA11.2 + Cudnn8             | 0.8.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 
+|  CPU                         | 0.9.0-devel                       |  Ubuntu 16.04   | 2.3.0                 | Ubuntu 18.04        |
+|  CUDA10.1 + Cudnn7             | 0.9.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | N/A                     | N/A                 |
+|  CUDA10.2 + Cudnn8             | 0.9.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | N/A                    |  N/A                 |
+|  CUDA11.2 + Cudnn8             | 0.9.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.3.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 

 We first need to pull related images for the environment we need. Under the **Environment** column in the above table, except for the CPU, the rest (Cuda**+Cudnn**) belong to the GPU environment.


--- a/doc/Docker_Images_CN.md
+++ b/doc/Docker_Images_CN.md
@@ -26,7 +26,7 @@
 ## 镜像说明

 若需要基于源代码二次开发编译，请使用后缀为-devel的版本。
-**在TAG列，0.8.0也可以替换成对应的版本号，例如0.5.0/0.4.1等，但需要注意的是，部分开发环境随着某个版本迭代才增加，因此并非所有环境都有对应的版本号可以使用。**
+**在TAG列，0.9.0也可以替换成对应的版本号，例如0.5.0/0.4.1等，但需要注意的是，部分开发环境随着某个版本迭代才增加，因此并非所有环境都有对应的版本号可以使用。**

 **开发镜像：**

@@ -34,12 +34,11 @@

 |                         镜像选择                         |   操作系统    |             TAG              |                          Dockerfile                          |
 | :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
-|                       CPU development                        | Ubuntu16 |         0.8.0-devel         |        [Dockerfile.devel](../tools/Dockerfile.devel)         |
-|              GPU (cuda10.1-cudnn7-tensorRT6-gcc54) development               | Ubuntu16 | 0.8.0-cuda10.1-cudnn7-gcc54-devel (not ready) | [Dockerfile.cuda10.1-cudnn7-gcc54.devel](../tools/Dockerfile.cuda10.1-cudnn7-gcc54.devel) |
-|              GPU (cuda10.1-cudnn7-tensorRT6) development               | Ubuntu16 | 0.8.0-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
-|              GPU (cuda10.2-cudnn7-tensorRT6) development               | Ubuntu16 | 0.8.0-cuda10.2-cudnn7-devel | [Dockerfile.cuda10.2-cudnn7.devel](../tools/Dockerfile.cuda10.2-cudnn7.devel) |
-|              GPU (cuda10.2-cudnn8-tensorRT7) development               | Ubuntu16 | 0.8.0-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
-|              GPU (cuda11.2-cudnn8-tensorRT8) development               | Ubuntu16 | 0.8.0-cuda11.2-cudnn8-devel | [Dockerfile.cuda11.2-cudnn8.devel](../tools/Dockerfile.cuda11.2-cudnn8.devel) |
+|                       CPU development                        | Ubuntu16 |         0.9.0-devel         |        [Dockerfile.devel](../tools/Dockerfile.devel)         |
+|              GPU (cuda10.1-cudnn7-tensorRT6) development     | Ubuntu16 | 0.9.0-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
+|              GPU (cuda10.2-cudnn7-tensorRT6) development     | Ubuntu16 | 0.9.0-cuda10.2-cudnn7-devel | [Dockerfile.cuda10.2-cudnn7.devel](../tools/Dockerfile.cuda10.2-cudnn7.devel) |
+|              GPU (cuda10.2-cudnn8-tensorRT7) development     | Ubuntu16 | 0.9.0-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
+|              GPU (cuda11.2-cudnn8-tensorRT8) development     | Ubuntu16 | 0.9.0-cuda11.2-cudnn8-devel | [Dockerfile.cuda11.2-cudnn8.devel](../tools/Dockerfile.cuda11.2-cudnn8.devel) |


 **运行镜像：**
@@ -48,15 +47,16 @@

 | Env      | Version | Docker images tag            | OS        | Gcc Version | Size |
 |----------|---------|------------------------------|-----------|-------------|------|
-|    CPU   | 0.8.0 | 0.8.0-runtime                 | Ubuntu 16 |  8.2.0       | 3.9 GB |
-| Cuda10.1 | 0.8.0 | 0.8.0-cuda10.1-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10 GB |
-| Cuda10.2 | 0.8.0 | 0.8.0-cuda10.2-cudnn8-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
-| Cuda11.2 | 0.8.0 | 0.8.0-cuda11.2-cudnn8-runtime| Ubuntu 16 |    8.2.0       | 14.2 GB |
+|    CPU   | 0.9.0 | 0.9.0-runtime                 | Ubuntu 16 |  8.2.0       | 3.9 GB |
+| CUDA 10.1 + cuDNN 7 | 0.9.0 | 0.9.0-cuda10.1-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10 GB |
+| CUDA 10.2 + cuDNN 7 | 0.9.0 | 0.9.0-cuda10.2-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
+| CUDA 10.2 + cuDNN 8 | 0.9.0 | 0.9.0-cuda10.2-cudnn8-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
+| CUDA 11.2 + cuDNN 8 | 0.9.0 | 0.9.0-cuda11.2-cudnn8-runtime  | Ubuntu 16 |   8.2.0       | 14.2 GB |


 **Java镜像：**
 ```
-registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-java
+registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda10.2-cudnn8-java
 ```

 **XPU镜像：**

--- a/doc/Docker_Images_EN.md
+++ b/doc/Docker_Images_EN.md
@@ -28,10 +28,8 @@ You can get images in two ways:

 If you want to customize your Serving based on source code, use the version with the suffix - devel.

-**cuda10.1-cudnn7-gcc54 image is not ready, you should run from dockerfile if you need it.**
-
 If you need to develop and compile based on the source code, please use the version with the suffix -devel.
-**In the TAG column, 0.8.0 can also be replaced with the corresponding version number, such as 0.5.0/0.4.1, etc., but it should be noted that some development environments only increase with a certain version iteration, so not all environments All have the corresponding version number can be used.**
+**In the TAG column, 0.9.0 can be replaced with another version number, such as 0.5.0 or 0.4.1. Note, however, that some development environments were only added in later versions, so not every environment is available for every version number.**

 **Development Docker Images:**

@@ -39,12 +37,11 @@ A variety of development tools are installed in the development image, which can

 |                         Description                         |   OS    |             TAG              |                          Dockerfile                          |
 | :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
-|                       CPU development                        | Ubuntu16 |         0.8.0-devel         |        [Dockerfile.devel](../tools/Dockerfile.devel)         |
-|              GPU (cuda10.1-cudnn7-tensorRT6-gcc54) development               | Ubuntu16 | 0.8.0-cuda10.1-cudnn7-gcc54-devel (not ready) | [Dockerfile.cuda10.1-cudnn7-gcc54.devel](../tools/Dockerfile.cuda10.1-cudnn7-gcc54.devel) |
-|              GPU (cuda10.1-cudnn7-tensorRT6) development               | Ubuntu16 | 0.8.0-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
-|              GPU (cuda10.2-cudnn7-tensorRT6) development               | Ubuntu16 | 0.8.0-cuda10.2-cudnn7-devel | [Dockerfile.cuda10.2-cudnn7.devel](../tools/Dockerfile.cuda10.2-cudnn7.devel) |
-|              GPU (cuda10.2-cudnn8-tensorRT7) development               | Ubuntu16 | 0.8.0-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
-|              GPU (cuda11.2-cudnn8-tensorRT8) development               | Ubuntu16 | 0.8.0-cuda11.2-cudnn8-devel | [Dockerfile.cuda11.2-cudnn8.devel](../tools/Dockerfile.cuda11.2-cudnn8.devel) |
+|                       CPU development                        | Ubuntu16 |         0.9.0-devel         |        [Dockerfile.devel](../tools/Dockerfile.devel)         |
+|              GPU (cuda10.1-cudnn7-tensorRT6) development               | Ubuntu16 | 0.9.0-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
+|              GPU (cuda10.2-cudnn7-tensorRT6) development     | Ubuntu16 | 0.9.0-cuda10.2-cudnn7-devel | [Dockerfile.cuda10.2-cudnn7.devel](../tools/Dockerfile.cuda10.2-cudnn7.devel) |
+|              GPU (cuda10.2-cudnn8-tensorRT7) development               | Ubuntu16 | 0.9.0-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
+|              GPU (cuda11.2-cudnn8-tensorRT8) development               | Ubuntu16 | 0.9.0-cuda11.2-cudnn8-devel | [Dockerfile.cuda11.2-cudnn8.devel](../tools/Dockerfile.cuda11.2-cudnn8.devel) |


 **Runtime Docker Images:**
@@ -53,14 +50,15 @@ Runtime Docker Images is lighter than Develop Images, and Running Images are mad

 | Env      | Version | Docker images tag            | OS        | Gcc Version | Size |
 |----------|---------|------------------------------|-----------|-------------|------|
-|    CPU   | 0.8.0 | 0.8.0-runtime                 | Ubuntu 16 |  8.2.0       | 3.9 GB |
-| Cuda10.1 | 0.8.0 | 0.8.0-cuda10.1-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10 GB |
-| Cuda10.2 | 0.8.0 | 0.8.0-cuda10.2-cudnn8-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
-| Cuda11.2 | 0.8.0 | 0.8.0-cuda11.2-cudnn8-runtime| Ubuntu 16 |    8.2.0       | 14.2 GB |
+|    CPU   | 0.9.0 | 0.9.0-runtime                 | Ubuntu 16 |  8.2.0       | 3.9 GB |
+| CUDA 10.1 + cuDNN 7 | 0.9.0 | 0.9.0-cuda10.1-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10 GB |
+| CUDA 10.2 + cuDNN 7 | 0.9.0 | 0.9.0-cuda10.2-cudnn7-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
+| CUDA 10.2 + cuDNN 8 | 0.9.0 | 0.9.0-cuda10.2-cudnn8-runtime  | Ubuntu 16 |   8.2.0       | 10.1 GB |
+| CUDA 11.2 + cuDNN 8 | 0.9.0 | 0.9.0-cuda11.2-cudnn8-runtime  | Ubuntu 16 |    8.2.0       | 14.2 GB |

 **Java SDK Docker Image:**
 ```
-registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-java
+registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda10.2-cudnn8-java
 ```
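The tags in the tables above follow a regular pattern: version, an optional `cudaX.Y-cudnnZ` part, then `devel` or `runtime`. A minimal sketch of that pattern is shown below; the function name `serving_image` is hypothetical and not part of Paddle Serving, and a well-formed tag does not guarantee that the image exists in the registry.

```python
REGISTRY = "registry.baidubce.com/paddlepaddle/serving"


def serving_image(version, cuda=None, cudnn=None, kind="devel"):
    """Build an image reference following the tag pattern in the tables.

    Hypothetical helper for illustration only. `cuda` and `cudnn` are
    strings such as "11.2" and "8"; leave both as None for the CPU image.
    """
    if kind not in ("devel", "runtime"):
        raise ValueError(f"unknown image kind: {kind!r}")
    if (cuda is None) != (cudnn is None):
        raise ValueError("cuda and cudnn must be given together")
    parts = [version]
    if cuda is not None:
        parts.append(f"cuda{cuda}-cudnn{cudnn}")
    parts.append(kind)
    return f"{REGISTRY}:{'-'.join(parts)}"


if __name__ == "__main__":
    print(serving_image("0.9.0"))
    print(serving_image("0.9.0", cuda="11.2", cudnn="8", kind="runtime"))
```

The printed reference can be passed to `docker pull` once you have checked it against the tables.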

 **XPU Docker Images:**

--- a/doc/FAQ_CN.md
+++ b/doc/FAQ_CN.md
--- a/doc/Install_CN.md
+++ b/doc/Install_CN.md
@@ -6,7 +6,7 @@

 **提示-1**：本项目仅支持<mark>**Python3.6/3.7/3.8/3.9**</mark>，接下来所有的与Python/Pip相关的操作都需要选择正确的Python版本。

-**提示-2**：以下示例中GPU环境均为cuda10.2-cudnn7，如果您使用Python Pipeline来部署，并需要Nvidia TensorRT来优化预测性能，请参考[支持的镜像环境和说明](#4支持的镜像环境和说明)来选择其他版本。
+**提示-2**：以下示例中GPU环境均为cuda11.2-cudnn8，如果您使用Python Pipeline来部署，并需要Nvidia TensorRT来优化预测性能，请参考[支持的镜像环境和说明](#4支持的镜像环境和说明)来选择其他版本。


 ## 1.启动开发镜像
@@ -15,16 +15,16 @@
 **CPU：**
 ```
 # 启动 CPU Docker
-docker pull registry.baidubce.com/paddlepaddle/serving:0.8.0-devel
-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.8.0-devel bash
+docker pull registry.baidubce.com/paddlepaddle/serving:0.9.0-devel
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.9.0-devel bash
 docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving
 ```
 **GPU：**
 ```
 # 启动 GPU Docker
-docker pull registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-cudnn7-devel
-nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-cudnn7-devel bash
+docker pull registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda11.2-cudnn8-devel
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda11.2-cudnn8-devel bash
 nvidia-docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving
 ```
@@ -32,8 +32,8 @@ git clone https://github.com/PaddlePaddle/Serving
 **CPU：**
 ```
 # 启动 CPU Docker
-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.2.2 bash
+docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.3.0 bash
 docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving

@@ -43,8 +43,9 @@ bash Serving/tools/paddle_env_install.sh
 **GPU：**
 ```
 # 启动 GPU Docker
-nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7
-nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 bash
+
+nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0-gpu-cuda11.2-cudnn8
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.3.0-gpu-cuda11.2-cudnn8 bash
 nvidia-docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving

@@ -60,21 +61,21 @@ pip3 install -r python/requirements.txt
 ```

 安装服务whl包，共有3种client、app、server，Server分为CPU和GPU，GPU包根据您的环境选择一种安装
- post102 = CUDA10.2 + cuDNN7 + TensorRT6（推荐）
+- post112 = CUDA11.2 + cuDNN8 + TensorRT8（推荐）
 - post101 = CUDA10.1 + cuDNN7 + TensorRT6
- post112 = CUDA11.2 + cuDNN8 + TensorRT8
+- post102 = CUDA10.2 + cuDNN7 + TensorRT6 (与Paddle 镜像一致)
+- post1028 = CUDA10.2 + cuDNN8 + TensorRT7
+

 ```shell
-pip3 install paddle-serving-client==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip3 install paddle-serving-app==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-client==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-app==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

 # CPU Server
-pip3 install paddle-serving-server==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-server==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

-# GPU Server，需要确认环境再选择执行哪一条，推荐使用CUDA 10.2的包
-pip3 install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple 
-pip3 install paddle-serving-server-gpu==0.8.3.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip3 install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
+# GPU Server，需要确认环境再选择执行哪一条，推荐使用CUDA 11.2的包
+pip3 install paddle-serving-server-gpu==0.9.0.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```

 默认开启国内清华镜像源来加速下载，如果您使用HTTP代理可以关闭(`-i https://pypi.tuna.tsinghua.edu.cn/simple`)
@@ -85,45 +86,46 @@ paddle-serving-server和paddle-serving-server-gpu安装包支持Centos 6/7, Ubun

 paddle-serving-client和paddle-serving-app安装包支持Linux和Windows，其中paddle-serving-client仅支持python3.6/3.7/3.8/3.9。

-**如果您之前使用paddle serving 0.5.X 0.6.X的Cuda10.2环境，需要注意在0.8.0版本，paddle-serving-server-gpu==0.8.0.post102的使用Cudnn7和TensorRT6，而0.6.0.post102使用cudnn8和TensorRT7。如果0.6.0的cuda10.2用户需要升级安装，请使用paddle-serving-server-gpu==0.8.0.post1028**
-
 ## 3.安装Paddle相关Python库
+
 **当您使用`paddle_serving_client.convert`命令或者`Python Pipeline框架`时才需要安装。**
 ```
 # CPU环境请执行
-pip3 install paddlepaddle==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddlepaddle==2.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

-# GPU CUDA 10.2环境请执行
-pip3 install paddlepaddle-gpu==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+# GPU CUDA 11.2环境请执行
+pip3 install paddlepaddle-gpu==2.3.0.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```
-**注意**： 如果您的Cuda版本不是10.2，或者您需要在GPU环境上使用TensorRT，请勿直接执行上述命令，需要参考[Paddle-Inference官方文档-下载安装Linux预测库](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)选择相应的GPU环境的url链接并进行安装。举例假设您使用python3.6，请执行如下命令。
+**注意**： 其他版本请参考[Paddle-Inference官方文档-下载安装Linux预测库](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python) 选择相应的GPU环境的 URL 链接并进行安装。

 ```
-# CUDA10.1 + CUDNN7 + TensorRT6
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp36-cp36m-linux_x86_64.whl
+# CUDA11.2 + CUDNN8 + TensorRT8 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp39-cp39-linux_x86_64.whl

-# CUDA10.2 + CUDNN7 + TensorRT6, 需要注意的是此环境和Cuda10.1+Cudnn7+TensorRT6使用同一个paddle whl包
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp36-cp36m-linux_x86_64.whl
+# CUDA10.1 + CUDNN7 + TensorRT6 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp39-cp39-linux_x86_64.whl

-# CUDA10.2 + CUDNN8 + TensorRT7
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.2.2-cp36-cp36m-linux_x86_64.whl
+# CUDA10.2 + CUDNN8 + TensorRT7 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp39-cp39-linux_x86_64.whl

-# CUDA11.2 + CUDNN8 + TensorRT8
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.2.2.post112-cp36-cp36m-linux_x86_64.whl
 ```

-例如CUDA 10.1的Python3.6用户，请选择表格当中的`cp36-cp36m`和`linux-cuda10.1-cudnn7.6-trt6-gcc8.2`对应的url，复制下来并执行
-```
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp36-cp36m-linux_x86_64.whl
-```
 ## 4.支持的镜像环境和说明
 |  环境                         |   Serving开发镜像Tag               |    操作系统      | Paddle开发镜像Tag       |  操作系统            |
 | :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
-|  CPU                         | 0.8.0-devel                       |  Ubuntu 16.04   | 2.2.2                 | Ubuntu 18.04.       |
-|  CUDA10.1 + CUDNN7             | 0.8.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | 无                     | 无                 |
-|  CUDA10.2 + CUDNN7             | 0.8.0-cuda10.2-cudnn7-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda10.2-cudnn7 | Ubuntu 16.04        |
-|  CUDA10.2 + CUDNN8             | 0.8.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | 无                    |  无                 |
-|  CUDA11.2 + CUDNN8             | 0.8.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 
+|  CPU                         | 0.9.0-devel                       |  Ubuntu 16.04   | 2.3.0                | Ubuntu 18.04.       |
+|  CUDA10.1 + CUDNN7           | 0.9.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | 无                   | 无                 |
+|  CUDA10.2 + CUDNN8           | 0.9.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | 无                   | 无                 |
+|  CUDA11.2 + CUDNN8           | 0.9.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.3.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04   | 

 对于**Windows 10 用户**，请参考文档[Windows平台使用Paddle Serving指导](Windows_Tutorial_CN.md)。

@@ -132,4 +134,4 @@ pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x
 ```
 python3 -m paddle_serving_server.serve check
 ```
-详情请参考[环境检查文档](./Check_Env_CN.md)
+详情请参考[环境检查文档](./Check_Env_CN.md)
\ No newline at end of file
--- a/doc/Install_EN.md
+++ b/doc/Install_EN.md
@@ -6,7 +6,7 @@

 **Tip-1**: This project only supports <mark>**Python3.6/3.7/3.8/3.9**</mark>, all subsequent operations related to Python/Pip need to select the correct Python version.

-**Tip-2**: The GPU environments in the following examples are all cuda10.2-cudnn7. If you use Python Pipeline to deploy and need Nvidia TensorRT to optimize prediction performance, please refer to [Supported Mirroring Environment and Instructions](#4.-Supported-Docker-Images-and-Instruction) to choose other versions.
+**Tip-2**: The GPU environments in the following examples are all cuda11.2-cudnn8. If you use Python Pipeline to deploy and need Nvidia TensorRT to optimize prediction performance, please refer to [Supported Mirroring Environment and Instructions](#4.-Supported-Docker-Images-and-Instruction) to choose other versions.

 ## 1. Start the Docker Container
 <mark>**Both Serving Dev Image and Paddle Dev Image are supported at the same time. You can choose 1 from the operation 2 in chapters 1.1 and 1.2.**</mark>Deploying the Serving service on the Paddle docker image requires the installation of additional dependency libraries. Therefore, we directly use the Serving development image.
@@ -15,16 +15,16 @@
 **CPU:**
 ```
 # Start CPU Docker Container
-docker pull registry.baidubce.com/paddlepaddle/serving:0.8.0-devel
-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.8.0-devel bash
+docker pull registry.baidubce.com/paddlepaddle/serving:0.9.0-devel
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.9.0-devel bash
 docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving
 ```
 **GPU:**
 ```
 # Start GPU Docker Container
-docker pull registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-cudnn7-devel
-nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.8.0-cuda10.2-cudnn7-devel bash
+docker pull registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda11.2-cudnn8-devel
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda11.2-cudnn8-devel bash
 nvidia-docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving
 ```
@@ -32,8 +32,8 @@ git clone https://github.com/PaddlePaddle/Serving
 **CPU:**
 ```
 # Start CPU Docker Container
-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2
-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.2.2 bash
+docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.3.0 bash
 docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving

@@ -43,8 +43,8 @@ bash Serving/tools/paddle_env_install.sh
 **GPU:**
 ```
 # Start GPU Docker
-nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7
-nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 bash
+nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0-gpu-cuda11.2-cudnn8
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/paddle:2.3.0-gpu-cuda11.2-cudnn8 bash
 nvidia-docker exec -it test bash
 git clone https://github.com/PaddlePaddle/Serving

@@ -61,21 +61,22 @@ pip3 install -r python/requirements.txt
 ```

 Install the service whl package. There are three types of client, app and server. The server is divided into CPU and GPU. Choose one installation according to the environment. 
- post102 = CUDA10.2 + cuDNN7 + TensorRT6(Recommended)
+- post112 = CUDA11.2 + cuDNN8 + TensorRT8 (Recommended)
 - post101 = CUDA10.1 + cuDNN7 + TensorRT6
- post112 = CUDA11.2 + cuDNN8 + TensorRT8
+- post102 = CUDA10.2 + cuDNN7 + TensorRT6 (consistent with the Paddle image)
+- post1028 = CUDA10.2 + cuDNN8 + TensorRT7
+

 ```shell
-pip3 install paddle-serving-client==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip3 install paddle-serving-app==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-client==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-app==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

 # CPU Server
-pip3 install paddle-serving-server==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-server==0.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

 # GPU environments need to confirm the environment before choosing which one to execute
-pip3 install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple 
-pip3 install paddle-serving-server-gpu==0.8.3.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
-pip3 install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-server-gpu==0.9.0.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddle-serving-server-gpu==0.9.0.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple 
+pip3 install paddle-serving-server-gpu==0.9.0.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```
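Only one of the server packages above should be installed. As a rough sketch, the choice can be written as a lookup from CUDA version to the `post` suffix. The helper name `server_requirement` is an assumption made for illustration, and the mapping covers only the three GPU packages shown in the commands above.

```python
# CUDA version -> "post" suffix, taken from the pip commands above.
POST_SUFFIX = {"11.2": "post112", "10.2": "post102", "10.1": "post101"}


def server_requirement(version, cuda=None):
    """Return the pip requirement for the Serving server package.

    Hypothetical helper: pass cuda=None for the CPU package, or a CUDA
    version string such as "11.2" for the matching GPU package.
    """
    if cuda is None:
        return f"paddle-serving-server=={version}"
    if cuda not in POST_SUFFIX:
        raise ValueError(f"no GPU package listed for CUDA {cuda}")
    return f"paddle-serving-server-gpu=={version}.{POST_SUFFIX[cuda]}"


if __name__ == "__main__":
    print("pip3 install", server_requirement("0.9.0", cuda="11.2"))
```

Confirm the CUDA version of your environment before installing, since the GPU packages are built against specific cuDNN and TensorRT versions.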

 By default, the commands use the Tsinghua mirror in China to speed up downloads. If you use a proxy, you can drop the mirror option (`-i https://pypi.tuna.tsinghua.edu.cn/simple`).
@@ -86,31 +87,35 @@ The paddle-serving-server and paddle-serving-server-gpu installation packages su

 The paddle-serving-client and paddle-serving-app installation packages support Linux and Windows, and paddle-serving-client only supports python3.6/3.7/3.8/3.9.

-**If you used the CUDA10.2 environment of paddle serving 0.5.X 0.6.X before, you need to pay attention to version 0.8.0, paddle-serving-server-gpu==0.8.0.post102 uses Cudnn7 and TensorRT6, and 0.6.0.post102 uses cudnn8 and TensorRT7. If 0.6.0 cuda10.2 users need to upgrade, please use paddle-serving-server-gpu==0.8.0.post1028**
-
 ## 3. Install Paddle related Python libraries
 **You only need to install it when you use the `paddle_serving_client.convert` command or the `Python Pipeline framework`.**
 ```
 # CPU environment please execute
-pip3 install paddlepaddle==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip3 install paddlepaddle==2.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

-# GPU CUDA 10.2 environment please execute
-pip3 install paddlepaddle-gpu==2.2.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
+# GPU CUDA 11.2 environment please execute
+pip3 install paddlepaddle-gpu==2.3.0.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ```
-**Note**: If your CUDA version is not 10.2 or if you want to use TensorRT(CUDA10.2 included), please do not execute the above commands directly, you need to refer to [Paddle-Inference official document-download and install the Linux prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python) Select the URL link of the corresponding GPU environment and install it. Assuming that you use Python3.6, please follow the codeblock.
+**Note**: To use another version, do not run the commands above directly. Refer to [Paddle-Inference official document-download and install the Linux prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python), then select and install the URL that matches your GPU environment and Python version. The code block below lists the wheels for Python 3.6 to 3.9; run only the one line that matches your environment.

 ```
-# CUDA10.1 + CUDNN7 + TensorRT6
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp36-cp36m-linux_x86_64.whl
-
-# CUDA10.2 + CUDNN7 + TensorRT6, Attenton that the paddle whl for this env is same to that of CUDA10.1 + Cudnn7 + TensorRT6
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp36-cp36m-linux_x86_64.whl
+# CUDA11.2 + CUDNN8 + TensorRT8 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.3.0.post112-cp39-cp39-linux_x86_64.whl

-# CUDA10.2 + Cudnn8 + TensorRT7
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.2.2-cp36-cp36m-linux_x86_64.whl
+# CUDA10.1 + CUDNN7 + TensorRT6 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.3.0.post101-cp39-cp39-linux_x86_64.whl

-# CUDA11.2 + CUDNN8 + TensorRT8
-pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-2.2.2.post112-cp36-cp36m-linux_x86_64.whl
+# CUDA10.2 + CUDNN8 + TensorRT7 + Python(3.6-3.9)
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp36-cp36m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp37-cp37m-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp38-cp38-linux_x86_64.whl
+pip3 install https://paddle-inference-lib.bj.bcebos.com/2.3.0/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-2.3.0-cp39-cp39-linux_x86_64.whl
 ```
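
The wheel URLs above follow a regular naming pattern: the Python 3.6 and 3.7 wheels carry an `m` ABI suffix (`cp36-cp36m`, `cp37-cp37m`), while the 3.8 and 3.9 wheels do not. As a minimal sketch, the helper `paddle_wheel_tag` below maps a Python version to the tag seen in those file names. The helper name is hypothetical and is not part of Paddle Serving; it only mirrors the file names listed above.

```shell
# Hypothetical helper: map a Python version such as "3.7" to the wheel tag
# that appears in the file names above.
paddle_wheel_tag() {
    case "$1" in
        3.6) echo "cp36-cp36m" ;;
        3.7) echo "cp37-cp37m" ;;
        3.8) echo "cp38-cp38" ;;
        3.9) echo "cp39-cp39" ;;
        *)   echo "unsupported Python version: $1" >&2; return 1 ;;
    esac
}

# Example: print the tag to look for when running Python 3.8
paddle_wheel_tag 3.8
```

Matching the tag against the output of `python3 --version` before copying a URL avoids installing a wheel built for a different interpreter.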

 ## 4. Supported Docker Images and Instruction
@@ -118,11 +123,10 @@ pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x

 | Environment | Serving Development Image Tag | Operating System | Paddle Development Image Tag | Operating System |
 | :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
-|  CPU                         | 0.8.0-devel                       |  Ubuntu 16.04   | 2.2.2                 | Ubuntu 18.04.       |
-|  CUDA10.1 + CUDNN7             | 0.8.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | 无                     | 无                 |
-|  CUDA10.2 + CUDNN7             | 0.8.0-cuda10.2-cudnn7-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda10.2-cudnn7 | Ubuntu 16.04        |
-|  CUDA10.2 + CUDNN8             | 0.8.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | 无                    |  无                 |
-|  CUDA11.2 + CUDNN8             | 0.8.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.2.2-gpu-cuda11.2-cudnn8 | Ubuntu 18.04        | 
+|  CPU                         | 0.9.0-devel                       |  Ubuntu 16.04   | 2.3.0                | Ubuntu 18.04        |
+|  CUDA10.1 + CUDNN7           | 0.9.0-cuda10.1-cudnn7-devel       |  Ubuntu 16.04   | None                 | None               |
+|  CUDA10.2 + CUDNN8           | 0.9.0-cuda10.2-cudnn8-devel       |  Ubuntu 16.04   | None                 | Ubuntu 18.04   |
+|  CUDA11.2 + CUDNN8           | 0.9.0-cuda11.2-cudnn8-devel       |  Ubuntu 16.04   | 2.3.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04   |

 For **Windows 10 users**, please refer to the document [Paddle Serving Guide for Windows Platform](Windows_Tutorial_CN.md).


--- a/doc/Install_Linux_Env_CN.md
+++ b/doc/Install_Linux_Env_CN.md
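+# Installation in a Standard Native System Environment
+
+This document describes configuration and installation in a standard native system environment.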
+
+<img src="images/2-2_Environment_CN_1.png">
+
+
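+## CentOS 7 Environment Configuration (Step 1)
+
+**I. Environment Preparation**
+
+* **Python version 3.6/3.7/3.8/3.9 (64-bit)**
+
+**II. Choose CPU/GPU**
+
+* If your computer has an NVIDIA® GPU, make sure the following conditions are met:
+
+    * **CUDA Toolkit 10.1/10.2 with cuDNN 7 (cuDNN version >= 7.6.5), or CUDA Toolkit 11.2 with cuDNN v8.1.1**
+    * **A compatible version of TensorRT**
+    * **A GPU with compute capability greater than 3.5**
+
+        Refer to the official NVIDIA documentation for how to install and configure CUDA and cuDNN: [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/), [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/index.html), [GPU compute capability](https://developer.nvidia.com/cuda-gpus)
+
+**III. Install Required Tools**
+
+The required dependency libraries and tools are listed in the table below:
+
+|          Component           |        Version requirement        |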
+| :--------------------------: | :-------------------------------: |
+|         bzip2-devel          |          1.0.6 and later          |
+|              make              |     later     |
+|             gcc              |          8.2.0         |
+|           gcc-c++            |          8.2.0         |
+|            cmake             |          3.15.0 and later          |
+|              Go              |          1.17.2 and later          |
+|        openssl-devel         |              1.0.2k               |
+|           patchelf           |                0.9                |
+
+1. Update the system package sources
+
+    Update the `yum` sources:
+
+    ```
+    yum update
+    ```
+
+    Then add the required yum repository:
+
+    ```
+    yum install -y epel-release
+    ```
+
+2. Install tools
+
+    `bzip2` and `make`:
+
+    ```
+    yum install -y bzip2
+    ```
+
+    ```
+    yum install -y make
+    ```
+
+    cmake 3.15 or later is required; 3.16.0 is recommended:
+
+    ```
+    wget -q https://cmake.org/files/v3.16/cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    tar -zxvf cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    rm cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    PATH=/home/cmake-3.16.0-Linux-x86_64/bin:$PATH
+    ```
+
+    gcc 5.4 or later is required; 8.2.0 is recommended:
+
+    ```
+    wget -q https://paddle-docker-tar.bj.bcebos.com/home/users/tianshuo/bce-python-sdk-0.8.27/gcc-8.2.0.tar.xz && \
+    tar -xvf gcc-8.2.0.tar.xz && \
+    cd gcc-8.2.0 && \
+    sed -i 's#ftp://gcc.gnu.org/pub/gcc/infrastructure/#https://paddle-ci.gz.bcebos.com/#g' ./contrib/download_prerequisites && \
+    unset LIBRARY_PATH CPATH C_INCLUDE_PATH PKG_CONFIG_PATH CPLUS_INCLUDE_PATH INCLUDE && \
+    ./contrib/download_prerequisites && \
+    cd .. && mkdir temp_gcc82 && cd temp_gcc82 && \
+    ../gcc-8.2.0/configure --prefix=/usr/local/gcc-8.2 --enable-threads=posix --disable-checking --disable-multilib && \
+    make -j8 && make install
+    ```
+
+3. Install Golang
+
+    go1.17.2 is recommended:
+
+    ```
+    wget -qO- https://go.dev/dl/go1.17.2.linux-amd64.tar.gz | \
+    tar -xz -C /usr/local && \
+    mkdir /root/go && \
+    mkdir /root/go/bin && \
+    mkdir /root/go/src && \
+    echo "GOROOT=/usr/local/go" >> /root/.bashrc && \
+    echo "GOPATH=/root/go" >> /root/.bashrc && \
+    echo "PATH=/usr/local/go/bin:/root/go/bin:$PATH" >> /root/.bashrc
+    source /root/.bashrc
+    ```
+  
+4. Install dependency libraries
+
+    Install the dependency patchelf:
+
+    ```
+    yum install patchelf
+    ```
+
+    Configure the ssl dependency libraries:
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
+    tar xf centos_ssl.tar && rm -rf centos_ssl.tar && \
+    mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && \
+    ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && \
+    ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && \
+    ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && \
+    ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
+    ```
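
The `ln -sf` commands in step 4 build a chain of symbolic links: `libssl.so` points to `libssl.so.10`, which in turn points to the versioned file `libssl.so.1.0.2k`. The sketch below reproduces that chain in a temporary directory with an empty stand-in file, so it can be run without touching `/usr/lib`. It assumes GNU `readlink -f` is available; the variable names are illustrative only.

```shell
# Build the same link chain in a scratch directory (no real library involved).
demo_dir="$(mktemp -d)"
touch "$demo_dir/libssl.so.1.0.2k"                           # empty stand-in file
ln -sf "$demo_dir/libssl.so.1.0.2k" "$demo_dir/libssl.so.10"
ln -sf "$demo_dir/libssl.so.10" "$demo_dir/libssl.so"

# Follow every link in the chain and keep only the final file name.
resolved="$(basename "$(readlink -f "$demo_dir/libssl.so")")"
echo "libssl.so resolves to $resolved"

rm -rf "$demo_dir"
```

Running `readlink -f /usr/lib/libssl.so` after the real commands is a quick way to confirm that the chain ends at the 1.0.2k library.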
+
+## Ubuntu 16.04/18.04 Environment Configuration (Step 1)
+
+**I. Environment Preparation**
+
+* **Python version 3.6/3.7/3.8/3.9 (64-bit)**
+
+**II. Choose CPU/GPU**
+
+* If your computer has an NVIDIA® GPU, make sure the following conditions are met:
+
+    * **CUDA Toolkit 10.1/10.2 with cuDNN 7 (cuDNN version >= 7.6.5)**
+    * **CUDA Toolkit 11.2 with cuDNN v8.1.1**
+    * **A matching version of TensorRT**
+    * **A GPU with compute capability greater than 3.5**
+
+        Refer to the official NVIDIA documentation for how to install and configure CUDA and cuDNN: [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/), [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/index.html)
+
+**III. Install Required Tools**
+
+1. Update the system package sources
+
+    Update the `apt` sources:
+
+    ```
+    apt update
+    ```
+
+2. Install tools
+
+    `bzip2` and `make`:
+
+    ```
+    apt install -y bzip2
+    ```
+    ```
+    apt install -y make
+    ```
+
+    cmake 3.15 or later is required; 3.16.0 is recommended:
+
+    ```
+    wget -q https://cmake.org/files/v3.16/cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    tar -zxvf cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    rm cmake-3.16.0-Linux-x86_64.tar.gz
+    ```
+
+    ```
+    PATH=/home/cmake-3.16.0-Linux-x86_64/bin:$PATH
+    ```
+
+    gcc 5.4 or later is required; 8.2.0 is recommended:
+
+    ```
+    wget -q https://paddle-docker-tar.bj.bcebos.com/home/users/tianshuo/bce-python-sdk-0.8.27/gcc-8.2.0.tar.xz && \
+    tar -xvf gcc-8.2.0.tar.xz && \
+    cd gcc-8.2.0 && \
+    sed -i 's#ftp://gcc.gnu.org/pub/gcc/infrastructure/#https://paddle-ci.gz.bcebos.com/#g' ./contrib/download_prerequisites && \
+    unset LIBRARY_PATH CPATH C_INCLUDE_PATH PKG_CONFIG_PATH CPLUS_INCLUDE_PATH INCLUDE && \
+    ./contrib/download_prerequisites && \
+    cd .. && mkdir temp_gcc82 && cd temp_gcc82 && \
+    ../gcc-8.2.0/configure --prefix=/usr/local/gcc-8.2 --enable-threads=posix --disable-checking --disable-multilib && \
+    make -j8 && make install
+    ```
+
+3. Install Golang
+
+    go1.17.2 is recommended:
+
+    ```
+    wget -qO- https://go.dev/dl/go1.17.2.linux-amd64.tar.gz | \
+    tar -xz -C /usr/local && \
+    mkdir /root/go && \
+    mkdir /root/go/bin && \
+    mkdir /root/go/src && \
+    echo "GOROOT=/usr/local/go" >> /root/.bashrc && \
+    echo "GOPATH=/root/go" >> /root/.bashrc && \
+    echo "PATH=/usr/local/go/bin:/root/go/bin:$PATH" >> /root/.bashrc
+    source /root/.bashrc
+    ```
+  
+4. Install dependency libraries
+
+    Install the dependency patchelf:
+
+    ```
+    apt-get install patchelf
+    ```
+
+    Configure the ssl dependency libraries:
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
+    tar xf centos_ssl.tar && rm -rf centos_ssl.tar && \
+    mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && \
+    ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && \
+    ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && \
+    ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && \
+    ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
+    ```
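
Step 2 requires cmake 3.15 or later, and a freshly unpacked cmake is only picked up if its `bin` directory comes first on `PATH`. As a rough sketch, the snippet below compares the cmake found on `PATH` against that minimum. The helper `version_ge` is hypothetical, and the comparison assumes GNU coreutils `sort -V` (version sort) is available.

```shell
# version_ge A B: succeeds when version A >= version B.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="3.15.0"
if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line
    found="$(cmake --version | head -n1 | awk '{print $3}')"
    if version_ge "$found" "$required"; then
        echo "cmake $found satisfies >= $required"
    else
        echo "cmake $found is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```

The same helper can compare any dotted version string, so it is not tied to cmake.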
+
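+## Windows Environment Configuration (Step 1)
+
+Because third-party library support is limited, the Windows platform currently only supports building a local predictor inference service through a web service.
+
+**I. Environment Preparation**
+
+* **Python version 3.6/3.7/3.8/3.9 (64-bit)**
+
+**II. Choose CPU/GPU**
+
+* If your computer has an NVIDIA® GPU, make sure the following conditions are met:
+
+    * **CUDA Toolkit 10.1/10.2 with cuDNN 7 (cuDNN version >= 7.6.5)**
+    * **CUDA Toolkit 11.2 with cuDNN v8.1.1**
+    * **A matching version of TensorRT**
+    * **A GPU with compute capability greater than 3.5**
+
+        Refer to the official NVIDIA documentation for how to install and configure CUDA and cuDNN: [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/), [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/), [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/index.html)
+
+**III. Install Required Tools**
+
+1. Update the wget tool
+
+    [Download wget](http://gnuwin32.sourceforge.net/packages/wget.htm), extract it, and copy it to `C:\Windows\System32`. Accept any security prompts that appear.
+
+2. Install git
+
+    See the [Git website](https://git-scm.com/downloads) for details.
+
+3. Install the required C++ libraries (optional)
+
+    Some users may run into DLL linking errors at the `import paddle` stage. In that case, [install Visual Studio Community edition](https://visualstudio.microsoft.com/) together with its C++ components.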
+
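+## Install with pip (Step 2)
+
+**I. Install the Serving whl Packages**
+
+   The serving whl packages are client, app, and server. The server comes in CPU and GPU variants; for the GPU package, install the one that matches your environment.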
+
+   ```
+   pip3 install paddle-serving-client==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+   pip3 install paddle-serving-app==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+   
+   # CPU Server
+   pip3 install paddle-serving-server==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
+   
+   # GPU Server: confirm your environment before choosing which command to run; the CUDA 10.2 package is recommended
+   # CUDA10.2 + Cudnn7 + TensorRT6 (recommended)
+   pip3 install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple 
+   # CUDA10.1 + TensorRT6
+   pip3 install paddle-serving-server-gpu==0.8.3.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
+   # CUDA11.2 + TensorRT8
+   pip3 install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
+   ```
+
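+   The commands above use the Tsinghua mirror in China by default to speed up downloads. If you use an HTTP proxy, you can drop the mirror option (`-i https://pypi.tuna.tsinghua.edu.cn/simple`).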
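
Only one of the server packages above should be installed, chosen by whether the machine is CPU-only or by its CUDA version. The sketch below captures that choice in a small lookup. The function name `serving_server_requirement` is hypothetical, and the version strings simply mirror the commands listed above.

```shell
# Hypothetical helper: print the pip requirement for a given target.
# Accepts "cpu", "10.1", "10.2" or "11.2"; versions mirror the commands above.
serving_server_requirement() {
    case "$1" in
        cpu)  echo "paddle-serving-server==0.8.3" ;;
        10.1) echo "paddle-serving-server-gpu==0.8.3.post101" ;;
        10.2) echo "paddle-serving-server-gpu==0.8.3.post102" ;;
        11.2) echo "paddle-serving-server-gpu==0.8.3.post112" ;;
        *)    echo "no package listed for: $1" >&2; return 1 ;;
    esac
}

# Print the command instead of running it, so it can be reviewed first.
echo "pip3 install $(serving_server_requirement 10.2) -i https://pypi.tuna.tsinghua.edu.cn/simple"
```

Printing the command rather than executing it keeps the sketch free of side effects; copy the printed line into the terminal once it matches the environment.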
+
+**II. Install the Paddle-Related Python Libraries**
+   **These are required only when you use the `paddle_serving_client.convert` command or the `Python Pipeline` framework.**
+   ```
+   # For CPU environments
+   pip3 install paddlepaddle==2.2.2
+
+   # For GPU environments with CUDA 10.2
+   pip3 install paddlepaddle-gpu==2.2.2
+   ```
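+   **Note**: If your CUDA version is not 10.2, or you need to use TensorRT in a GPU environment, do not run the commands above directly. Instead, refer to the [Paddle Inference official documentation: Download and Install the Linux Inference Library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python), select the URL that matches your GPU environment, and install from it.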
+
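+**III. Environment Check After Installation**
+   Once all the steps above are complete, you can run the environment check from the command line. It automatically runs Paddle Serving examples to validate the environment configuration.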
+
+   ```
+   python3 -m paddle_serving_server.serve check
+   # The following output indicates that the environment check passed
+   (Cmd) check_all
+   PaddlePaddle inference environment running success
+   C++ cpu environment running success
+   C++ gpu environment running success
+   Pipeline cpu environment running success
+   Pipeline gpu environment running success
+   ```
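
The expected output above contains one `running success` line per checked component. As a small sketch, and assuming the output has been copied or saved as plain text (the check itself appears to run in an interactive `(Cmd)` prompt), the hypothetical helper below counts those lines so a short count stands out at a glance.

```shell
# Count lines that report success in saved check output (read from stdin).
count_success() {
    grep -c 'running success'
}

# Sample text copied from the expected output shown above
sample='PaddlePaddle inference environment running success
C++ cpu environment running success
C++ gpu environment running success
Pipeline cpu environment running success
Pipeline gpu environment running success'

printf '%s\n' "$sample" | count_success
```

On a CPU-only machine the GPU lines would presumably not report success, so a lower count there is not necessarily an error.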
+
+   For details, see the [environment check document](./Check_Env_CN.md)
\ No newline at end of file
--- a/doc/Java_SDK_CN.md
+++ b/doc/Java_SDK_CN.md
@@ -17,7 +17,7 @@ Paddle Serving 提供了 Java SDK，支持 Client 端用 Java 语言进行预测

 | Paddle Serving Server version | Java SDK version |
 | :---------------------------: | :--------------: |
-|             0.8.0             |      0.0.1       |
+|             0.9.0             |      0.0.1       |

 1.    Use the provided Java SDK directly as the client for prediction
 ### Install

--- a/doc/Java_SDK_EN.md
+++ b/doc/Java_SDK_EN.md
@@ -18,7 +18,7 @@ The following table shows compatibilities between Paddle Serving Server and Java

 | Paddle Serving Server version | Java SDK version |
 | :---------------------------: | :--------------: |
-|             0.8.0             |      0.0.1       |
+|             0.9.0             |      0.0.1       |

 1.    Directly use the provided Java SDK as the client for prediction
 ### Install Java SDK
@@ -42,6 +42,4 @@ mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId

 2.    Use it after compiling from the source code. See the [document](../java/README.md).

-
 3.    For examples of using the Java client, see the [document](../java/README.md).
-
--- a/doc/Latest_Packages_CN.md
+++ b/doc/Latest_Packages_CN.md
@@ -8,13 +8,13 @@

 |                           | develop whl                                                                                                                                                              | develop bin                                                                                                                             | stable whl                                                                                                                                                               | stable bin                                                                                                                              |
 |---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
-| cpu-avx-mkl               | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.0.0.tar.gz)                  | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.8.3.tar.gz)                  |
-| cpu-avx-openblas          | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.0.0.tar.gz)        | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.8.3.tar.gz)        |
-| cpu-noavx-openblas        | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [ serving-cpu-noavx-openblas-0.0.0.tar.gz ]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-noavx-openblas-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.8.3.tar.gz) |
-| cuda10.1-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl)  | [serving-gpu-101-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.8.3.tar.gz)                          |
-| cuda10.2-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl)  | [serving-gpu-102-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.8.3.tar.gz)                          |
-| cuda10.2-cudnn8-TensorRT7 | [paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl) | [ serving-gpu-1028-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.0.0.tar.gz )                     | [paddle_serving_server_gpu-0.8.3.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl) | [serving-gpu-1028-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.8.3.tar.gz )                     |
-| cuda11.2-cudnn8-TensorRT8 | [paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl) | [ serving-gpu-112-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.0.0.tar.gz )                       | [paddle_serving_server_gpu-0.8.3.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post112-py3-none-any.whl)   | [serving-gpu-112-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.8.3.tar.gz )                       |
+| cpu-avx-mkl               | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.0.0.tar.gz)                  | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.9.0.tar.gz)                  |
+| cpu-avx-openblas          | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.0.0.tar.gz)        | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.9.0.tar.gz)        |
+| cpu-noavx-openblas        | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [ serving-cpu-noavx-openblas-0.0.0.tar.gz ]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-noavx-openblas-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.9.0.tar.gz) |
+| cuda10.1-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.9.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.9.0.tar.gz)                          |
+| cuda10.2-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.9.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.9.0.tar.gz)                          |
+| cuda10.2-cudnn8-TensorRT7 | [paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl) | [ serving-gpu-1028-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.0.0.tar.gz )                     | [paddle_serving_server_gpu-0.9.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post102-py3-none-any.whl) | [serving-gpu-1028-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.9.0.tar.gz )                     |
+| cuda11.2-cudnn8-TensorRT8 | [paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl) | [ serving-gpu-112-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.0.0.tar.gz )                       | [paddle_serving_server_gpu-0.9.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post112-py3-none-any.whl)   | [serving-gpu-112-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.9.0.tar.gz )                       |

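 ### Binary Package
 Most users will not need this section. However, if you deploy Paddle Serving in an environment without network access, the binary tar file cannot be downloaded when Serving starts for the first time. Download links for binary packages for multiple environments are therefore provided; after downloading, transfer the package to the designated directory in the offline environment to use it.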
@@ -29,16 +29,16 @@

 |  | develop whl                                                                                                                                      | stable whl                                                                                                                                        |
 |-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
-| Python3.6             | [paddle_serving_client-0.0.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp36-none-any.whl) | [paddle_serving_client-0.8.3-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp36-none-any.whl)  |
-| Python3.7             | [paddle_serving_client-0.0.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl) | [paddle_serving_client-0.8.3-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp37-none-any.whl)  |
-| Python3.8             | [paddle_serving_client-0.0.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp38-none-any.whl) | [paddle_serving_client-0.8.3-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp38-none-any.whl)  |
-| Python3.9             | [paddle_serving_client-0.0.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp39-none-any.whl) | [paddle_serving_client-0.8.3-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp39-none-any.whl)  |
+| Python3.6             | [paddle_serving_client-0.0.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp36-none-any.whl) | [paddle_serving_client-0.9.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp36-none-any.whl)  |
+| Python3.7             | [paddle_serving_client-0.0.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl) | [paddle_serving_client-0.9.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp37-none-any.whl)  |
+| Python3.8             | [paddle_serving_client-0.0.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp38-none-any.whl) | [paddle_serving_client-0.9.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp38-none-any.whl)  |
+| Python3.9             | [paddle_serving_client-0.0.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp39-none-any.whl) | [paddle_serving_client-0.9.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp39-none-any.whl)  |

 ## paddle-serving-app Wheel Package

 |         | develop whl                                                                                                                              | stable whl                                                                                                                                  |
 |---------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
-| Python3 | [paddle_serving_app-0.0.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.0.0-py3-none-any.whl) | [ paddle_serving_app-0.8.3-py3-none-any.whl ]( https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-py3-none-any.whl) |
+| Python3 | [paddle_serving_app-0.0.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.0.0-py3-none-any.whl) | [ paddle_serving_app-0.9.0-py3-none-any.whl ]( https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.9.0-py3-none-any.whl) |


 ## Baidu Kunlun Chip
@@ -62,7 +62,7 @@ https://paddle-serving.bj.bcebos.com/bin/serving-xpu-aarch64-0.0.0.tar.gz
 
 Kunlun wheel package for x86 CPU environments:
 ``` 
-https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_xpu-0.8.3.post2-py3-none-any.whl
+https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_xpu-0.9.0.post2-py3-none-any.whl

 ```


--- a/doc/Latest_Packages_EN.md
+++ b/doc/Latest_Packages_EN.md
@@ -8,13 +8,13 @@ Check the following table, and copy the address of hyperlink then run `pip3 inst

 |                           | develop whl                                                                                                                                                              | develop bin                                                                                                                             | stable whl                                                                                                                                                               | stable bin                                                                                                                              |
 |---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
-| cpu-avx-mkl               | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.0.0.tar.gz)                  | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.8.3.tar.gz)                  |
-| cpu-avx-openblas          | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.0.0.tar.gz)        | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.8.3.tar.gz)        |
-| cpu-noavx-openblas        | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [ serving-cpu-noavx-openblas-0.0.0.tar.gz ]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.8.3-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.8.3-py3-none-any.whl)                          | [serving-cpu-noavx-openblas-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.8.3.tar.gz) |
-| cuda10.1-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post101-py3-none-any.whl)  | [serving-gpu-101-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.8.3.tar.gz)                          |
-| cuda10.2-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl)  | [serving-gpu-102-0.8.3.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.8.3.tar.gz)                          |
-| cuda10.2-cudnn8-TensorRT7 | [paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl) | [ serving-gpu-1028-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.0.0.tar.gz )                     | [paddle_serving_server_gpu-0.8.3.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post102-py3-none-any.whl) | [serving-gpu-1028-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.8.3.tar.gz )                     |
-| cuda11.2-cudnn8-TensorRT8 | [paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl) | [ serving-gpu-112-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.0.0.tar.gz )                       | [paddle_serving_server_gpu-0.8.3.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.8.3.post112-py3-none-any.whl)   | [serving-gpu-112-0.8.3.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.8.3.tar.gz )                       |
+| cpu-avx-mkl               | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.0.0.tar.gz)                  | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-avx-mkl-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.9.0.tar.gz)                  |
+| cpu-avx-openblas          | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.0.0.tar.gz)        | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-avx-openblas-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.9.0.tar.gz)        |
+| cpu-noavx-openblas        | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl)                          | [serving-cpu-noavx-openblas-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.9.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.9.0-py3-none-any.whl)                          | [serving-cpu-noavx-openblas-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.9.0.tar.gz) |
+| cuda10.1-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.9.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post101-py3-none-any.whl)  | [serving-gpu-101-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.9.0.tar.gz)                          |
+| cuda10.2-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.0.0.tar.gz)                          | [paddle_serving_server_gpu-0.9.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post102-py3-none-any.whl)  | [serving-gpu-102-0.9.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.9.0.tar.gz)                          |
+| cuda10.2-cudnn8-TensorRT7 | [paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl) | [serving-gpu-1028-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.0.0.tar.gz)                     | [paddle_serving_server_gpu-0.9.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post1028-py3-none-any.whl) | [serving-gpu-1028-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.9.0.tar.gz )                     |
+| cuda11.2-cudnn8-TensorRT8 | [paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl)  | [serving-gpu-112-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.0.0.tar.gz)                       | [paddle_serving_server_gpu-0.9.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.9.0.post112-py3-none-any.whl)   | [serving-gpu-112-0.9.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.9.0.tar.gz )                       |

 ### Binary Package
 For most users, this section can be skipped. However, if you deploy Paddle Serving on a machine without network access, the binary executable tar file cannot be downloaded automatically, so the download links for each environment are listed here.
@@ -29,15 +29,15 @@ for most users, we do not need to read this section. But if you deploy your Padd

 |  | develop whl                                                                                                                                      | stable whl                                                                                                                                        |
 |-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
-| Python3.6             | [paddle_serving_client-0.0.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp36-none-any.whl) | [paddle_serving_client-0.8.3-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp36-none-any.whl)  |
-| Python3.7             | [paddle_serving_client-0.0.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl) | [paddle_serving_client-0.8.3-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp37-none-any.whl)  |
-| Python3.8             | [paddle_serving_client-0.0.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp38-none-any.whl) | [paddle_serving_client-0.8.3-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp38-none-any.whl)  |
-| Python3.9             | [paddle_serving_client-0.0.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp39-none-any.whl) | [paddle_serving_client-0.8.3-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.8.3-cp38-none-any.whl)  |
+| Python3.6             | [paddle_serving_client-0.0.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp36-none-any.whl) | [paddle_serving_client-0.9.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp36-none-any.whl)  |
+| Python3.7             | [paddle_serving_client-0.0.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl) | [paddle_serving_client-0.9.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp37-none-any.whl)  |
+| Python3.8             | [paddle_serving_client-0.0.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp38-none-any.whl) | [paddle_serving_client-0.9.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp38-none-any.whl)  |
+| Python3.9             | [paddle_serving_client-0.0.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp39-none-any.whl) | [paddle_serving_client-0.9.0-cp39-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.9.0-cp39-none-any.whl)  |
 ## paddle-serving-app

 |         | develop whl                                                                                                                              | stable whl                                                                                                                                  |
 |---------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
-| Python3 | [paddle_serving_app-0.0.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.0.0-py3-none-any.whl) | [ paddle_serving_app-0.8.3-py3-none-any.whl ]( https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.8.3-py3-none-any.whl) |
+| Python3 | [paddle_serving_app-0.0.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.0.0-py3-none-any.whl) | [ paddle_serving_app-0.9.0-py3-none-any.whl ]( https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.9.0-py3-none-any.whl) |


 ## Baidu Kunlun user
@@ -61,7 +61,7 @@ https://paddle-serving.bj.bcebos.com/bin/serving-xpu-aarch64-0.0.0.tar.gz
 
 for x86 kunlun user
 ``` 
-https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_xpu-0.8.3.post2-py3-none-any.whl
+https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_xpu-0.9.0.post2-py3-none-any.whl

 ```


--- a/doc/Model_Zoo_CN.md
+++ b/doc/Model_Zoo_CN.md
--- a/doc/Python_Pipeline/Performance_Tuning_CN.md
+++ b/doc/Python_Pipeline/Performance_Tuning_CN.md
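-# Pipeline Serving Performance Optimization
-
-([English](./Performance_Tuning_EN.md)|Simplified Chinese)
-
-## 1. Performance analysis and optimization
-
-
-### 1.1 How to optimize with the Timeline tool
-
-To make performance tuning easier, PipelineServing provides a Timeline tool that records timestamps for each stage of the whole service.
-
-### 1.2 Output Profile information on the Server side
-
-On the Server side, profiling is controlled by the `use_profile` field in the yaml configuration: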
-
-```yaml
-dag:
-    use_profile: true
-```
-
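-When this option is enabled, the Server prints the corresponding log information to standard output during prediction. To show the time spent in each stage more intuitively, the Analyst module is provided for further analysis of the log file.
-
-To use it, first save the Server output to a file, for example `profile.txt`. The script converts the timing records in the log to JSON and saves them to a `trace` file, which can be visualized with the tracing feature of the Chrome browser.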
-
-```python
-from paddle_serving_server.pipeline import Analyst
-import json
-import sys
-
-if __name__ == "__main__":
-    log_filename = "profile.txt"
-    trace_filename = "trace"
-    analyst = Analyst(log_filename)
-    analyst.save_trace(trace_filename)
-```
-
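-Steps: open the Chrome browser, enter `chrome://tracing/` in the address bar to open the tracing page, click the load button, and open the saved `trace` file to visualize the timing of each stage of the prediction service.
-
-### 1.3 Output Profile information on the Client side
-
-On the Client side, set `profile=True` in the `predict` interface to enable the Profile feature.
-
-When this option is enabled, the Client prints the log information for each prediction to standard output during prediction. Subsequent analysis is the same as on the Server side.
-
-### 1.4 Analysis method
-Based on the per-stage time costs in the pipeline.tracer log, apply the following formulas step by step to find the stage that dominates the time cost.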
-```
-Cost of a single OP:
-op_cost = process(pre + mid + post) 
-
-Expected OP concurrency:
-op_concurrency  = op_cost(s) * expected QPS
-
-Service throughput:
-service_throughput = 1 / cost of the slowest OP * concurrency
-
-Average service response time:
-service_avg_cost = ∑op_concurrency [critical path]
-
-Channel accumulation:
-channel_acc_size = QPS(down - up) * time
-
-Average cost of batch prediction:
-avg_batch_cost = (N * pre + mid + post) / N 
-```
-
-### 1.5 优化思路
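-Choose the optimization method according to the stage where most of the time is spent.
- OP inference stage (mid-process):
-  - Increase OP concurrency
-  - Enable auto-batching (requires that the requests have the same shape)
-  - If one sample in a batch has a very large shape, so that heavy padding makes inference slow, use mini-batch
-  - Enable TensorRT/MKL-DNN optimization
-  - Enable low-precision inference
- OP preprocessing stage (pre-process):
-  - Increase OP concurrency
-  - Optimize the preprocessing logic
- Long in/out time (channel accumulation > 5)
-  - Check the size and latency of the data passed through the channel
-  - Optimize the data passed in: avoid passing data, or compress it before passing it in
-  - Increase OP concurrency
-  - Decrease the concurrency of upstream OPs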
--- a/doc/Python_Pipeline/Performance_Tuning_EN.md
+++ b/doc/Python_Pipeline/Performance_Tuning_EN.md
-# Pipeline Serving Performance Optimization
-
-(English|[简体中文](./Performance_Tuning_CN.md))
-
-
-## 1. Performance analysis and optimization
-
-
-### 1.1 How to optimize with the timeline tool
-
-In order to better optimize the performance, PipelineServing provides a timeline tool to monitor the time of each stage of the whole service.
-
-### 1.2 Output profile information on server side
-
-The server is controlled by the `use_profile` field in yaml:
-
-```yaml
-dag:
-    use_profile: true
-```
-
-After the function is enabled, the server will print the corresponding log information to the standard output in the process of prediction. In order to show the time consumption of each stage more intuitively, Analyst module is provided for further analysis and processing of log files.
-
-The output of the server is first saved to a file. Taking `profile.txt` as an example, the script converts the time monitoring information in the log into JSON format and saves it to the `trace` file. The `trace` file can be visualized through the tracing function of Chrome browser.
-
-```python
-from paddle_serving_server.pipeline import Analyst
-import json
-import sys
-
-if __name__ == "__main__":
-    log_filename = "profile.txt"
-    trace_filename = "trace"
-    analyst = Analyst(log_filename)
-    analyst.save_trace(trace_filename)
-```
-
-Steps: open the Chrome browser, enter `chrome://tracing/` in the address bar to open the tracing page, click the load button, and open the saved `trace` file to visualize the timing of each stage of the prediction service.
-
-### 1.3 Output profile information on client side
-
-The profile function can be enabled by setting `profile=True` in the `predict` interface on the client side.
-
-After the function is enabled, the client will print the log information corresponding to the prediction to the standard output during the prediction process, and the subsequent analysis and processing are the same as that of the server.
-
-### 1.4 Analytical methods
-Based on the per-stage time costs in the pipeline.tracer log, apply the following formulas step by step to find the stage that dominates the time cost.
-
-```
-Cost of a single OP:
-op_cost = process(pre + mid + post) 
-
-OP concurrency:
-op_concurrency = op_cost(s) * qps_expected
-
-Service throughput:
-service_throughput = 1 / slowest_op_cost * op_concurrency
-
-Service average cost:
-service_avg_cost = ∑op_concurrency in critical path
-
-Channel accumulation:
-channel_acc_size = QPS(down - up) * time
-
-Average cost of batch prediction:
-avg_batch_cost = (N * pre + mid + post) / N 
-```
-
-### 1.5 Optimization ideas
-Choose the optimization method according to the stage that takes the most time.
- OP inference stage (mid-process):
-  - Increase `concurrency`
-  - Turn on `auto-batching` (ensure that the shapes of multiple requests are consistent)
-  - Use `mini-batch` if the shape of the data is very large
-  - Turn on TensorRT for GPU
-  - Turn on MKLDNN for CPU
-  - Turn on low-precision inference
- OP preprocess or postprocess stage:
-  - Increase `concurrency`
-  - Optimize processing logic
- In/Out stage (channel accumulation > 5):
-  - Check the size and delay of the data passed by the channel
-  - Optimize the data transmitted through the channel: do not transmit data, or compress it before passing it in
-  - Increase `concurrency`
-  - Decrease the `concurrency` of upstream OPs
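
Reviewer note: the formulas in section 1.4 of the removed tuning guide can be tried out on measured stage timings. The following is a minimal sketch under stated assumptions, not part of Paddle Serving: `OpTiming` and all function names are hypothetical, timings are in seconds, and `channel_acc_size` follows the sign convention exactly as written in the doc (`down - up`).

```python
from dataclasses import dataclass


@dataclass
class OpTiming:
    """Per-request stage costs of one OP, in seconds (hypothetical helper)."""
    pre: float
    mid: float
    post: float

    @property
    def cost(self) -> float:
        # op_cost = pre + mid + post
        return self.pre + self.mid + self.post


def op_concurrency(op: OpTiming, qps_expected: float) -> float:
    # op_concurrency = op_cost(s) * qps_expected
    return op.cost * qps_expected


def service_throughput(ops, concurrency: float) -> float:
    # service_throughput = 1 / slowest_op_cost * op_concurrency
    slowest = max(op.cost for op in ops)
    return concurrency / slowest


def channel_acc_size(qps_up: float, qps_down: float, seconds: float) -> float:
    # channel_acc_size = QPS(down - up) * time, as written in the doc
    return (qps_down - qps_up) * seconds


def avg_batch_cost(op: OpTiming, batch_size: int) -> float:
    # avg_batch_cost = (N * pre + mid + post) / N
    return (batch_size * op.pre + op.mid + op.post) / batch_size
```

For example, an OP with `pre=0.01`, `mid=0.03`, `post=0.01` needs a concurrency of about 5 to sustain an expected 100 QPS, which is the kind of estimate the "Increase `concurrency`" advice in section 1.5 relies on.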
--- a/doc/Python_Pipeline/Benchmark_CN.md
+++ b/doc/Python_Pipeline/Benchmark_CN.md
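-The Serving version submitted for this test supports GPU prediction. Using this task as an example, we provide test data on the performance of GPU prediction in Paddle Serving.
+# Python Pipeline Performance Test

-## 1. Test environment
+- [Test environment](#1)
+- [Performance metrics and conclusions](#2)

+<a name="1"></a>
+
+## Test environment
+
+The test environment is shown in the table below:
 |          | GPU | GPU memory | CPU | Memory |
 |----------|---------|----------|----------------------------------------------|------|
 | Serving side | 4x Tesla P4-8GB | 7611MiB | Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, 48 cores | 216G |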
@@ -10,7 +16,15 @@
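 A single GPU is used, with TensorRT disabled.
 Model: ResNet_v2_50

-## 2. PaddleServing-PipeLine(python)
+<a name="2"></a>
+
+## Performance metrics and conclusions
+
+The tests show that Python Pipeline mode, using multi-process concurrency, makes full use of the GPU and achieves good throughput.
+
+
+The test data is as follows: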
+
 |model_name |thread_num |batch_size |CPU_util(%) |GPU_memory(mb) |GPU_util(%) |qps(samples/s) |total count |mean(ms) |median(ms) |80 percent(ms) |90 percent(ms) |99 percent(ms) |total cost(s) |each cost(s)|
 |:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--
 |ResNet_v2_50 |1 |1 |2.2 |3327 |17.25 |17.633658869240787 |355 |56.428481238996476 |38.646728515625 |39.496826171875 |39.98369140625 |1273.1911083984373 |20.131953477859497 |20.033540725708008|

--- a/doc/Python_Pipeline/Pipeline_Design_CN.md
+++ b/doc/Python_Pipeline/Pipeline_Design_CN.md
--- a/doc/Python_Pipeline/Pipeline_Design_EN.md
+++ b/doc/Python_Pipeline/Pipeline_Design_EN.md
--- a/doc/Python_Pipeline/Pipeline_Features_CN.md
+++ b/doc/Python_Pipeline/Pipeline_Features_CN.md
--- a/doc/Python_Pipeline/Pipeline_Int_CN.md
+++ b/doc/Python_Pipeline/Pipeline_Int_CN.md
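+# Python Pipeline Framework
+
+In many deep learning frameworks, model serving is typically used for one-click deployment of a single model. In large-scale industrial AI production, however, a single end-to-end deep learning model cannot solve complex problems, and combining multiple deep learning models is the usual way to solve complex real-world problems. For example, an OCR text recognition service needs at least two models, detection and recognition; a video understanding service generally combines several models for frame extraction, word segmentation, audio processing, classification, and so on. At present, designing and implementing a general multi-model service is very complex: it must support complex model topologies while keeping the service highly concurrent, highly available, and easy to develop and maintain.
+
+Paddle Serving implements Python Pipeline, a general programming framework for multi-model services. It not only addresses the pain points above, but also greatly improves GPU utilization and is easy to develop and maintain.
+
+For a usage example of Python Pipeline, see [Python Pipeline quick deployment example](../Quick_Start_CN.md)
+
+Read the following documents to learn the core features and usage of Python Pipeline, its advanced features, and the performance optimization guide.
+- [Python Pipeline framework design](./Pipeline_Design_CN.md)
+- [Python Pipeline core features](./Pipeline_Features_CN.md)
+- [Python Pipeline optimization guide](./Pipeline_Optimize_CN.md)
+- [Python Pipeline performance metrics](./Pipeline_Benchmark_CN.md)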
--- a/doc/Python_Pipeline/Pipeline_Optimize_CN.md
+++ b/doc/Python_Pipeline/Pipeline_Optimize_CN.md
--- a/doc/Run_On_Kubernetes_CN.md
+++ b/doc/Run_On_Kubernetes_CN.md
--- a/doc/Save_CN.md
+++ b/doc/Save_CN.md
--- a/doc/Serving_Auth_Docker_CN.md
+++ b/doc/Serving_Auth_Docker_CN.md
@@ -8,13 +8,13 @@
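 - This service interface is not secure enough and needs authentication.
 - This service interface cannot control traffic, so resources cannot be used efficiently.

-This document uses the Uci housing price prediction service as an example to show how to harden the security of the prediction service API. As the traffic entry point, the API gateway manages the interfaces in a unified way. The API gateway can also provide security features such as traffic encryption and authentication.
+This document uses the Uci housing price prediction service as an example to show how to harden the security of the prediction service API. As the traffic entry point, the API gateway manages the interfaces in a unified way. The API gateway can also provide security features such as traffic encryption and authentication.

 ## Docker deployment

-You can use docker-compose to deploy the security gateway. The steps in this example are [Deploy the local Serving container] - [Deploy the local security gateway] - [Access Serving through the security gateway]
+You can use docker-compose to deploy the security gateway. The steps in this example are [Deploy the local Serving container] - [Deploy the local security gateway] - [Access Serving through the security gateway]

-**Note:** docker-compose is different from docker. It depends on docker and can deploy multiple docker containers at once, comparable to a local version of Kubernetes. For a docker-compose tutorial, see [docker-compose installation](https://docs.docker.com/compose/install/) 
+**Note:** docker-compose is different from docker. It depends on docker and can deploy multiple docker containers at once, comparable to a local version of Kubernetes. For a docker-compose tutorial, see [docker-compose installation](https://docs.docker.com/compose/install/) 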

 ```shell
 docker-compose -f tools/auth/auth-serving-docker.yaml up -d
@@ -30,50 +30,49 @@ ee59a3dd4806        registry.baidubce.com/serving_dev/serving-runtime:cpu-py36
 665fd8a34e15        redis:latest                                                                    "docker-entrypoint.s…"   About an hour ago   Up About an hour             0.0.0.0:6379->6379/tcp                                                                               anquan_redis_1 
 ```

-The serving container started earlier is exposed on port 9393, the KONG gateway uses port 8443, and the KONG web console uses port 8001. Next, open `https://$IP_ADDR:8005` in a browser, where IP_ADDR is the IP of the host machine.
->> **Note**: On first login you may need to enter Name : admin and Kong Admin URL : http://kong:8001
-<img src="images/kong-dashboard.png">
-After registering and logging in, you will see the DASHBOARD. Look at SERVICES first: `serving_service` is listed, which means the Serving service on port 9393 has been registered in KONG.
+The serving container started earlier is exposed on port 9393, the KONG gateway uses port 8443, and the KONG web console uses port 8001. Next, open `https://$IP_ADDR:8001` in a browser, where IP_ADDR is the IP of the host machine.

-<img src="images/kong-services.png">
-<img src="images/kong-routes.png">
+<img src="../images/kong-dashboard.png">
+After registering and logging in, you will see the DASHBOARD. Look at SERVICES first: `serving_service` is listed, which means the Serving service on port 9393 has been registered in KONG.

-Then, in ROUTES, we can see that serving is linked to `/serving-uci`.
+<img src="../images/kong-services.png">
+<img src="../images/kong-routes.png">

-Finally, click CONSUMERS - default_user - Credentials - API KEYS; several keys are listed under `Api Keys`
+Then, in ROUTES, we can see that serving is linked to `/serving-uci`.

-<img src="images/kong-api_keys.png">
+Finally, click CONSUMERS - default_user - Credentials - API KEYS; several keys are listed under `Api Keys`

-Next, access the service with curl
+<img src="../images/kong-api_keys.png">
+
+Next, access the service with curl

 ```shell
 curl -H "Content-Type:application/json" -H "X-INSTANCE-ID:kong_ins" -H "apikey:hP6v25BQVS5CcS1nqKpxdrFkUxze9JWD" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' https://127.0.0.1:8443/serving-uci/uci/prediction -k
 ```

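-Compared with the earlier Serving HTTP service, there are the following differences.
+Compared with the earlier Serving HTTP service, there are the following differences.

- Access is encrypted with https instead of http
- The serving_uci path is mapped to the gateway
- `X-INSTANCE-ID` and `apikey` are added to the header
+- Access is encrypted with https instead of http
+- The serving_uci path is mapped to the gateway
+- `X-INSTANCE-ID` and `apikey` are added to the header


-## K8S deployment
+## K8S deployment

-Likewise, we also provide a way to deploy the Serving security gateway on a K8S cluster.
+Likewise, we also provide a way to deploy the Serving security gateway on a K8S cluster.

-### Step 1: Start the Serving service
+**1. Start the Serving service**

-We again use the [Uci housing price prediction](../examples/C++/fit_a_line/) service as the example. The image build process is omitted here; for details, see [Deploying Paddle Serving on a Kubernetes cluster](./Run_On_Kubernetes_CN.md).
+We again use the [Uci housing price prediction](../examples/C++/fit_a_line/) service as the example. The image build process is omitted here; for details, see [Deploying Paddle Serving on a Kubernetes cluster](./Run_On_Kubernetes_CN.md).

 Here we directly run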
 ```
 kubectl apply -f tools/auth/serving-demo-k8s.yaml
 ```

-We can see

-### Step 2: Install KONG (only needs to be done once per cluster)
-Next, install KONG Ingress
+**2. Install KONG (only needs to be done once per cluster)**
+Next, install KONG Ingress
 ```
 kubectl apply -f tools/auth/kong-install.yaml
 ```
@@ -106,15 +105,15 @@ kong          kong-validation-webhook   ClusterIP   172.16.114.93    <none>

 ```

-### Step 3: Create the Ingress resource
+**3. Create the Ingress resource**

-Next, link the Serving service to KONG
+Next, link the Serving service to KONG

 ```
 kubectl apply -f tools/auth/kong-ingress-k8s.yaml
 ```

-The content of the yaml file is also given here
+The content of the yaml file is also given here
 ```
 apiVersion: extensions/v1beta1
 kind: Ingress
@@ -132,22 +131,22 @@ spec:
          serviceName: {{SERVING_SERVICE_NAME}}
          servicePort: {{SERVICE_PORT}}
 ```
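-Here serviceName is uci and servicePort is 9393. For a different service, change these two fields. The service is finally mapped under `/foo`.
+Here serviceName is uci and servicePort is 9393. For a different service, change these two fields. The service is finally mapped under `/foo`.
 After this step, we can run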
 ```
 curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://$IP:$PORT/foo/uci/prediction
 ```

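-### Step 4: Add security gateway restrictions
+**4. Add security gateway restrictions**

-The previous interface has no authentication and cannot verify the user's identity, so we now add a key-auth plugin
+The previous interface has no authentication and cannot verify the user's identity, so we now add a key-auth plugin

 Run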
 ```
 kubectl apply -f key-auth-k8s.yaml
 ```

-where the content of the yaml file is
+where the content of the yaml file is
 ```
 apiVersion: configuration.konghq.com/v1
 kind: KongPlugin
@@ -156,7 +155,7 @@ metadata:
 plugin: key-auth
 ```

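-Now create a secret. The key value is chosen by the user and must be sent in the apikey field of the request Header
+Now create a secret. The key value is chosen by the user and must be sent in the apikey field of the request Header
 Run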
 ```
 kubectl create secret generic default-apikey  \
@@ -164,14 +163,14 @@ kubectl create secret generic default-apikey  \
   --from-literal=key=ZGVmYXVsdC1hcGlrZXkK
 ```

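-Here our key is an arbitrarily chosen string, `ZGVmYXVsdC1hcGlrZXkK`. In practice you can also
-create a user (consumer) to identify the visitor and bind the apikey to that user.
+Here our key is an arbitrarily chosen string, `ZGVmYXVsdC1hcGlrZXkK`. In practice you can also
+create a user (consumer) to identify the visitor and bind the apikey to that user.
 Run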
 ```
 kubectl apply -f kong-consumer-k8s.yaml
 ```

-where the content of the yaml file is
+where the content of the yaml file is
 ```
 apiVersion: configuration.konghq.com/v1
 kind: KongConsumer
@@ -184,13 +183,13 @@ credentials:
 - default-apikey
 ```

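-If we now try the same curl request as in the previous step, access is denied. The service is now protected, and we need the corresponding key.
+If we now try the same curl request as in the previous step, access is denied. The service is now protected, and we need the corresponding key.


-### Step 5: Access the service with the API Key
+**5. Access the service with the API Key**

 Run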
 ```
 curl -H "Content-Type:application/json" -H "apikey:ZGVmYXVsdC1hcGlrZXkK" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' https://$IP:$PORT/foo/uci/prediction -k
 ```
-We can see that the apikey has been added to the header of the curl request.
+We can see that the apikey has been added to the header of the curl request.
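
Reviewer note: the same API-key request can be built from Python. The following is a minimal sketch using only the standard library; `build_prediction_request` is a hypothetical helper name, and the `/foo/uci/prediction` path, `apikey` header, and payload shape are copied from the curl example above. It only constructs the request and does not send it.

```python
import json
import urllib.request


def build_prediction_request(base_url, api_key, features):
    """Build (but do not send) the POST request used in the curl example."""
    payload = {"feed": [{"x": features}], "fetch": ["price"]}
    return urllib.request.Request(
        url=f"{base_url}/foo/uci/prediction",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Kong's key-auth plugin reads the key from this header.
            "apikey": api_key,
        },
        method="POST",
    )


request = build_prediction_request(
    "https://127.0.0.1:8443",  # assumed gateway address, replace with $IP:$PORT
    "ZGVmYXVsdC1hcGlrZXkK",
    [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332],
)
```

Sending it would be a call to `urllib.request.urlopen(request, context=...)`. The curl `-k` flag skips certificate verification, so a gateway with a self-signed certificate would need an equivalent, explicitly unverified SSL context, which is only appropriate for local testing.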
--- a/doc/Serving_Configure_CN.md
+++ b/doc/Serving_Configure_CN.md
--- a/doc/Serving_Configure_EN.md
+++ b/doc/Serving_Configure_EN.md
--- a/doc/images/2-1_Docker_Images_CN_1.png
+++ b/doc/images/2-1_Docker_Images_CN_1.png
--- a/doc/images/2-2_Environment_CN_1.png
+++ b/doc/images/2-2_Environment_CN_1.png
--- a/doc/images/2-3_Compile_CN_1.png
+++ b/doc/images/2-3_Compile_CN_1.png
--- a/doc/images/6-1_Cpp_Asynchronous_Framwork_CN_1.png
+++ b/doc/images/6-1_Cpp_Asynchronous_Framwork_CN_1.png
--- a/doc/images/6-5_Cpp_ABTest_CN_1.png
+++ b/doc/images/6-5_Cpp_ABTest_CN_1.png
--- a/doc/images/8-1_Cube_Architecture_CN_1.png
+++ b/doc/images/8-1_Cube_Architecture_CN_1.png
--- a/doc/wechat_group_1.jpeg
+++ b/doc/wechat_group_1.jpeg