The goal of Paddle Serving is to provide high-performance, flexible, and easy-to-use industrial-grade online inference services for machine learning developers and enterprises. Paddle Serving supports multiple protocols such as RESTful, gRPC, and bRPC, provides inference solutions for a variety of hardware and operating system environments, and ships many well-known pre-trained model examples. The core features are as follows:
- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
- There are two frameworks: the high-performance C++ Serving and the highly easy-to-use Python Pipeline. C++ Serving is built on the bRPC network framework to create high-throughput, low-latency inference services, and its performance metrics lead competing products. Python Pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to create highly usable, high-throughput inference services. For guidance on choosing between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, and bRPC, and provide SDKs for C++, Python, and Java.
- Design and implement a high-performance inference service framework for asynchronous pipelines based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference.
- Adapt to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, NVIDIA GPU, and Kunlun XPU; integrate the Intel MKLDNN and NVIDIA TensorRT acceleration libraries, and support low-precision and quantized inference.
- Provide a secure model deployment solution, including model encryption, an authentication mechanism, and an HTTPS security gateway, which has been validated in practice.
- Support cloud deployment, with a deployment case on a Baidu Intelligent Cloud Kubernetes cluster.
- Provide more than 40 classic pre-trained model deployment examples, covering suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP, and PaddleRec, with more models being added continuously.
...

This chapter guides you through the installation and deployment steps. It is strongly recommended that you deploy Paddle Serving with Docker.
- [Install Paddle Serving using Docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with Security Gateway (Chinese)](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
- [Latest Wheel packages](doc/Latest_Packages_CN.md) (updated daily on the develop branch)
We first need to pull the image that matches our environment. In the **Environment** column of the table above, every entry other than CPU (the CUDA + cuDNN combinations) is a GPU environment. An example follows.
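For example, pulling a CPU development image and a CUDA 10.2 GPU development image might look like this. The registry prefix follows Paddle Serving's image naming, but the exact tags below are assumptions, so check the image table for the tag that matches your environment:

```
# CPU development image (tag assumed; verify against the image table)
docker pull registry.baidubce.com/paddlepaddle/serving:0.7.0-devel
# GPU development image for CUDA 10.2 + cuDNN 7 (run it with nvidia-docker)
docker pull registry.baidubce.com/paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
```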
...

If it is not found, it means 1) the development version of Python is not installed and needs to be reinstalled, or 2) you do not have sufficient permission to view the relevant system directories.
2) Set `PYTHON_LIBRARIES`
Search for libpython3.7.so or libpython3.7m.so
```
find / -name libpython3.7.so
find / -name libpython3.7m.so
```
Usually you will find something like `**/lib/libpython3.7.so` or `**/lib/x86_64-linux-gnu/libpython3.7.so`; we only need its parent directory. For example, if you find `/usr/local/lib/libpython3.7.so`, then simply `export PYTHON_LIBRARIES=/usr/local/lib`.
If it is not found, it means 1) Python was statically compiled and you need to reinstall a dynamically compiled Python, or 2) you do not have sufficient permission to view the relevant system directories. An illustrative set of exports is shown below.
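Putting this together, here is a minimal sketch for a Python 3.7 installation under `/usr/local`. All paths are illustrative and must be replaced with whatever `find` returned on your machine; `PYTHON_INCLUDE_DIR` and `PYTHON_EXECUTABLE` are assumed here to be the companion variables set in this guide's neighboring steps.

```
# Illustrative paths for a Python 3.7 installation under /usr/local;
# adjust to the locations that `find` reported on your own machine.
export PYTHON_INCLUDE_DIR=/usr/local/include/python3.7m  # directory containing Python.h
export PYTHON_LIBRARIES=/usr/local/lib                   # directory containing libpython3.7(m).so
export PYTHON_EXECUTABLE=/usr/local/bin/python3.7
```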
...

The paddle-serving-server and paddle-serving-server-gpu installation packages support CentOS 6/7, Ubuntu 16/18, and Windows 10.
The paddle-serving-client and paddle-serving-app installation packages support Linux and Windows, and paddle-serving-client only supports Python 3.6/3.7/3.8.
**If you previously used the CUDA 10.2 environment of Paddle Serving 0.5.X or 0.6.X, note that in version 0.7.0, paddle-serving-server-gpu==0.7.0.post102 uses cuDNN 7 and TensorRT 6, whereas 0.6.0.post102 used cuDNN 8 and TensorRT 7. If you are a 0.6.0 CUDA 10.2 user and need that combination after upgrading, please use paddle-serving-server-gpu==0.7.0.post1028.**
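For instance, under the versioning described above, a CUDA 10.2 user would pick one of the following, depending on the cuDNN/TensorRT combination needed:

```
# cuDNN 7 + TensorRT 6 build for CUDA 10.2
pip3 install paddle-serving-server-gpu==0.7.0.post102
# cuDNN 8 + TensorRT 7 build for CUDA 10.2
# pip3 install paddle-serving-server-gpu==0.7.0.post1028
```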
## 3. Install Paddle-related Python libraries
**You only need to install these when you use the `paddle_serving_client.convert` command or the Python Pipeline framework.**
Check the following table, copy the hyperlink address, and run `pip3 install`. For example, to install `paddle-serving-server-0.0.0-py3-none-any.whl`, right-click the hyperlink, copy the link address, and run `pip3 install https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl`.
### Python 3
```
# CUDA 10.1 with cuDNN 7 and TensorRT 6, compiled with GCC 8.2
```
Kunlun users on ARM-XPU or x86-XPU can download the wheel packages as follows, and should use the xpu-beta Docker image ([DOCKER IMAGES](./Docker_Images_CN.md)).
...
Most users do not need to read this section. However, if you deploy Paddle Serving on a machine without network access, the binary executable tar file cannot be downloaded, so we provide download links for all environments here.
- Download the serving server whl package and BIN package, and make sure they are built for the same environment.
- Download the serving client whl and serving app whl, paying attention to the Python version.
- `pip install` the wheels and `tar xf` the binary package, then `export SERVING_BIN=$PWD/serving-gpu-cuda11-0.0.0/serving` (taking CUDA 11 as the example); see the sketch below.
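A minimal sketch of the offline install flow, again taking CUDA 11 as the example; the wheel and tar file names below are illustrative placeholders for the files you actually downloaded from the links above:

```
# Install the downloaded wheels (file names are placeholders)
pip3 install paddle_serving_server_gpu-0.0.0.post11-py3-none-any.whl
pip3 install paddle_serving_client-0.0.0-py3-none-any.whl
pip3 install paddle_serving_app-0.0.0-py3-none-any.whl
# Unpack the matching binary package and point SERVING_BIN at it
tar xf serving-gpu-cuda11-0.0.0.tar.gz
export SERVING_BIN=$PWD/serving-gpu-cuda11-0.0.0/serving
```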
...

In order to meet the needs of users in different scenarios, Paddle Serving provides two frameworks whose trade-offs are compared below.
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Applications |
|-----|------|-----|-----|------|------|
| Low | Highest | Low | Highest | C++ Serving | High-performance recall and ranking services of large-scale online recommendation systems |
| High | High | High | High | Python Pipeline Serving | High-throughput, high-efficiency, asynchronous mode; fits single-operator, multi-model combination scenarios |
To make deployment on public clouds easier, Paddle Serving plans to provide Kubernetes Operators for submitting service jobs.
### 6.2 Vector Indexing and Tree-based Indexing
In recommendation and advertising systems, vector-based or tree-based indexing services are commonly used for candidate retrieval. These retrieval tasks will become built-in services of Paddle Serving.
### 6.3 Service Monitoring
Paddle Serving will integrate Prometheus, an open-source combination of monitoring, alerting, and a time-series database that is well suited to Kubernetes and Docker monitoring systems.
2. Serving has launched Pipeline mode (see [Pipeline Serving](../doc/Python_Pipeline/Pipeline_Design_EN.md) for details), and the Pipeline Serving Client for Java has been released.
3. The parameters `ip` and `port` in PipelineClientExample.java (path: java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) need to match the corresponding pipeline server parameters `ip` and `port` defined in config.yaml (taking the IMDB model ensemble as an example, path: python/examples/pipeline/imdb_model_ensemble/[config.yaml](../examples/Pipeline/imdb_model_ensemble/config.yml)).
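To double-check the values on the server side, you can read them straight out of the example's config file; `rpc_port` and `http_port` are assumed here to be the relevant keys in that config.yml:

```
# Print the ports the pipeline server listens on (key names assumed)
grep -E "rpc_port|http_port" ../examples/Pipeline/imdb_model_ensemble/config.yml
```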