Merge pull request #27 from PaddlePaddle/v0.5.0

V0.5.0 merge

29e90b6a · TeslaZhao · GitHub · cd57c522 · 7ae851bd · 29e90b6a
105 changed files
--- a/README.md
+++ b/README.md
@@ -20,88 +20,135 @@
    <br>
 <p>
+- [Motivation](./README.md#motivation)
+- [AIStudio Tutorial](./README.md#aistudio-tutorial)
+- [Installation](./README.md#installation)
+- [Quick Start Example](./README.md#quick-start-example)
+- [Document](README.md#document)
+- [Community](README.md#community)
 <h2 align="center">Motivation</h2>
 We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also deploy the model online easily. A demo of Paddle Serving is as follows:
+<h3 align="center">Some Key Features of Paddle Serving</h3>
+- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
+- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
+- **Highly concurrent and efficient communication** between clients and servers supported.
+- **Multiple programming languages** supported on client side, such as C++, python and Java.
+***
+- Any model trained by [PaddlePaddle](https://github.com/paddlepaddle/paddle) can be deployed online with Paddle Serving, either saved directly or converted through the [Model Conversion Interface](./doc/SAVE_CN.md).
+- Support [Multi-model Pipeline Deployment](./doc/PIPELINE_SERVING.md), with both a REST interface and an RPC interface to meet your requirements; see the [Pipeline example](./python/examples/pipeline).
+- Support the model zoos from the Paddle ecosystem, such as [PaddleDetection](./python/examples/detection), [PaddleOCR](./python/examples/ocr), [PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender).
+- Provide a variety of pre-processing and post-processing components so that users can reuse the related code across training, deployment and other stages, bridging the gap between AI developers and application developers. For details, please refer to [Serving Examples](./python/examples/).
 <p align="center">
    <img src="doc/demo.gif" width="700">
 </p>
+<h2 align="center">AIStudio Tutorial</h2>
+We provide a tutorial on AIStudio (Chinese version): [AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
+The tutorial covers:
+<ul>
+<li>Paddle Serving Environment Setup</li>
+  <ul>
+    <li>Running in docker images</li>
+    <li>pip install Paddle Serving</li>
+  </ul>
+<li>Quick Experience of Paddle Serving</li>
+<li>Advanced Tutorial of Model Deployment</li>
+  <ul>
+    <li>Save/Convert Models for Paddle Serving</li>
+    <li>Setup Online Inference Service</li>
+  </ul>
+<li>Paddle Serving Examples</li>
+  <ul>
+    <li>Paddle Serving for Detections</li>
+    <li>Paddle Serving for OCR</li>
+  </ul>
+</ul>
 <h2 align="center">Installation</h2>
-We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
+We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
+**Attention**: Currently, the default GPU environment of paddlepaddle 2.0 is Cuda 10.2, so the sample code for GPU Docker is based on Cuda 10.2. We also provide docker images and whl packages for other GPU environments. Users on other environments need to carefully check and select the appropriate version.
 ```
 # Run CPU Docker
-docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
 docker exec -it test bash
+git clone https://github.com/PaddlePaddle/Serving
 ```
 ```
 # Run GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
 nvidia-docker exec -it test bash
+git clone https://github.com/PaddlePaddle/Serving
 ```
 ```shell
-pip install paddle-serving-client==0.4.0 
+pip install paddle-serving-client==0.5.0
-pip install paddle-serving-server==0.4.0 # CPU
+pip install paddle-serving-server==0.5.0 # CPU
-pip install paddle-serving-app==0.2.0
+pip install paddle-serving-app==0.3.0
-pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.5.0.post102 # GPU with CUDA10.2 + TensorRT7
-pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+# DO NOT RUN ALL COMMANDS! check your GPU env and select the right one
-pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
+pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
+pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
 ```
 You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
-If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
+If you need to install modules compiled from the develop branch, please download the packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command. If you want to compile them yourself, please refer to [How to compile Paddle Serving?](./doc/COMPILE.md)
 Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18, Windows 10.
-Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.5/3.6/3.7.
+Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6/3.7/3.8.
-Recommended to install paddle >= 1.8.4.
-For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
-<h2 align="center"> Pre-built services with Paddle Serving</h2>
+We recommend installing paddle >= 2.0.0.
-<h3 align="center">Chinese Word Segmentation</h4>
+```
+# CPU users, please run
+pip install paddlepaddle==2.0.0
-``` shell
+# GPU Cuda10.2 please run
-> python -m paddle_serving_app.package --get_model lac
+pip install paddlepaddle-gpu==2.0.0
-> tar -xzf lac.tar.gz
-> python lac_web_service.py lac_model/ lac_workdir 9393 &
-> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
-{"result":[{"word_seg":"我|爱|北京|天安门"}]}
 ```
-<h3 align="center">Image Classification</h4>
+**Note**: If your Cuda version is not 10.2, do not run the commands above directly. Instead, refer to the [Paddle official documentation: multi-version whl package list](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release).
-<p align="center">
+Select the URL for your GPU environment and install it. For example, Python 2.7 users on Cuda 9.0 should select `cp27-cp27mu` and
-    <br>
+the URL corresponding to `cuda9.0_cudnn7-mkl`, then copy it and run:
-<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
+```
-    <br>
+pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
-<p>
-``` shell
-> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
-> tar -xzf resnet_v2_50_imagenet.tar.gz
-> python resnet50_imagenet_classify.py resnet50_serving_model &
-> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
-{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
 ```
+For other environments and Python versions, please find the corresponding link in the table and install it with pip.
+For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
 <h2 align="center">Quick Start Example</h2>
-This quick start example is only for users who already have a model to deploy and we prepare a ready-to-deploy model here. If you want to know how to use paddle serving from offline training to online serving, please reference to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
+This quick start example is mainly for users who already have a model to deploy; we also provide a model that can be used for deployment. If you want to learn the complete process from offline training to online service, please refer to the AIStudio tutorial above.
 ### Boston House Price Prediction model
+Enter the Serving git directory, then change to the `fit_a_line` example directory:
 ``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+cd Serving/python/examples/fit_a_line
-tar -xzf uci_housing.tar.gz
+sh get_data.sh
 ```
 Paddle Serving provides HTTP and RPC based service for users to access
@@ -123,6 +170,8 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
 | `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |
+| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
+| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
 </center>
@@ -145,26 +194,8 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
 Users can also put the data format processing logic on the server side, so that they can directly use curl to access the service, refer to the following case whose path is `python/examples/fit_a_line`
-```python
+```
-from paddle_serving_server.web_service import WebService
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
-import numpy as np
-class UciService(WebService):
-    def preprocess(self, feed=[], fetch=[]):
-        feed_batch = []
-        is_batch = True
-        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
-        for i, ins in enumerate(feed):
-            nums = np.array(ins["x"]).reshape(1, 1, 13)
-            new_data[i] = nums
-        feed = {"x": new_data}
-        return feed, fetch, is_batch
-uci_service = UciService(name="uci")
-uci_service.load_model_config("uci_housing_model")
-uci_service.prepare_server(workdir="workdir", port=9292)
-uci_service.run_rpc_service()
-uci_service.run_web_service()
 ```
 for client side,
 ```
@@ -175,32 +206,17 @@ the response is
 {"result":{"price":[[18.901151657104492]]}}
 ```
-<h2 align="center">Some Key Features of Paddle Serving</h2>
- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on client side, such as Golang, C++ and python.
 <h2 align="center">Document</h2>
 ### New to Paddle Serving
 - [How to save a servable model?](doc/SAVE.md)
- [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md)
 - [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
+- [Paddle Serving Examples](python/examples)
-### Tutorial at AIStudio
- [Introduction to PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [Image Segmentation on Paddle Serving](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [Sentimental Analysis](https://aistudio.baidu.com/aistudio/projectdetail/509014)
 ### Developers
- [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
 - [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
 - [Compile from source code](doc/COMPILE.md)
+- [Develop Pipeline Serving](doc/PIPELINE_SERVING.md)
 - [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
 - [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
@@ -211,15 +227,13 @@ the response is
 - [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
 - [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
-### FAQ
- [FAQ(Chinese)](doc/FAQ.md)
 ### Design
 - [Design Doc](doc/DESIGN_DOC.md)
-<h2 align="center">Community</h2>
+### FAQ
+- [FAQ(Chinese)](doc/FAQ.md)
+<h2 align="center">Community</h2>
 ### Slack

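Note on the install matrix in the README diff above: the new pip commands pair each `paddle-serving-server-gpu` wheel suffix with one CUDA/TensorRT combination, and the README warns not to run all of them. The following is a minimal sketch of that selection, assuming only the five combinations listed in the diff; the function name `serving_server_requirement` and the lookup table are hypothetical helpers for illustration, not part of Paddle Serving.

```python
# Hypothetical helper (not part of Paddle Serving): pick one pip requirement
# for the serving server from the combinations listed in the README diff.

SERVING_VERSION = "0.5.0"

# (CUDA version, TensorRT major version or None) -> wheel suffix, as in the README.
GPU_WHEEL_SUFFIX = {
    ("10.2", "7"): "post102",
    ("9.0", None): "post9",
    ("10.0", None): "post10",
    ("10.1", "6"): "post101",
    ("10.1", "7"): "post11",
}


def serving_server_requirement(cuda=None, tensorrt=None):
    """Return the pip requirement string for the given GPU environment.

    With cuda=None the CPU package is returned. Combinations that the README
    does not list raise ValueError instead of guessing a suffix.
    """
    if cuda is None:
        return f"paddle-serving-server=={SERVING_VERSION}"
    try:
        suffix = GPU_WHEEL_SUFFIX[(cuda, tensorrt)]
    except KeyError:
        raise ValueError(
            f"no {SERVING_VERSION} GPU wheel listed for CUDA {cuda}, TensorRT {tensorrt}"
        ) from None
    return f"paddle-serving-server-gpu=={SERVING_VERSION}.{suffix}"


if __name__ == "__main__":
    print(serving_server_requirement())            # CPU package
    print(serving_server_requirement("10.2", "7"))  # default GPU environment
```

Raising on unlisted combinations mirrors the README's advice to check the GPU environment first rather than install an arbitrary wheel.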
--- a/README_CN.md
+++ b/README_CN.md
@@ -20,91 +20,135 @@
    <br>
 <p>
+- [动机](./README_CN.md#动机)
+- [教程](./README_CN.md#教程)
+- [安装](./README_CN.md#安装)
+- [快速开始示例](./README_CN.md#快速开始示例)
+- [文档](README_CN.md#文档)
+- [社区](README_CN.md#社区)
 <h2 align="center">动机</h2>
 Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务。 **本项目目标**: 当用户使用 [Paddle](https://github.com/PaddlePaddle/Paddle) 训练了一个深度神经网络，就同时拥有了该模型的预测服务。
+<h3 align="center">Paddle Serving的核心功能</h3>
+- 与Paddle训练紧密连接，绝大部分Paddle模型可以 **一键部署**.
+- 支持 **工业级的服务能力** 例如模型管理，在线加载，在线A/B测试等.
+- 支持客户端和服务端之间 **高并发和高效通信**.
+- 支持 **多种编程语言** 开发客户端，例如C++, Python和Java.
+***
+- 任何经过[PaddlePaddle](https://github.com/paddlepaddle/paddle)训练的模型，都可以经过直接保存或是[模型转换接口](./doc/SAVE_CN.md)，用于Paddle Serving在线部署。
+- 支持[多模型串联服务部署](./doc/PIPELINE_SERVING_CN.md), 同时提供Rest接口和RPC接口以满足您的需求，[Pipeline示例](./python/examples/pipeline)。
+- 支持Paddle生态的各大模型库, 例如[PaddleDetection](./python/examples/detection)，[PaddleOCR](./python/examples/ocr)，[PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender)。
+- 提供丰富多彩的前后处理，方便用户在训练、部署等各阶段复用相关代码，弥合AI开发者和应用开发者之间的鸿沟，详情参考[模型示例](./python/examples/)。
 <p align="center">
    <img src="doc/demo.gif" width="700">
 </p>
+<h2 align="center">教程</h2>
+Paddle Serving开发者为您提供了简单易用的[AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
+教程提供了如下内容
+<ul>
+<li>Paddle Serving环境安装</li>
+  <ul>
+    <li>Docker镜像启动方式
+    <li>pip安装Paddle Serving
+  </ul>
+<li>快速体验部署在线推理服务</li>
+<li>部署在线推理服务进阶流程</li>
+  <ul>
+    <li>获取可用于部署在线服务的模型</li>
+    <li>启动推理服务</li>
+  </ul>
+<li>Paddle Serving在线部署实例</li>
+  <ul>
+    <li>使用Paddle Serving部署图像检测服务</li>
+    <li>使用Paddle Serving部署OCR Pipeline在线服务</li>
+  </ul>
+</ul>
 <h2 align="center">安装</h2>
 **强烈建议**您在**Docker内构建**Paddle Serving，请查看[如何在Docker中运行PaddleServing](doc/RUN_IN_DOCKER_CN.md)。更多镜像请查看[Docker镜像列表](doc/DOCKER_IMAGES_CN.md)。
+**提示**：目前paddlepaddle 2.0版本的默认GPU环境是Cuda 10.2，因此GPU Docker的示例代码以Cuda 10.2为准。镜像和pip安装包也提供了其余GPU环境，用户如果使用其他环境，需要仔细甄别并选择合适的版本。
 ```
 # 启动 CPU Docker
-docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
 docker exec -it test bash
+git clone https://github.com/PaddlePaddle/Serving
 ```
 ```
 # 启动 GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
 nvidia-docker exec -it test bash
+git clone https://github.com/PaddlePaddle/Serving
 ```
 ```shell
-pip install paddle-serving-client==0.4.0
+pip install paddle-serving-client==0.5.0
-pip install paddle-serving-server==0.4.0 # CPU
+pip install paddle-serving-server==0.5.0 # CPU
-pip install paddle-serving-app==0.2.0
+pip install paddle-serving-app==0.3.0
-pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.5.0.post102 #GPU with CUDA10.2 + TensorRT7
-pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+# 其他GPU环境需要确认环境再选择执行哪一条
-pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
+pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0 
+pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0 
+pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
+pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
 ```
 您可能需要使用国内镜像源（例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`）来加速下载。
-如果需要使用develop分支编译的安装包，请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载，使用`pip install`命令进行安装。
+如果需要使用develop分支编译的安装包，请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载，使用`pip install`命令进行安装。如果您想自行编译，请参照[Paddle Serving编译文档](./doc/COMPILE_CN.md)
 paddle-serving-server和paddle-serving-server-gpu安装包支持Centos 6/7, Ubuntu 16/18和Windows 10。
-paddle-serving-client和paddle-serving-app安装包支持Linux和Windows，其中paddle-serving-client仅支持python2.7/3.5/3.6。
+paddle-serving-client和paddle-serving-app安装包支持Linux和Windows，其中paddle-serving-client仅支持python2.7/3.5/3.6/3.7/3.8。
-推荐安装1.8.4及以上版本的paddle
-对于**Windows 10 用户**，请参考文档[Windows平台使用Paddle Serving指导](./doc/WINDOWS_TUTORIAL_CN.md)。
+推荐安装2.0.0及以上版本的paddle
-<h2 align="center"> Paddle Serving预装的服务 </h2>
-<h3 align="center">中文分词</h4>
-``` shell
-> python -m paddle_serving_app.package --get_model lac
-> tar -xzf lac.tar.gz
-> python lac_web_service.py lac_model/ lac_workdir 9393 &
-> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
-{"result":[{"word_seg":"我|爱|北京|天安门"}]}
 ```
+# CPU环境请执行
+pip install paddlepaddle==2.0.0
-<h3 align="center">图像分类</h4>
+# GPU Cuda10.2环境请执行
+pip install paddlepaddle-gpu==2.0.0
+```
-<p align="center">
+**注意**： 如果您的Cuda版本不是10.2，请勿直接执行上述命令，需要参考[Paddle官方文档-多版本whl包列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release)
-    <br>
-<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
-    <br>
-<p>
-``` shell
+选择相应的GPU环境的url链接并进行安装，例如Cuda 9.0的Python2.7用户，请选择表格当中的`cp27-cp27mu`和`cuda9.0_cudnn7-mkl`对应的url，复制下来并执行
-> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+```
-> tar -xzf resnet_v2_50_imagenet.tar.gz
+pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
-> python resnet50_imagenet_classify.py resnet50_serving_model &
-> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
-{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
 ```
+如果是其他环境和Python版本，请在表格中找到对应的链接并用pip安装。
+对于**Windows 10 用户**，请参考文档[Windows平台使用Paddle Serving指导](./doc/WINDOWS_TUTORIAL_CN.md)。
 <h2 align="center">快速开始示例</h2>
-这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的，而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程，请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)
+这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的，而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程，请参考前文的AiStudio教程。
 <h3 align="center">波士顿房价预测</h3>
+进入到Serving的git目录下，进入到`fit_a_line`例子
 ``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+cd Serving/python/examples/fit_a_line
-tar -xzf uci_housing.tar.gz
+sh get_data.sh
 ```
 Paddle Serving 为用户提供了基于 HTTP 和 RPC 的服务
@@ -127,10 +171,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `mem_optim_off` | - | - | Disable memory optimization |
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
-| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |
+| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT  |
+| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
+| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
-我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求，请参考英文文
-档 [requests](https://requests.readthedocs.io/en/master/)。
 </center>
 ``` python
@@ -151,26 +195,8 @@ print(fetch_map)
 <h3 align="center">HTTP服务</h3>
 用户也可以将数据格式处理逻辑放在服务器端进行，这样就可以直接用curl去访问服务，参考如下案例，在目录`python/examples/fit_a_line`
-```python
+```
-from paddle_serving_server.web_service import WebService
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
-import numpy as np
-class UciService(WebService):
-    def preprocess(self, feed=[], fetch=[]):
-        feed_batch = []
-        is_batch = True
-        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
-        for i, ins in enumerate(feed):
-            nums = np.array(ins["x"]).reshape(1, 1, 13)
-            new_data[i] = nums
-        feed = {"x": new_data}
-        return feed, fetch, is_batch
-uci_service = UciService(name="uci")
-uci_service.load_model_config("uci_housing_model")
-uci_service.prepare_server(workdir="workdir", port=9292)
-uci_service.run_rpc_service()
-uci_service.run_web_service()
 ```
 客户端输入
 ```
@@ -181,32 +207,17 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
 {"result":{"price":[[18.901151657104492]]}}
 ```
-<h2 align="center">Paddle Serving的核心功能</h2>
- 与Paddle训练紧密连接，绝大部分Paddle模型可以 **一键部署**.
- 支持 **工业级的服务能力** 例如模型管理，在线加载，在线A/B测试等.
- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
- 支持客户端和服务端之间 **高并发和高效通信**.
- 支持 **多种编程语言** 开发客户端，例如Golang，C++和Python.
 <h2 align="center">文档</h2>
 ### 新手教程
 - [怎样保存用于Paddle Serving的模型？](doc/SAVE_CN.md)
- [端到端完成从训练到部署全流程](doc/TRAIN_TO_SERVICE_CN.md)
 - [十分钟构建Bert-As-Service](doc/BERT_10_MINS_CN.md)
+- [Paddle Serving示例合辑](python/examples)
-### AIStudio教程
- [PaddleServing作业](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [PaddleServing图像分割](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [PaddleServing情感分析](https://aistudio.baidu.com/aistudio/projectdetail/509014)
 ### 开发者教程
- [如何配置Server端的计算图?](doc/SERVER_DAG_CN.md)
- [如何开发一个新的General Op?](doc/NEW_OPERATOR_CN.md)
 - [如何开发一个新的Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [如何在Paddle Serving使用Go Client?](doc/IMDB_GO_CLIENT_CN.md)
 - [如何编译PaddleServing?](doc/COMPILE_CN.md)
+- [如何开发Pipeline?](doc/PIPELINE_SERVING_CN.md)
 - [如何使用uWSGI部署Web Service](doc/UWSGI_DEPLOY_CN.md)
 - [如何实现模型文件热加载](doc/HOT_LOADING_IN_SERVING_CN.md)
@@ -217,12 +228,12 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
 - [CPU版Benchmarks](doc/BENCHMARKING.md)
 - [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)
-### FAQ
- [常见问答](doc/FAQ.md)
 ### 设计文档
 - [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md)
+### FAQ
+- [常见问答](doc/FAQ.md)
 <h2 align="center">社区</h2>
 ### Slack

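Note on the HTTP service in the README diffs above: after this change the service is started with `--name uci`, and the README says the client can access it with curl. As a rough Python equivalent, the sketch below builds the same kind of JSON POST with the standard library. The endpoint `http://127.0.0.1:9292/uci/prediction` and the `"fetch": ["price"]` field are assumptions, inferred from the `--name uci --port 9292` flags, the `/lac/prediction` pattern in the removed examples, and the `price` key in the response; the curl line itself is truncated in this diff. `build_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint: inferred from `--name uci --port 9292`, not shown in full in the diff.
DEFAULT_URL = "http://127.0.0.1:9292/uci/prediction"


def build_request(features, url=DEFAULT_URL):
    """Build a JSON POST request carrying one sample of housing features."""
    payload = {"feed": [{"x": list(features)}], "fetch": ["price"]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=DEFAULT_URL, timeout=5.0):
    """Send the request to a running service and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
              -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
    request = build_request(sample)
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Calling `predict(sample)` requires the server from the README to be running; `build_request` alone can be used to inspect the payload offline.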
--- a/doc/ABTEST_IN_PADDLE_SERVING.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING.md
@@ -36,7 +36,7 @@ Here, we [use docker](RUN_IN_DOCKER.md) to start the server-side service.
 First, start the BOW server, which enables the `8000` port:
 ``` shell
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
 docker exec -it bow-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
@@ -47,7 +47,7 @@ exit
 Similarly, start the LSTM server, which enables the `9000` port:
 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
 docker exec -it lstm-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple

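Note on the registry change: this PR replaces `hub.baidubce.com` with `registry.baidubce.com` in every `docker pull` and `docker run` line it touches. For scripts or docs outside this diff that still reference the old host, a small rewrite along the following lines may help. This is only a sketch; `migrate_registry` is a hypothetical helper, and it deliberately limits itself to image references under `/paddlepaddle/`.

```python
import re

OLD_REGISTRY = "hub.baidubce.com"
NEW_REGISTRY = "registry.baidubce.com"

# Match the old host only when it starts a reference to a paddlepaddle image,
# so that longer hostnames such as "myhub.baidubce.com" are left untouched.
_OLD_IMAGE_HOST = re.compile(
    r"(?<![\w.-])" + re.escape(OLD_REGISTRY) + r"(?=/paddlepaddle/)"
)


def migrate_registry(text):
    """Return text with old-registry image references pointed at the new host."""
    return _OLD_IMAGE_HOST.sub(NEW_REGISTRY, text)


if __name__ == "__main__":
    line = "docker pull hub.baidubce.com/paddlepaddle/serving:latest"
    print(migrate_registry(line))
```

The rewrite is idempotent, so running it twice over the same file leaves already-migrated lines unchanged.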
--- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md
@@ -35,7 +35,7 @@ pip install Shapely
 首先启动BOW Server，该服务启用`8000`端口：
 ```bash
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
 docker exec -it bow-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
@@ -46,7 +46,7 @@ exit
 同理启动LSTM Server，该服务启用`9000`端口：
 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
 docker exec -it lstm-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple

--- a/doc/BAIDU_KUNLUN_XPU_SERVING.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING.md
+# Paddle Serving Using Baidu Kunlun Chips
+(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
+Paddle Serving supports deployment on Baidu Kunlun chips. At present, experimental support covers ARM servers (such as Phytium FT-2000+/64) equipped with Baidu Kunlun chips. We will improve the deployment capability on other heterogeneous hardware servers in the future.
+# Compilation and installation
+Refer to the [compile](COMPILE.md) document to set up the compilation environment.
+## Compilation
+* Compile the Serving Server
+```
+cd Serving
+mkdir -p server-build-arm && cd server-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DSERVER=ON ..
+make -j10
+```
+You can run `make install` to place the build output in the `./output` directory. To do so, add `-DCMAKE_INSTALL_PREFIX=./output` to the CMake command shown above to specify the output path.
+* Compile the Serving Client
+```
+mkdir -p client-build-arm && cd client-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DCLIENT=ON ..
+make -j10
+```
+* Compile the App
+```
+cd Serving 
+mkdir -p app-build-arm && cd app-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DAPP=ON ..
+make -j10
+```
+## Install the wheel package
+After the compilation stages above, a whl package is generated in `python/dist/` under each build directory.
+For example, after the server compilation step, the whl package is produced under the `server-build-arm/python/dist` directory, and you can run `pip install -U python/dist/*.whl` to install it.
+# Request parameters description
+To deploy a serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
+|param|param description|about|
+|:--|:--|:--|
+|use_lite|using Paddle-Lite Engine|use the inference capability of Paddle-Lite|
+|use_xpu|using Baidu Kunlun for inference|need to be used with the use_lite option|
+|ir_optim|enable Paddle-Lite graph optimization|refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
+# Deployment examples
+## Download the model
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
+```
+## Start RPC service
+There are three main deployment methods:
+* deploy on an ARM server with Baidu Kunlun XPU, accelerated by Paddle-Lite and the XPU;
+* deploy on a standalone ARM server, accelerated by Paddle-Lite;
+* deploy on a standalone ARM server without Paddle-Lite.
+The first two deployment methods are recommended.
+Start the RPC service on an ARM server with Baidu Kunlun chips, accelerated by Paddle-Lite and the Baidu Kunlun XPU:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+```
+Start the RPC service on an ARM server, accelerated by Paddle-Lite:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+```
+Start the RPC service on an ARM server without Paddle-Lite acceleration:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+```
+## Client invocation
+```
+from paddle_serving_client import Client
+import numpy as np
+client = Client()
+client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])
+data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
+        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
+print(fetch_map)
+```
+Some examples are provided below; other models can be adapted by following these examples.
+|sample name|sample links|
+|:-----|:--|
+|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
+|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
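
Note on the three start commands in the Kunlun document above: they differ only in the `--use_lite`, `--use_xpu` and `--ir_optim` flags, and the parameter table states that `use_xpu` needs to be used with `use_lite`. The sketch below assembles the command line from those options and enforces that one constraint. `build_serve_command` is a hypothetical helper for illustration; the module name and flags are copied from the commands in the diff, and no other behavior of the server is assumed.

```python
# Hypothetical helper: assemble the serve command shown in the Kunlun document.


def build_serve_command(model, port=9292, thread=6,
                        use_lite=False, use_xpu=False, ir_optim=False):
    """Return the argument list for starting the RPC service.

    Raises ValueError when use_xpu is requested without use_lite, following
    the parameter table in the document.
    """
    if use_xpu and not use_lite:
        raise ValueError("use_xpu needs to be used with use_lite")
    cmd = [
        "python3", "-m", "paddle_serving_server_gpu.serve",
        "--model", model,
        "--thread", str(thread),
        "--port", str(port),
    ]
    if use_lite:
        cmd.append("--use_lite")
    if use_xpu:
        cmd.append("--use_xpu")
    if ir_optim:
        cmd.append("--ir_optim")
    return cmd


if __name__ == "__main__":
    # ARM + XPU deployment, matching the first command in the document.
    print(" ".join(build_serve_command(
        "uci_housing_model", use_lite=True, use_xpu=True, ir_optim=True)))
```

The returned list can be passed to `subprocess.run` on a machine where the server package is installed; printing it first makes it easy to compare against the documented commands.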
--- a/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
+# Paddle Serving使用百度昆仑芯片部署
+(简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
+Paddle Serving支持使用百度昆仑芯片进行预测部署。目前试验性支持在百度昆仑芯片和arm服务器（如飞腾 FT-2000+/64）上进行部署，后续完善对其他异构硬件服务器部署能力。
+# 编译、安装
+基本环境配置可参考[该文档](COMPILE_CN.md)进行配置。
+## 编译
+* 编译server部分
+```
+cd Serving
+mkdir -p server-build-arm && cd server-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DSERVER=ON ..
+make -j10
+```
+可以执行`make install`把目标产出放在`./output`目录下，cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。
+* 编译client部分
+```
+mkdir -p client-build-arm && cd client-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DCLIENT=ON ..
+make -j10
+```
+* Compile the app
+```
+cd Serving 
+mkdir -p app-build-arm && cd app-build-arm
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DAPP=ON ..
+make -j10
+```
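+## Install the wheel packages
+After the compilation steps above, a whl package is generated under `$build_dir/python/dist` in each build directory; install each of them. For example, the server step generates a whl package under `server-build-arm/python/dist`, which can be installed with `pip install -U xxx.whl`.
+# Request Parameters
+To support arm+xpu deployment with Paddle-Lite acceleration, the following parameters are required.
+|Parameter|Description|Notes|
+|:--|:--|:--|
+|use_lite|Use the Paddle-Lite engine|Uses Paddle-Lite CPU inference|
+|use_xpu|Use Baidu Kunlun for inference|Must be used together with use_lite|
+|ir_optim|Enable Paddle-Lite computation subgraph optimization|See [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for details|
+# Deployment Example
+## Download the model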
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
+```
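+## Start the RPC service
+There are three main startup configurations:
+* Deploy on ARM CPU + XPU, with Paddle-Lite XPU optimization and acceleration;
+* Deploy on ARM CPU only, with Paddle-Lite optimization and acceleration;
+* Deploy on ARM CPU without Paddle-Lite acceleration.
+The first two deployment methods are recommended.
+Start the RPC service on ARM CPU + XPU, with Paddle-Lite XPU optimization and acceleration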
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+```
+Start the RPC service on ARM CPU, with Paddle-Lite acceleration
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+```
+Start the RPC service on ARM CPU, without Paddle-Lite acceleration
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+```
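The three startup configurations differ only in which optional flags are passed. As a minimal sketch (the helper name `build_serve_command` is hypothetical and not part of Paddle Serving; the module name and flags are copied from the commands above), the mapping could be expressed like this, including the rule from the parameter table that `use_xpu` must be combined with `use_lite`:

```python
def build_serve_command(model, port=9292, thread=6,
                        use_lite=False, use_xpu=False, ir_optim=False):
    """Return the argv list for launching the RPC service.

    Hypothetical helper: it only assembles the command line shown in
    this document and does not start anything itself.
    """
    # The parameter table states that use_xpu must be used with use_lite.
    if use_xpu and not use_lite:
        raise ValueError("use_xpu must be combined with use_lite")
    cmd = ["python3", "-m", "paddle_serving_server_gpu.serve",
           "--model", model, "--thread", str(thread), "--port", str(port)]
    if use_lite:
        cmd.append("--use_lite")
    if use_xpu:
        cmd.append("--use_xpu")
    if ir_optim:
        cmd.append("--ir_optim")
    return cmd


if __name__ == "__main__":
    # ARM CPU + XPU with Paddle-Lite acceleration (first configuration).
    print(" ".join(build_serve_command(
        "uci_housing_model", use_lite=True, use_xpu=True, ir_optim=True)))
```

The returned list can be passed to `subprocess.run` as is, which avoids shell quoting issues.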
+## Client request
+```
+from paddle_serving_client import Client
+import numpy as np
+client = Client()
+client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])
+data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
+        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
+print(fetch_map)
+```
+Some examples are provided below; other models can be adapted with reference to them.
+|Sample name|Sample link|
+|:-----|:--|
+|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
+|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
--- a/doc/BERT_10_MINS.md
+++ b/doc/BERT_10_MINS.md
@@ -2,35 +2,57 @@
 ([简体中文](./BERT_10_MINS_CN.md)|English)
-The goal of Bert-As-Service is to give a sentence, and the service can represent the sentence as a semantic vector and return it to the user. [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field. It has achieved good results on a variety of public NLP tasks. The semantic vector calculated by the Bert model is used as input to other NLP models, which will also greatly improve the performance of the model. Bert-As-Service allows users to easily obtain the semantic vector representation of text and apply it to their own tasks. In order to achieve this goal, we have shown in four steps that using Paddle Serving can build such a service in ten minutes. All the code and files in the example can be found in [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
+Bert-As-Service takes a sentence and returns its representation as a semantic vector. The [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the NLP field and has achieved good results on a variety of public NLP tasks. Using the semantic vectors computed by the Bert model as input to other NLP models can also greatly improve their performance. Bert-As-Service lets users easily obtain the semantic vector representation of text and apply it to their own tasks. To this end, the five steps below show how Paddle Serving can be used to build such a service in ten minutes. All the code and files in the example can be found in the [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) directory of Paddle Serving.
-#### Step1: Save the serviceable model
+If your Python version is 3.X, replace `pip` with `pip3` and `python` with `python3` in the following commands.
-Paddle Serving supports various models trained based on Paddle, and saves the serviceable model by specifying the input and output variables of the model. For convenience, we can load a trained bert Chinese model from paddlehub and save a deployable service with two lines of code. The server and client configurations are placed in the `bert_seq20_model` and` bert_seq20_client` folders, respectively.
+### Step1: Getting Model
-[//file]:#bert_10.py
+#### method 1:
-``` python
+This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
-import paddlehub as hub
-model_name = "bert_chinese_L-12_H-768_A-12"
-module = hub.Module(model_name)
-inputs, outputs, program = module.context(
-    trainable=True, max_seq_len=20)
-feed_keys = ["input_ids", "position_ids", "segment_ids",
-             "input_mask"]
-fetch_keys = ["pooled_output", "sequence_output"]
-feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
-fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
-import paddle_serving_client.io as serving_io
+Install paddlehub first
-serving_io.save_model("bert_seq20_model", "bert_seq20_client",
-                      feed_dict, fetch_dict, program)
 ```
+pip install paddlehub
+```
+Run
+```
+python prepare_model.py 128
+```
+**PaddleHub only supports Python 3.5+**
+The 128 in the command above is the max_seq_len of the BERT model, i.e. the length of a sample after preprocessing.
+The config file and model files for the server side are saved in the folder bert_seq128_model.
+The config file generated for the client side is saved in the folder bert_seq128_client.
+#### method 2:
+You can also download the above model from BOS(max_seq_len=128). After decompression, the config file and model file for server side are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the config file generated for client side is stored in the bert_chinese_L-12_H-768_A-12_client folder:
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
+```
+### Step2: Getting Dict and Sample Dataset
+```
+sh get_data.sh
+```
+This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.
-#### Step2: Launch Service
-[//file]:#server.sh
+### Step3: Launch Service
-``` shell
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 --port 9292 --gpu_ids 0
+Start the CPU inference service, run
+```
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292  #cpu inference service
+```
+Or, start the GPU inference service, run
+```
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
 ```
 | Parameters | Meaning                                  |
 | ---------- | ---------------------------------------- |
@@ -39,52 +61,55 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 -
 | port       | server port number                       |
 | gpu_ids    | GPU index number                         |
-#### Step3: data preprocessing logic on Client Side
+### Step4: data preprocessing logic on Client Side
 Paddle Serving has built-in data preprocessing logic for many models. To compute Chinese Bert semantic representations, we use the ChineseBertReader class from paddle_serving_app for data preprocessing, so developers can easily obtain the multiple model input fields that correspond to a raw Chinese sentence.
 Install paddle_serving_app
-[//file]:#pip_app.sh
 ```shell
 pip install paddle_serving_app
 ```
-#### Step4: Client Visit Serving
+### Step5: Client Visit Serving
-the script of client side bert_client.py is as follow:
-[//file]:#bert_client.py
+#### method 1: RPC Inference
-``` python
-import sys
-from paddle_serving_client import Client
-from paddle_serving_client.utils import benchmark_args
-from paddle_serving_app.reader import ChineseBertReader
-import numpy as np
-args = benchmark_args()
-reader = ChineseBertReader({"max_seq_len": 128})
+Run
-fetch = ["pooled_output"]
+```
-endpoint_list = ['127.0.0.1:9292']
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
-client = Client()
+```
-client.load_client_config(args.model)
-client.connect(endpoint_list)
+The client reads data from data-c.txt and sends prediction requests; each prediction is returned as a word vector. (Because the word vectors contain a large amount of data, they are not printed.)
+#### method 2: HTTP Inference
-for line in sys.stdin:
+This method is divided into two steps: 
-    feed_dict = reader.process(line)
-    for key in feed_dict.keys():
+1. Start an HTTP prediction server.
-        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
-    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
+Start the CPU HTTP inference service, run
+```
+ python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
 ```
-run
+Or, start the GPU HTTP inference service, run
+```
+ export CUDA_VISIBLE_DEVICES=0,1
+```
+Set this environment variable to specify which GPUs are used; the command above selects GPU 0 and GPU 1.
+```
+ python bert_web_service_gpu.py bert_seq128_model/ 9292 #launch gpu inference service
+```
-[//file]:#bert_10_cli.sh
+2. Prediction via HTTP request
-```shell
-cat data.txt | python bert_client.py
 ```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
+```
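The same HTTP request can be issued from Python. The following is a minimal sketch using only the standard library; the helper name `build_prediction_request` is hypothetical, while the URL, headers and JSON body are copied from the curl command above. Sending the request requires the HTTP service to be running, so that part is left as a comment:

```python
import json
import urllib.request


def build_prediction_request(words,
                             url="http://127.0.0.1:9292/bert/prediction",
                             fetch=("pooled_output",)):
    """Build (but do not send) the POST request used in the curl example."""
    payload = {"feed": [{"words": words}], "fetch": list(fetch)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_prediction_request("hello")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # With the service from step 1 running, the request could be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```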
-read samples from data.txt, print results at the standard output.
 ### Benchmark

--- a/doc/BERT_10_MINS_CN.md
+++ b/doc/BERT_10_MINS_CN.md
@@ -2,30 +2,56 @@
 (简体中文|[English](./BERT_10_MINS.md))
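-The goal of Bert-As-Service is that, given a sentence, the service represents it as a semantic vector and returns it to the user. The [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the NLP field and has achieved good results on many public NLP tasks; using the semantic vectors computed by the Bert model as input to other NLP models also helps improve their performance considerably. Bert-As-Service lets users easily obtain semantic vector representations of text and apply them to their own tasks. To this end, we show in four steps how Paddle Serving can be used to build such a service within ten minutes. All code and files in this example can be found in the Paddle Serving [examples](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert).
+The goal of Bert-As-Service is that, given a sentence, the service represents it as a semantic vector and returns it to the user. The [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the NLP field and has achieved good results on many public NLP tasks; using the semantic vectors computed by the Bert model as input to other NLP models also helps improve their performance considerably. Bert-As-Service lets users easily obtain semantic vector representations of text and apply them to their own tasks. To this end, the following steps show how Paddle Serving can be used to build such a service within ten minutes. All code and files in this example can be found in the Paddle Serving [examples](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert).
-#### Step1: Save the servable model
-Paddle Serving supports all kinds of models trained with Paddle, and saves a servable model by specifying the model's input and output variables. For convenience, we can load a trained Chinese bert model from paddlehub and save a deployable service with two lines of code. The server and client configurations are placed in the `bert_seq20_model` and `bert_seq20_client` folders, respectively.
+If you use Python 3.X, replace pip with pip3 and python with python3 in the commands below.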
-``` python
-import paddlehub as hub
-model_name = "bert_chinese_L-12_H-768_A-12"
-module = hub.Module(model_name)
-inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
-feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
-fetch_keys = ["pooled_output", "sequence_output"]
-feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
-fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
-import paddle_serving_client.io as serving_io
+### Step1: Get the model
-serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
+#### Method 1:
+This example uses the [BERT Chinese model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
+Install paddlehub first
 ```
+pip install paddlehub
+```
+Run
+```
+python prepare_model.py 128
+```
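+The argument 128 is the max_seq_len of the BERT model, i.e. the sample length after preprocessing.
+The generated server-side config file and model files are stored in the bert_seq128_model folder.
+The generated client-side config file is stored in the bert_seq128_client folder.
+#### Method 2:
+You can also download the above model directly from BOS (max_seq_len=128). After decompression, the server-side config file and model files are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the client-side config file is stored in the bert_chinese_L-12_H-768_A-12_client folder: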
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
+```
-#### Step2: Launch the service
+### Step2: Get the dictionary and sample data
+```
+sh get_data.sh
+```
+The script downloads the Chinese dictionary vocab.txt and the Chinese sample data data-c.txt
+### Step3: Launch the service
+Start the CPU inference service, run
+```
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292  # start the CPU inference service
+```
+Or, start the GPU inference service, run
+```
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 # launch the GPU inference service on GPU 0
-``` shell
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
 ```
 | Parameter | Meaning                  |
@@ -35,7 +61,8 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 -
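 | port    | server port number         |
 | gpu_ids | GPU index number           |
-#### Step3: Client-side data preprocessing logic
+### Step4: Client-side data preprocessing logic
 Paddle Serving has built-in data preprocessing logic for many classic, typical models. To compute Chinese Bert semantic representations, we use the ChineseBertReader class from paddle_serving_app for data preprocessing, so developers can easily obtain the multiple model input fields that correspond to a raw Chinese sentence.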
@@ -45,39 +72,40 @@ Paddle Serving内建了很多经典典型对应的数据预处理逻辑，对于
 pip install paddle_serving_app
 ```
-#### Step4: Client access
+### Step5: Client access
+#### Method 1: Inference via RPC
+Run
+```
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
-The client script bert_client.py is as follows
+```
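+The client reads data from data-c.txt and runs prediction; the prediction result is the vector representation of the text (because of the large amount of data, the script does not print the output). The server address can be changed in the script.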
-``` python
-import sys
-from paddle_serving_client import Client
-from paddle_serving_client.utils import benchmark_args
-from paddle_serving_app.reader import ChineseBertReader
-import numpy as np
-args = benchmark_args()
-reader = ChineseBertReader({"max_seq_len": 128})
+#### Method 2: Inference via HTTP
-fetch = ["pooled_output"]
+This method has two steps
-endpoint_list = ['127.0.0.1:9292']
+1. Start an HTTP inference server.
-client = Client()
-client.load_client_config(args.model)
-client.connect(endpoint_list)
-for line in sys.stdin:
+Start the CPU HTTP inference service, run
-    feed_dict = reader.process(line)
-    for key in feed_dict.keys():
-        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
-    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
 ```
+python bert_web_service.py bert_seq128_model/ 9292 # start the CPU inference service
-Run
+```
-```shell
+Or, start the GPU HTTP inference service, run
-cat data.txt | python bert_client.py
+```
+ export CUDA_VISIBLE_DEVICES=0,1
+```
+The environment variable specifies which GPUs the GPU inference service uses; this example selects the two GPUs with indices 0 and 1
+```
+python bert_web_service_gpu.py bert_seq128_model/ 9292 # start the GPU inference service
 ```
-Read samples from data.txt and print the results to standard output.
+2. Run prediction via an HTTP request.
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
+```
 ### Benchmark

--- a/doc/COMPILE.md
+++ b/doc/COMPILE.md
@@ -6,12 +6,11 @@
 |            module            |              version              |
 | :--------------------------: | :-------------------------------: |
-|              OS              |             CentOS 7              |
+|              OS              |     Ubuntu16 and 18/CentOS 7      |
-|             gcc              |          4.8.5 and later          |
+|             gcc              | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
-|           gcc-c++            |          4.8.5 and later          |
+|           gcc-c++            | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
-|             git              |          3.82 and later           |
 |            cmake             |          3.2.0 and later          |
-|            Python            |  2.7.2 and later / 3.6 and later  |
+|            Python            |  2.7.2 and later / 3.5.1 and later |
 |              Go              |          1.9.2 and later          |
 |             git              |         2.17.1 and later          |
 |         glibc-static         |               2.17                |
@@ -19,19 +18,13 @@
 |         bzip2-devel          |          1.0.6 and later          |
 | python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
 |         sqlite-devel         |         3.7.17 and later          |
-|           patchelf           |           0.9 and later           |
+|           patchelf           |                0.9                |
 |           libXext            |               1.3.3               |
 |            libSM             |               1.2.2               |
 |          libXrender          |              0.9.10               |
 It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
-This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake:
- Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
- Set  `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`
 ## Get Code
 ``` shell
@@ -39,19 +32,47 @@ git clone https://github.com/PaddlePaddle/Serving
 cd Serving && git submodule update --init --recursive
 ```
+## PYTHONROOT settings
-## PYTHONROOT Setting
 ```shell
-# for example, the path of python is /usr/bin/python, you can set /usr as PYTHONROOT
+# For example, if the path of python is /usr/bin/python, you can set PYTHONROOT to /usr
-export PYTHONROOT=/usr/
+export PYTHONROOT=/usr
 ```
-In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
+If you are using a Docker development image, decide which Python version to compile with and set the corresponding environment variables as follows:
+```
+#Python 2.7
+export PYTHONROOT=/usr/local/python2.7.15/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
+#Python 3.5
+export PYTHONROOT=/usr/local/python3.5.1
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
+#Python3.6
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
+#Python3.7
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
+#Python3.8
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
+```
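The variable blocks above follow one naming pattern: the include directory and library carry an `m` suffix for Python 3.5 to 3.7 and no suffix for 2.7 and 3.8. As a rough sketch (the helper `python_build_env` and its suffix table are illustrative assumptions derived only from the listing above, not part of the build system), the values can be derived from `PYTHONROOT` and the version:

```python
import posixpath

# ABI suffix per version, as it appears in the listing above (assumption:
# only these five versions are covered).
ABI_SUFFIX = {"2.7": "", "3.5": "m", "3.6": "m", "3.7": "m", "3.8": ""}


def python_build_env(pythonroot, version):
    """Return the four Python-related variables for a given root and version."""
    if version not in ABI_SUFFIX:
        raise ValueError("version not covered by this sketch: " + version)
    name = "python" + version + ABI_SUFFIX[version]
    return {
        "PYTHONROOT": pythonroot,
        "PYTHON_INCLUDE_DIR": posixpath.join(pythonroot, "include", name),
        "PYTHON_LIBRARIES": posixpath.join(pythonroot, "lib", "lib" + name + ".so"),
        "PYTHON_EXECUTABLE": posixpath.join(pythonroot, "bin", "python" + version),
    }


if __name__ == "__main__":
    for key, value in python_build_env("/usr/local/python3.5.1", "3.5").items():
        print("export {}={}".format(key, value))
```

On a non-Docker machine the actual paths may differ, so check that the printed files exist before passing them to cmake.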
 ## Install Python dependencies
@@ -59,14 +80,11 @@ In the default centos7 image we provide, the Python path is `/usr/bin/python`. I
 pip install -r python/requirements.txt
 ```
-If Python3 is used, replace `pip` with `pip3`.
+If you use another Python version, use the corresponding `pip`.
 ## GOPATH Setting
+The default GOPATH is `$HOME/go`; you can also set it to another value. **If you are in the Docker environment provided by Serving, you do not need to set it.**
-## Compile Arguments
-The default GOPATH is `$HOME/go`, which you can set to other values.
 ```shell
 export GOPATH=$HOME/go
 export PATH=$PATH:$GOPATH/bin
@@ -100,52 +118,42 @@ make -j10
 You can execute `make install` to put the targets under the `./output` directory; to do so, add `-DCMAKE_INSTALL_PREFIX=./output` to the cmake command shown above to specify the output path.
 ### Integrated GPU version paddle inference library
-### CUDA_PATH is the cuda install path,use the command(whereis cuda) to check,it should be /usr/local/cuda.
-### CUDNN_LIBRARY && CUDA_CUDART_LIBRARY is the lib path, it should be /usr/local/cuda/lib64/
-``` shell
+Compared with the CPU environment, the GPU environment requires the settings listed in the following table.
-export CUDA_PATH='/usr/local/cuda'
+**Note that the following table is a reference for non-Docker compilation environments. The Docker compilation environment is already configured with the relevant parameters, so they do not need to be specified in the cmake process.**
-export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
-export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
-mkdir server-build-gpu && cd server-build-gpu
+| cmake environment variable | meaning | GPU environment considerations | needs to be set in Docker environment |
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+|-----------------------|-------------------------------------|-------------------------------|--------------------|
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+| CUDA_TOOLKIT_ROOT_DIR | cuda installation path, usually /usr/local/cuda | Required for all environments | No (/usr/local/cuda) |
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
+| CUDNN_LIBRARY | The directory where libcudnn.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
-    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
+| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
-    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \  
+| TENSORRT_ROOT | The parent directory of the directory where libnvinfer.so.* is located, depending on the TensorRT installation directory | Not needed for CUDA 9.0/10.0; required otherwise | No (/usr) |
-    -DSERVER=ON \
-    -DWITH_GPU=ON ..
-make -j10
-```
-### Integrated TRT version paddle inference library
+Outside the Docker environment, users can refer to the following commands. The actual paths depend on your environment; the code is for reference only.
-```
+``` shell
 export CUDA_PATH='/usr/local/cuda'
 export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
 export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
+export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
-mkdir server-build-trt && cd server-build-trt
+mkdir server-build-gpu && cd server-build-gpu
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+    -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
+    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
+    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DSERVER=ON \
-    -DWITH_GPU=ON \
+    -DWITH_GPU=ON ..
-    -DWITH_TRT=ON ..
 make -j10
 ```
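The table states that `TENSORRT_ROOT` is not needed for CUDA 9.0/10.0 but is needed otherwise. A small sketch of that rule follows; the helper `gpu_server_cmake_args` is hypothetical, it leaves out the Python-related options for brevity, and its default paths are the "usual" locations quoted in the table rather than verified values:

```python
def gpu_server_cmake_args(cuda_version,
                          cuda_path="/usr/local/cuda",
                          cudnn_dir="/usr/local/cuda/lib64/",
                          cudart_dir="/usr/local/cuda/lib64/",
                          tensorrt_root=None):
    """Assemble the GPU-specific cmake options as an argv list."""
    args = ["cmake",
            "-DCUDA_TOOLKIT_ROOT_DIR=" + cuda_path,
            "-DCUDNN_LIBRARY=" + cudnn_dir,
            "-DCUDA_CUDART_LIBRARY=" + cudart_dir]
    # Per the table: CUDA 9.0/10.0 builds do not need TENSORRT_ROOT.
    if cuda_version not in ("9.0", "10.0"):
        if tensorrt_root is None:
            raise ValueError("TENSORRT_ROOT is required for CUDA " + cuda_version)
        args.append("-DTENSORRT_ROOT=" + tensorrt_root)
    args += ["-DSERVER=ON", "-DWITH_GPU=ON", ".."]
    return args


if __name__ == "__main__":
    print(" ".join(gpu_server_cmake_args(
        "10.1",
        tensorrt_root="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/")))
```

Passing the options as a list (for example to `subprocess.run`) also sidesteps shell line-continuation pitfalls in long cmake invocations.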
-execute `make install` to put targets under directory `./output`
+Execute `make install` to put the target output in the `./output` directory.
-**Attention：** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
+**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path; see the [Notes](COMPILE.md#Notes) below.
 ## Compile Client
@@ -208,7 +216,6 @@ Please use the example under `python/examples` to verify.
 |      CLIENT      |       Compile Paddle Serving Client        | OFF  |
 |      SERVER      |       Compile Paddle Serving Server        | OFF  |
 |       APP        |     Compile Paddle Serving App package     | OFF  |
-| WITH_ELASTIC_CTR |        Compile ELASITC-CTR solution        | OFF  |
 |       PACK       |              Compile for whl               | OFF  |
 ### WITH_GPU Option
@@ -231,10 +238,12 @@ Note here:
 The following is the base library version matching relationship used by the PaddlePaddle release version for reference:
 |          |  CUDA   |   CuDNN      | TensorRT |
-| :----:   | :-----: | :----------------------: | :----:   |
+| :----:   | :-----: | :----------: | :----:   |
-| post9    |  9.0    | CuDNN 7.3.1 for CUDA 9.0 |          |
+| post9    |  9.0    | CuDNN 7.6.4  |          |
-| post10   |  10.0   | CuDNN 7.5.1 for CUDA 10.0|          |
+| post10   |  10.0   | CuDNN 7.6.5  |          |
-| trt      |  10.1   | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5  |
+| post101  |  10.1   | CuDNN 7.6.5  | 6.0.1    |
+| post102  |  10.2   | CuDNN 8.0.5  | 7.1.3    |
+| post11   |  11.0   | CuDNN 8.0.4  | 7.1.3    |
 ### How to make the compiler detect the CuDNN library

--- a/doc/COMPILE_CN.md
+++ b/doc/COMPILE_CN.md
@@ -6,12 +6,11 @@
 |            module            |              version              |
 | :--------------------------: | :-------------------------------: |
-|              OS              |             CentOS 7              |
+|              OS              |     Ubuntu16 and 18/CentOS 7      |
-|             gcc              |          4.8.5 and later          |
+|             gcc              | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
-|           gcc-c++            |          4.8.5 and later          |
+|           gcc-c++            | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
-|             git              |          3.82 and later           |
 |            cmake             |          3.2.0 and later          |
-|            Python            |  2.7.2 and later / 3.6 and later  |
+|            Python            |  2.7.2 and later / 3.5.1 and later |
 |              Go              |          1.9.2 and later          |
 |             git              |         2.17.1 and later          |
 |         glibc-static         |               2.17                |
@@ -24,13 +23,7 @@
 |            libSM             |               1.2.2               |
 |          libXrender          |              0.9.10               |
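-Docker is recommended for compilation. We have prepared the Paddle Serving compilation environment for you; see [this document](DOCKER_IMAGES_CN.md).
+Docker is recommended for compilation. We have prepared the Paddle Serving compilation environment with the above build dependencies already configured; see [this document](DOCKER_IMAGES_CN.md).
-This document takes Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python-related cmake options:
- Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
- Set `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`
 ## Get Code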
@@ -39,19 +32,46 @@ git clone https://github.com/PaddlePaddle/Serving
 cd Serving && git submodule update --init --recursive
 ```
 ## PYTHONROOT settings
 ```shell
 # For example, if the path of python is /usr/bin/python, you can set PYTHONROOT to /usr
-export PYTHONROOT=/usr/
+export PYTHONROOT=/usr
 ```
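-In the default CentOS 7 image we provide, the Python path is `/usr/bin/python`. If you want to use our CentOS 6 image, set it with `export PYTHONROOT=/usr/local/python2.7/`.
+If you are using the Docker development image, decide which Python version to compile with and set the corresponding environment variables as follows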
+```
+#Python 2.7
+export PYTHONROOT=/usr/local/python2.7.15/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
+#Python 3.5
+export PYTHONROOT=/usr/local/python3.5.1
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
+#Python3.6
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
+#Python3.7
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
+#Python3.8
+export PYTHONROOT=/usr/local/
+export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
+export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
+export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
+```
 ## Install Python dependencies
@@ -59,11 +79,11 @@ export PYTHONROOT=/usr/
 pip install -r python/requirements.txt
 ```
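-If you use Python3, replace `pip` with `pip3`.
+If you use another Python version, use the corresponding `pip`.
 ## GOPATH Setting
-The default GOPATH is `$HOME/go`; you can also set it to another value.
+The default GOPATH is `$HOME/go`; you can also set it to another value. **If you are in the Docker environment provided by Serving, you do not need to set it.**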
 ```shell
 export GOPATH=$HOME/go
 export PATH=$PATH:$GOPATH/bin
@@ -87,9 +107,9 @@ go get -u google.golang.org/grpc@v1.33.0
 ``` shell
 mkdir server-build-cpu && cd server-build-cpu
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR/ \
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+    -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
+    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
    -DSERVER=ON ..
 make -j10
 ```
@@ -97,44 +117,35 @@ make -j10
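 You can run `make install` to place the build output in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to specify the output path.
 ### Integrated GPU version Paddle Inference Library
-### CUDA_PATH is the cuda installation path; you can confirm it with the `whereis cuda` command, and it is usually /usr/local/cuda
-### CUDNN_LIBRARY and CUDA_CUDART_LIBRARY are the paths of the cuda library files, usually /usr/local/cuda/lib64/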
-``` shell
-export CUDA_PATH='/usr/local/cuda'
-export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
-export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
-mkdir server-build-gpu && cd server-build-gpu
+Compared with the CPU environment, the GPU environment requires the settings listed in the following table.
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
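+**Note that the following table is a reference for non-Docker compilation environments. The Docker compilation environment is already configured with the relevant parameters, so they do not need to be specified in the cmake process.**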
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-    -DSERVER=ON \
-    -DWITH_GPU=ON ..
-make -j10
-```
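-### Integrated TensorRT version Paddle Inference Library
+| cmake environment variable | meaning | GPU environment considerations | needs to be set in Docker environment |
+|-----------------------|-------------------------------------|-------------------------------|--------------------|
+| CUDA_TOOLKIT_ROOT_DIR | cuda installation path, usually /usr/local/cuda | Required for all environments | No (/usr/local/cuda) |
+| CUDNN_LIBRARY | The directory where libcudnn.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
+| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
+| TENSORRT_ROOT | The parent directory of the directory where libnvinfer.so.* is located, depending on the TensorRT installation directory | Not needed for CUDA 9.0/10.0; required otherwise | No (/usr) |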
-```
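+Outside the Docker environment, users can refer to the following commands. The actual paths depend on your environment; the code is for reference only.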
+``` shell
 export CUDA_PATH='/usr/local/cuda'
 export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
 export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
 export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
-mkdir server-build-trt && cd server-build-trt
+mkdir server-build-gpu && cd server-build-gpu
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+    -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
+    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
+    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DSERVER=ON \
-    -DWITH_GPU=ON \
+    -DWITH_GPU=ON ..
-    -DWITH_TRT=ON ..
 make -j10
 ```
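The GPU build above can also be driven from a script. The following is a minimal sketch, assuming the cmake variables from the table are exported as environment variables under the same names (the shell snippet above uses `CUDA_PATH` and `TENSORRT_LIBRARY_PATH` instead). `gpu_cmake_args` is a hypothetical helper for illustration, not part of Paddle Serving.

```python
import os

# Usual locations listed in the table above; adjust them for your machine.
DEFAULT_PATHS = {
    "CUDA_TOOLKIT_ROOT_DIR": "/usr/local/cuda",
    "CUDNN_LIBRARY": "/usr/local/cuda/lib64/",
    "CUDA_CUDART_LIBRARY": "/usr/local/cuda/lib64/",
}
PYTHON_VARS = ("PYTHON_INCLUDE_DIR", "PYTHON_LIBRARIES", "PYTHON_EXECUTABLE")


def gpu_cmake_args(env=None):
    """Build the cmake argument list for a GPU server build (hypothetical helper)."""
    env = os.environ if env is None else env
    args = ["cmake"]
    # Python paths are forwarded only when the caller has exported them.
    for name in PYTHON_VARS:
        if name in env:
            args.append("-D{}={}".format(name, env[name]))
    # CUDA paths fall back to the usual locations.
    for name, default in DEFAULT_PATHS.items():
        args.append("-D{}={}".format(name, env.get(name, default)))
    # TensorRT has no universal install path, so it is passed only when set.
    if "TENSORRT_ROOT" in env:
        args.append("-DTENSORRT_ROOT={}".format(env["TENSORRT_ROOT"]))
    args += ["-DSERVER=ON", "-DWITH_GPU=ON", ".."]
    return args
```

The returned list can be handed to `subprocess.check_call` with `cwd` set to the build directory, followed by `make`. Keeping the arguments in a list avoids the line-continuation mistakes that are easy to make in a long shell command.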
@@ -146,9 +157,9 @@ make -j10
 ``` shell
 mkdir client-build && cd client-build
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+    -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
+    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
    -DCLIENT=ON ..
 make -j10
 ```
@@ -161,10 +172,9 @@ make -j10
 ```bash
 mkdir app-build && cd app-build
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
+cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
+    -DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
+    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-    -DCMAKE_INSTALL_PREFIX=./output \
    -DAPP=ON ..
 make
 ```
@@ -183,7 +193,7 @@ make
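 When the Python server starts, it checks the `SERVING_BIN` environment variable. To use a binary you compiled yourself, set this variable to the path of that binary, usually `export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
 Here BUILD_DIR is the absolute path of server-build-cpu or server-build-gpu.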
-可以cd server-build-cpu路径下，执行export SERVING_BIN=${PWD}/core/general-server/serving
+For example, cd into server-build-cpu and run `export SERVING_BIN=${PWD}/core/general-server/serving`.
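When the server is launched from a Python script rather than a shell, the same variable can be set programmatically. This is a minimal sketch; `serving_bin_path` and `export_serving_bin` are hypothetical helper names, and only the `core/general-server/serving` layout comes from the text above.

```python
import os


def serving_bin_path(build_dir):
    """Return the absolute path of the self-compiled serving binary."""
    return os.path.join(os.path.abspath(build_dir), "core", "general-server", "serving")


def export_serving_bin(build_dir, env=None):
    """Set SERVING_BIN in the given environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    env["SERVING_BIN"] = serving_bin_path(build_dir)
    return env["SERVING_BIN"]
```

Call `export_serving_bin("server-build-cpu")` before starting the server so that child processes inherit the variable.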
@@ -207,7 +217,6 @@ make
 |      CLIENT      |       Compile Paddle Serving Client        | OFF  |
 |      SERVER      |       Compile Paddle Serving Server        | OFF  |
 |       APP        |     Compile Paddle Serving App package     | OFF  |
-| WITH_ELASTIC_CTR |        Compile ELASITC-CTR solution        | OFF  |
 |       PACK       |              Compile for whl               | OFF  |
 ### WITH_GPU选项
@@ -226,13 +235,15 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选
 1. 编译Serving所在的系统上所安装的CUDA/CUDNN等基础库版本，需要兼容实际的GPU设备。例如，Tesla V100卡至少要CUDA 9.0。如果编译时所用CUDA等基础库版本过低，由于生成的GPU代码和实际硬件设备不兼容，会导致Serving进程无法启动，或出现coredump等严重问题。
 2. 运行Paddle Serving的系统上安装与实际GPU设备兼容的CUDA driver，并安装与编译期所用的CUDA/CuDNN等版本兼容的基础库。如运行Paddle Serving的系统上安装的CUDA/CuDNN的版本低于编译时所用版本，可能会导致奇怪的cuda函数调用失败等问题。
-以下是PaddlePaddle发布版本所使用的基础库版本匹配关系，供参考：
+The CUDA, cuDNN and TensorRT versions used by the Paddle Serving images are listed below for reference:
 |          |  CUDA   |   CuDNN      | TensorRT |
-| :----:   | :-----: | :----------------------: | :----:   |
+| :----:   | :-----: | :----------: | :----:   |
-| post9    |  9.0    | CuDNN 7.3.1 for CUDA 9.0 |          |
+| post9    |  9.0    | CuDNN 7.6.4  |          |
-| post10   |  10.0   | CuDNN 7.5.1 for CUDA 10.0|          |
+| post10   |  10.0   | CuDNN 7.6.5  |          |
-| trt      |  10.1   | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5  |
+| post101  |  10.1   | CuDNN 7.6.5  | 6.0.1    |
+| post102  |  10.2   | CuDNN 8.0.5  | 7.1.3    |
+| post11   |  11.0   | CuDNN 8.0.4  | 7.1.3    |
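For scripted environment checks, the table above can be kept as a small lookup. This is a sketch only: the dictionary simply mirrors the rows of the table, and `suffix_for_cuda` is a hypothetical helper name.

```python
# Library versions per image suffix, copied from the table above.
# None means the table lists no TensorRT version for that row.
IMAGE_LIBS = {
    "post9": {"cuda": "9.0", "cudnn": "7.6.4", "tensorrt": None},
    "post10": {"cuda": "10.0", "cudnn": "7.6.5", "tensorrt": None},
    "post101": {"cuda": "10.1", "cudnn": "7.6.5", "tensorrt": "6.0.1"},
    "post102": {"cuda": "10.2", "cudnn": "8.0.5", "tensorrt": "7.1.3"},
    "post11": {"cuda": "11.0", "cudnn": "8.0.4", "tensorrt": "7.1.3"},
}


def suffix_for_cuda(cuda_version):
    """Return the suffix whose CUDA version matches, or raise KeyError."""
    for suffix, libs in IMAGE_LIBS.items():
        if libs["cuda"] == cuda_version:
            return suffix
    raise KeyError("no entry for CUDA {}".format(cuda_version))
```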
 ### 如何让Paddle Serving编译系统探测到CuDNN库

--- a/doc/DESIGN_DOC.md
+++ b/doc/DESIGN_DOC.md
--- a/doc/DESIGN_DOC_CN.md
+++ b/doc/DESIGN_DOC_CN.md
@@ -6,28 +6,30 @@
 Paddle Serving是一个PaddlePaddle开源的在线服务框架，长期目标就是围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。
- 工业级：为了达到工业级深度学习模型在线部署的要求，
+- 工业级：为了达到工业级深度学习模型在线部署的要求，Paddle Serving提供很多大规模场景需要的部署功能：1）模型管理、模型热加载、模型加解密；2）支持跨平台、多种硬件部署；3）分布式稀疏参数索引功能；4）在线A/B流量测试
-Paddle Serving提供很多大规模场景需要的部署功能：1）模型管理、模型热加载、模型加解密。2）支持跨平台、多种硬件部署和推理。3）分布式稀疏参数索引功能。4）在线A/B流量测试
- 高性能：从低延时和高吞吐2个维度思考提升模型推理的性能。1）集成Paddle Inference高性能预测引擎；2）支持Nvidia Tensor RT高性能推理引擎；3）高性能网络框架；4）异步Pipeline模式大幅提升吞吐量
+- 高性能：从低延时和高吞吐2个维度思考提升模型推理的性能。1）集成Paddle Inference高性能预测引擎；2）支持Nvidia Tensor RT高性能推理引擎；3）集成高性能网络框架brpc；4）异步Pipeline模式大幅提升吞吐量
 - 简单易用：为了让使用Paddle的用户能够以极低的成本部署模型，PaddleServing设计了一套与Paddle训练框架无缝打通的预测部署API，普通模型可以使用一行命令进行服务部署。20多种常见模型案例和文档。
- 功能扩展：当前，Paddle Serving支持C++、Python、Golang、Java 4种语言客户端，能力上也会持续加强。在Paddle Serving的框架设计方面，尽管当前Paddle Serving以支持Paddle模型的部署为核心功能，
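+- Extensibility: Paddle Serving currently provides clients in four languages (C++, Python, Golang and Java) and will support more languages in the future. In terms of framework design, although deploying Paddle models is currently the core function of Paddle Serving,
 users can easily embed other machine learning libraries to deploy online prediction.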
----
+## 2. 概要设计
+任何优秀软件产品一定从用户需求出发，具有清晰的定位和良好的概要设计。Paddle Serving也不例外，Paddle Serving目标围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。通过调研大量用户的使用场景，并将这些场景抽象归纳，例如在线服务侧重高并发，低平响；离线服务侧重批量高吞吐，高资源利用率；算法开发者擅长使用Python做模型训练和推理等。
 ## 2. 整体设计
 任何优秀产品一定从用户需求出发，具有清晰的定位和良好的设计。Paddle Serving也不例外，Paddle Serving目标围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。通过调研大量用户的使用场景，并将这些场景抽象归纳，例如在线服务侧重高并发，低平响；离线服务侧重批量高吞吐，高资源利用率；算法开发同学擅长使用Python做模型训练和推理等。
 ### 2.1 设计选型
 为了满足不同场景的用户需求，Paddle Serving的产品定位采用更低维度特征，如响应时间、吞吐、开发效率等，实现目标的选型和技术选型。
-| 响应时间 | 吞吐 | 开发效率 | 资源利用率 | 选型 | 类似场景|
+| 响应时间 | 吞吐 | 开发效率 | 资源利用率 | 选型 | 应用场景|
 |-----|------|-----|-----|------|------|
-| 低 | 高 | 低 | 高 |C++ Serving | 高性能场景，大型在线推荐系统召回、排序服务。支持批量推理|
+| 低 | 高 | 低 | 高 |C++ Serving | 高性能场景，大型在线推荐系统召回、排序服务|
 | 高 | 高 | 较高 |高|Python Pipeline Serving| 兼顾吞吐和效率，单算子多模型组合场景，异步模式|
 | 高 | 低 | 高| 低 |Python webserver| 高迭代效率场景，小型服务或需要快速迭代，模型效果验证|
@@ -55,20 +57,20 @@ Paddle Serving从做顶层设计时考虑到不同团队在工业级场景中会
 > 跨平台运行
 跨平台是不依赖于操作系统，也不依赖硬件环境。一个操作系统下开发的应用，放到另一个操作系统下依然可以运行。因此，设计上既要考虑开发语言、组件是跨平台的，同时也要考虑不同系统上编译器的解释差异。
-Docker 是一个开源的应用容器引擎，让开发者可以打包他们的应用以及依赖包到一个可移植的容器中，然后发布到任何流行的Linux机器或Windows机器上。我们将Paddle Serving框架打包了多种Docker镜像，镜像列表参考《[Docker镜像](DOCKER_IMAGES_CN.md)》，根据用户的使用场景选择。为方便用户使用Docker镜像，我们提供了帮助文档《[如何在Docker中运行PaddleServing](RUN_IN_DOCKER_CN.md)》。目前，Python webserver模式可在原生系统Linux和Windows双系统上部署运行。《[Windows平台使用Paddle Serving指导](WINDOWS_TUTORIAL_CN.md)》
+Docker 是一个开源的应用容器引擎，让开发者可以打包他们的应用以及依赖包到一个可移植的容器中，然后发布到任何流行的Linux机器或Windows机器上。我们将Paddle Serving框架打包了多种Docker镜像，镜像列表参考《[Docker镜像](DOCKER_IMAGES_CN.md)》，根据用户的使用场景选择镜像。为方便用户使用Docker，我们提供了帮助文档《[如何在Docker中运行PaddleServing](RUN_IN_DOCKER_CN.md)》。目前，Python webserver模式可在原生系统Linux和Windows双系统上部署运行。《[Windows平台使用Paddle Serving指导](WINDOWS_TUTORIAL_CN.md)》
 > 支持多种开发语言SDK
-为了方便不同场景使用Serving，Paddle Serving提供了4种开发语言SDK，包括Python、C++、Java、Golang。Golang SDK在持续建设中，有兴趣的开源开发者可以提交PR。
+Paddle Serving提供了4种开发语言SDK，包括Python、C++、Java、Golang。Golang SDK在建设中，有兴趣的开源开发者可以提交PR。
-+ Python 参考python/examples下client示例 或 4.2 web服务示例
+ Python，参考python/examples下client示例 或 4.2 web服务示例
-+ C++使用文档 《[从零开始写一个预测服务](deprecated/CREATING.md)》
+ C++，参考《[从零开始写一个预测服务](deprecated/CREATING.md)》
-+ Java使用文档 《[Paddle Serving Client Java SDK](JAVA_SDK_CN.md)》
+ Java，参考《[Paddle Serving Client Java SDK](JAVA_SDK_CN.md)》
-+ Golang示例文档 《[如何在Paddle Serving使用Go Client](IMDB_GO_CLIENT_CN.md)》
+ Golang，参考《[如何在Paddle Serving使用Go Client](deprecated/IMDB_GO_CLIENT_CN.md)》
 > 支持多种硬件设备
-主流深度学习平台的推理框架仅支持X86平台的CPU和GPU推理，随着AI算法复杂度高速增长，推动芯片算力不断提升，推动物联网应用加速落地，在多种硬件环境的推理场景越来越多。Paddle Serving集成高性能Paddle Inference和Paddle Lite，提供在多种硬件设备上推理服务。目前，除了X86 CPU、GPU外，Paddle Serving已实现ARM CPU和昆仑 XPU上部署推理服务，未来会有更多的硬件加入Paddle Serving。
+知名的深度学习平台的推理框架仅支持X86平台的CPU和GPU推理。随着AI算法复杂度高速增长，芯片算力大幅提升，推动物联网应用加速落地，在多种硬件上部署。Paddle Serving集成高性能推理引擎Paddle Inference和移动端推理引擎Paddle Lite，在多种硬件设备上提供推理服务。目前，除了X86 CPU、GPU外，Paddle Serving已实现ARM CPU和昆仑 XPU上部署推理服务，未来会有更多的硬件加入Paddle Serving。
 > 跨深度学习平台模型转换
@@ -105,18 +107,15 @@ fetch_var {
 分布式稀疏参数索引通常在广告推荐中出现，并与分布式训练配合形成完整的离线-在线一体化部署。下图解释了其中的流程，产品的在线服务接受用户请求后将请求发送给预估服务，同时系统会记录用户的请求以进行相应的训练日志处理和拼接。离线分布式训练系统会针对流式产出的训练日志进行模型增量训练，而增量产生的模型会配送至分布式稀疏参数索引服务，同时对应的稠密的模型参数也会配送至在线的预估服务。在线服务由两部分组成，一部分是针对用户的请求提取特征后，将需要进行模型的稀疏参数索引的特征发送请求给分布式稀疏参数索引服务，针对分布式稀疏参数索引服务返回的稀疏参数再进行后续深度学习模型的计算流程，从而完成预估。
-> 云上部署
-云端部署能力正在建设中，待开放
 ----
 ## 3. C++ Serving设计
-C++ Serving目标实现高并发、低延时的高性能推理服务。其网络框架和核心执行引擎均是基于C/C++编写，并且提供一定的工业级应用能力，包括模型管理、模型安全、A/B Testing
+C++ Serving目标实现高并发、低延时的高性能推理服务。其网络框架和核心执行引擎均是基于C/C++编写，并且提供强大的工业级应用能力，包括模型管理、模型安全、A/B Testing
-### 3.1 网络框架
+### 3.1 通信机制
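 C++ Serving uses [better-rpc](https://github.com/apache/incubator-brpc) (brpc) for its underlying communication. better-rpc is an RPC library open-sourced by Baidu, designed for high concurrency and low latency. It already backs over a million online prediction instances and thousands of online prediction services, including those at Baidu, and has proven stable and reliable. Compared with the gRPC framework it offers lower latency and higher concurrency; its drawback is weaker cross-OS and cross-language support.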
 ### 3.2 核心执行引擎
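 The core execution engine of C++ Serving is a directed acyclic graph (DAG). Each node represents one stage of the prediction service; computing a model's prediction score is one such stage. A DAG lets nodes that can run concurrently make full use of the compute resources inside a deployed instance, which shortens latency. For example, when the same input must be fed to two different models and their scores combined by a weighted sum, the two scoring steps can run concurrently according to the DAG topology.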
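The two-model example can be sketched in a few lines. This illustrates the idea only and is not the C++ Serving engine: `model_a`, `model_b` and the weights are placeholders, and Python threads stand in for the engine's concurrent graph nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def model_a(features):
    """Placeholder scoring node: mean of the input features."""
    return sum(features) / len(features)


def model_b(features):
    """Placeholder scoring node: maximum of the input features."""
    return max(features)


def weighted_ensemble(features, w_a=0.5, w_b=0.5):
    """Run both scoring nodes concurrently, then combine them in a merge node."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The two nodes have no edge between them, so they can run in parallel.
        future_a = pool.submit(model_a, features)
        future_b = pool.submit(model_b, features)
        # The merge node depends on both results.
        return w_a * future_a.result() + w_b * future_b.result()
```

The latency of the merged request is then bounded by the slower of the two models rather than by their sum.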
@@ -128,7 +127,7 @@ C++ Serving的核心执行引擎是一个有向无环图，图中的每个节点
 ### 3.3 模型管理与热加载
-addle Serving的C++引擎支持模型管理功能，支持多种模型和模型不同版本的管理。为了保证在模型更换期间推理服务的可用性，需要在服务不中断的情况下对模型进行热加载。Paddle Serving对该特性进行了支持，并提供了一个监控产出模型更新本地模型的工具，具体例子请参考《[Paddle Serving中的模型热加载](HOT_LOADING_IN_SERVING_CN.md)》。
+Paddle Serving的C++引擎支持模型管理功能，支持多种模型和模型不同版本的管理。为了保证在模型更换期间推理服务的可用性，需要在服务不中断的情况下对模型进行热加载。Paddle Serving对该特性进行了支持，并提供了一个监控产出模型更新本地模型的工具，具体例子请参考《[Paddle Serving中的模型热加载](HOT_LOADING_IN_SERVING_CN.md)》。
 ### 3.4 模型加解密
@@ -136,6 +135,7 @@ Paddle Serving采用对称加密算法对模型进行加密，在服务加载模
 ### 3.5 A/B Test
 在对模型进行充分的离线评估后，通常需要进行在线A/B测试，来决定是否大规模上线服务。下图为使用Paddle Serving做A/B测试的基本结构，Client端做好相应的配置后，自动将流量分发给不同的Server，从而完成A/B测试。具体例子请参考《[如何使用Paddle Serving做ABTEST](ABTEST_IN_PADDLE_SERVING_CN.md)》。
 <p align="center">
@@ -210,3 +210,6 @@ Pipeline Serving核心设计是图执行引擎，基本处理单元是OP和Chann
 ### 6.2 向量检索、树结构检索
 在推荐与广告场景的召回系统中，通常需要采用基于向量的快速检索或者基于树结构的快速检索，Paddle Serving会对这方面的检索引擎进行集成或扩展。
+### 6.3 服务监控
+集成普罗米修斯监控，一套开源的监控&报警&时间序列数据库的组合，适合k8s和docker的监控系统。
--- a/doc/DOCKER_IMAGES.md
+++ b/doc/DOCKER_IMAGES.md
@@ -8,11 +8,10 @@ This document maintains a list of docker images provided by Paddle Serving.
 You can get images in two ways:
-1. Pull image directly from `hub.baidubce.com ` or `docker.io` through TAG:
+1. Pull the image directly from `registry.baidubce.com` by TAG:
   ```shell
-   docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
+   docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
-   docker pull paddlepaddle/serving:<TAG> # hub.docker.com
   ```
 2. Build the image from a Dockerfile
@@ -20,7 +19,7 @@ You can get images in two ways:
   Create a new folder and copy Dockerfile to this folder, and run the following command:
   ```shell
-   docker build -t <image-name>:<images-tag> .
+   docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
   ```
@@ -47,18 +46,54 @@ If you want to customize your Serving based on source code, use the version with
 **Java Client:**
 ```
-hub.baidubce.com/paddlepaddle/serving:latest-java
+registry.baidubce.com/paddlepaddle/serving:latest-java
 ```
 **XPU:**
 ```
-hub.baidubce.com/paddlepaddle/serving:xpu-beta
+registry.baidubce.com/paddlepaddle/serving:xpu-beta
 ```
 ## Requirements for running CUDA containers
 Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using. 
-The machine running the CUDA container **only requires the NVIDIA driver**, the CUDA toolkit doesn't have to be installed.
+The machine running the CUDA container **only requires the NVIDIA driver**, the CUDA toolkit does not have to be installed.
 For the relationship between CUDA toolkit version, Driver version and GPU architecture, please refer to [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
+# (Appendix) List of All Docker Images
+Development Images:
+| Env      | Version | Docker images tag            | OS        | Gcc Version |
+|----------|---------|------------------------------|-----------|-------------|
+|    CPU   | 0.5.0   | 0.5.0-devel                 | Ubuntu 16 |  8.2.0       |
+|          | <=0.4.0 | 0.4.0-devel                  | CentOS 7  | 4.8.5       |
+|  Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel    | Ubuntu 16 |  4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel   | CentOS 7  | 4.8.5       |
+| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 |    4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel  | CentOS 7  | 4.8.5       |
+| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel  | Ubuntu 16 |   8.2.0       |
+|          | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel    | CentOS 7  | 4.8.5     |
+| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel  | Ubuntu 16 |   8.2.0       |
+|          | <=0.4.0 | N/A                          | N/A       | N/A         |
+| Cuda11.0 | 0.5.0   | 0.5.0-cuda11.0-cudnn8-devel  | Ubuntu 18 | 8.2.0       |
+|          | <=0.4.0 | N/A                          | N/A       | N/A         |
+Running Images:
+| Env      | Version | Docker images tag     | OS        | Gcc Version |
+|----------|---------|-----------------------|-----------|-------------|
+|    CPU   | 0.5.0   | 0.5.0                 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | 0.4.0                 | CentOS 7  | 4.8.5       |
+|  Cuda9.0 | 0.5.0   | 0.5.0-cuda9.0-cudnn7   | Ubuntu 16 | 4.8.5      |
+|          | <=0.4.0 | 0.4.0-cuda9.0-cudnn7  | CentOS 7  | 4.8.5       |
+| Cuda10.0 | 0.5.0   | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7  | 4.8.5       |
+| Cuda10.1 | 0.5.0   | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7  | 4.8.5       |
+| Cuda10.2 | 0.5.0   | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | N/A                   | N/A       | N/A         |
+| Cuda11.0 | 0.5.0   | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0       |
+|          | <=0.4.0 | N/A                   | N/A       | N/A         |
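The tags in both tables follow one visible pattern: version, then an optional `cuda<X>-cudnn<Y>` part, then an optional `-devel` suffix. A minimal sketch that composes a full image name from those parts is shown here; `image_name` is a hypothetical helper, and it does not check that a given combination was actually published.

```python
REGISTRY = "registry.baidubce.com/paddlepaddle/serving"


def image_name(version="0.5.0", cuda=None, cudnn=None, devel=False):
    """Compose an image name following the tag pattern in the tables above."""
    parts = [version]
    if cuda is not None:
        # GPU images carry both the CUDA and the cuDNN major version.
        parts.append("cuda{}-cudnn{}".format(cuda, cudnn))
    if devel:
        # Development images end in -devel; running images have no suffix.
        parts.append("devel")
    return "{}:{}".format(REGISTRY, "-".join(parts))
```

For example, `image_name("0.5.0", "10.2", "8", devel=True)` gives the name to pass to `docker pull` for the CUDA 10.2 development image.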
--- a/doc/DOCKER_IMAGES_CN.md
+++ b/doc/DOCKER_IMAGES_CN.md
@@ -8,11 +8,10 @@
 您可以通过两种方式获取镜像。
-1. 通过 TAG 直接从 `hub.baidubce.com ` 或 `docker.io` 拉取镜像：
+1. Pull the image directly from `registry.baidubce.com` by TAG; see the tables in the **Image description** section below for the available TAGs.
   ```shell
-   docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
+   docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
-   docker pull paddlepaddle/serving:<TAG> # hub.docker.com
   ```
 2. 基于 Dockerfile 构建镜像
@@ -20,7 +19,8 @@
   建立新目录，复制对应 Dockerfile 内容到该目录下 Dockerfile 文件。执行
   ```shell
-   docker build -t <image-name>:<images-tag> .
+   cd tools
+   docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
   ```
@@ -29,6 +29,8 @@
 运行时镜像不能用于开发编译。
 若需要基于源代码二次开发编译，请使用后缀为-devel的版本。
+**在TAG列，latest也可以替换成对应的版本号，例如0.5.0/0.4.1等，但需要注意的是，部分开发环境随着某个版本迭代才增加，因此并非所有环境都有对应的版本号可以使用。**
 |                         镜像选择                         |   操作系统    |             TAG              |                          Dockerfile                          |
 | :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
@@ -47,14 +49,15 @@
 **Java镜像：**
 ```
-hub.baidubce.com/paddlepaddle/serving:latest-java
+registry.baidubce.com/paddlepaddle/serving:latest-java
 ```
 **XPU镜像：**
 ```
-hub.baidubce.com/paddlepaddle/serving:xpu-beta
+registry.baidubce.com/paddlepaddle/serving:xpu-beta
 ```
 ## 运行CUDA容器的要求
 运行CUDA容器需要至少具有一个支持CUDA的GPU以及与您所使用的CUDA工具包版本兼容的驱动程序。
@@ -62,3 +65,40 @@ hub.baidubce.com/paddlepaddle/serving:xpu-beta
 运行CUDA容器的机器**只需要相应的NVIDIA驱动程序**，而CUDA工具包不是必要的。
 相关CUDA工具包版本、驱动版本和GPU架构的关系请参阅 [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA)。
+# （附录）所有镜像列表
+编译镜像：
+| Env      | Version | Docker images tag            | OS        | Gcc Version |
+|----------|---------|------------------------------|-----------|-------------|
+|    CPU   | 0.5.0   | 0.5.0-devel                 | Ubuntu 16 |  8.2.0       |
+|          | <=0.4.0 | 0.4.0-devel                  | CentOS 7  | 4.8.5       |
+|  Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel    | Ubuntu 16 |  4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel   | CentOS 7  | 4.8.5       |
+| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 |    4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel  | CentOS 7  | 4.8.5       |
+| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel  | Ubuntu 16 |   8.2.0       |
+|          | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel    | CentOS 7  | 4.8.5     |
+| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel  | Ubuntu 16 |   8.2.0       |
+|          | <=0.4.0 | N/A                          | N/A       | N/A         |
+| Cuda11.0 | 0.5.0   | 0.5.0-cuda11.0-cudnn8-devel  | Ubuntu 18 | 8.2.0       |
+|          | <=0.4.0 | N/A                          | N/A       | N/A         |
+运行镜像:
+| Env      | Version | Docker images tag     | OS        | Gcc Version |
+|----------|---------|-----------------------|-----------|-------------|
+|    CPU   | 0.5.0   | 0.5.0                 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | 0.4.0                 | CentOS 7  | 4.8.5       |
+|  Cuda9.0 | 0.5.0   | 0.5.0-cuda9.0-cudnn7   | Ubuntu 16 | 4.8.5      |
+|          | <=0.4.0 | 0.4.0-cuda9.0-cudnn7  | CentOS 7  | 4.8.5       |
+| Cuda10.0 | 0.5.0   | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5       |
+|          | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7  | 4.8.5       |
+| Cuda10.1 | 0.5.0   | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7  | 4.8.5       |
+| Cuda10.2 | 0.5.0   | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0       |
+|          | <=0.4.0 | N/A                   | N/A       | N/A         |
+| Cuda11.0 | 0.5.0   | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0       |
+|          | <=0.4.0 | N/A                   | N/A       | N/A         |
--- a/doc/ENCRYPTION.md
+++ b/doc/ENCRYPTION.md
@@ -42,11 +42,3 @@ Once the server gets the key, it uses the key to parse the model and starts the
 ### Example of Model Encryption Inference
 Example of model encryption inference, See the [`/python/examples/encryption/`](../python/examples/encryption/)。
-### Other Details
-Interface of encryption method in paddlepaddle official website:
-[Python encryption method](https://github.com/HexToString/Serving/blob/develop/python/paddle_serving_app/local_predict.py)
-[C++ encryption method](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html#analysispre)
--- a/doc/ENCRYPTION_CN.md
+++ b/doc/ENCRYPTION_CN.md
@@ -42,11 +42,3 @@ python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --
 ### 模型加密推理示例
 模型加密推理示例, 请参见[`/python/examples/encryption/`](../python/examples/encryption/)。
-### 其他详细信息
-飞桨官方网站加密方法接口
-[Python加密方法接口](https://github.com/HexToString/Serving/blob/develop/python/paddle_serving_app/local_predict.py)
-[C++加密方法接口](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html#analysispre)
--- a/doc/GRPC_IMPL_CN.md
+++ b/doc/GRPC_IMPL_CN.md
@@ -115,24 +115,12 @@ python test_asyn_client.py
 python test_batch_client.py
 ```
-#### 通用 pb 预测
-``` shell
-python test_general_pb_client.py
-```
 #### 预测超时
 ``` shell
 python test_timeout_client.py
 ```
-#### List 输入
-``` shell
-python test_list_input_client.py
-```
 ## 3.更多示例
 详见[`python/examples/grpc_impl_example`](../python/examples/grpc_impl_example)下的示例文件。
--- a/doc/JAVA_SDK.md
+++ b/doc/JAVA_SDK.md
@@ -18,7 +18,7 @@ The following table shows compatibilities between Paddle Serving Server and Java
 | Paddle Serving Server version | Java SDK version |
 | :---------------------------: | :--------------: |
-|             0.3.2             |      0.0.1       |
+|             0.5.0             |      0.0.1       |
 1.    Directly use the provided Java SDK as the client for prediction
 ### Install Java SDK

--- a/doc/JAVA_SDK_CN.md
+++ b/doc/JAVA_SDK_CN.md
@@ -17,7 +17,7 @@ Paddle Serving 提供了 Java SDK，支持 Client 端用 Java 语言进行预测
 | Paddle Serving Server version | Java SDK version |
 | :---------------------------: | :--------------: |
-|             0.3.2             |      0.0.1       |
+|             0.5.0             |      0.0.1       |
 1.    直接使用提供的Java SDK作为Client进行预测
 ### 安装

--- a/doc/LATEST_PACKAGES.md
+++ b/doc/LATEST_PACKAGES.md
@@ -87,3 +87,34 @@ https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_client-0.0.0-cp36-no
 # App
 https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_app-0.0.0-py3-none-any.whl 
 ```
+### Binary Package
+Most users do not need to read this section. However, if you deploy Paddle Serving on a machine without network access, you will find that the binary executable tar file cannot be downloaded. The download links for each environment are therefore listed here.
+#### Bin links
+```
+# CPU AVX MKL
+https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-mkl-0.0.0.tar.gz
+# CPU AVX OPENBLAS
+https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-openblas-0.0.0.tar.gz
+# CPU NOAVX OPENBLAS
+https://paddle-serving.bj.bcebos.com/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz
+# Cuda 9
+https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda9-0.0.0.tar.gz
+# Cuda 10
+https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda10-0.0.0.tar.gz
+# Cuda 10.1
+https://paddle-serving.bj.bcebos.com/bin/serving-gpu-101-0.0.0.tar.gz
+# Cuda 10.2
+https://paddle-serving.bj.bcebos.com/bin/serving-gpu-102-0.0.0.tar.gz
+# Cuda 11
+https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda11-0.0.0.tar.gz
+```
+#### How to set up SERVING_BIN offline?
+- Download the serving server whl package and the bin package, and make sure they target the same environment.
+- Download the serving client whl and the serving app whl, paying attention to the Python version.
+- `pip install` the whl packages and `tar xf` the binary package, then `export SERVING_BIN=$PWD/serving-gpu-cuda10-0.0.0/serving` (taking CUDA 10.0 as the example).
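When preparing an offline bundle, it helps to build the download link from the target environment. The sketch below only restates the links listed above: the dictionary keys are labels chosen for this example, `bin_url` is a hypothetical helper, and whether versions other than `0.0.0` exist at the same location is an assumption.

```python
BASE_URL = "https://paddle-serving.bj.bcebos.com/bin"

# Package name stems taken from the links listed above.
# The keys are labels chosen for this example only.
PACKAGE_STEMS = {
    "cpu-avx-mkl": "serving-cpu-avx-mkl",
    "cpu-avx-openblas": "serving-cpu-avx-openblas",
    "cpu-noavx-openblas": "serving-cpu-noavx-openblas",
    "cuda9": "serving-gpu-cuda9",
    "cuda10": "serving-gpu-cuda10",
    "cuda10.1": "serving-gpu-101",
    "cuda10.2": "serving-gpu-102",
    "cuda11": "serving-gpu-cuda11",
}


def bin_url(environment, version="0.0.0"):
    """Return the tarball URL for an environment label; raises KeyError if unknown."""
    return "{}/{}-{}.tar.gz".format(BASE_URL, PACKAGE_STEMS[environment], version)
```

Download the returned URL on a connected machine, copy the tarball across, and then follow the steps above.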
--- a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
+++ b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
@@ -5,7 +5,7 @@
 例如：
 ```shell
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model --port 9292 --gpu_ids 0
 python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
 ```

--- a/doc/RUN_IN_DOCKER.md
+++ b/doc/RUN_IN_DOCKER.md
@@ -17,14 +17,14 @@ This document takes Python2 as an example to show how to run Paddle Serving in d
 Refer to [this document](DOCKER_IMAGES.md) for a docker image:
 ```shell
-docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
 ```
 ### Create container
 ```bash
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-devel
 docker exec -it test bash
 ```
@@ -46,20 +46,20 @@ The GPU version is basically the same as the CPU version, with only some differe
 Refer to [this document](DOCKER_IMAGES.md) for a docker image; the following uses the `cuda10.2-cudnn8-devel` image as an example:
 ```shell
-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
 ```
 ### Create container
 ```bash
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
 nvidia-docker exec -it test bash
 ```
 or
 ```bash
-docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker run --gpus all -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
 docker exec -it test bash
 ```
@@ -69,7 +69,7 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of
 The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` in the versions matching the image tag. If you do not need to change the version, you can use them directly, which suits environments without external network access.
-If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.
+If you need to change the version, refer to the instructions on the homepage and download the pip package of the corresponding version; see [LATEST_PACKAGES](./LATEST_PACKAGES.md).
 ## Precautions

--- a/doc/RUN_IN_DOCKER_CN.md
+++ b/doc/RUN_IN_DOCKER_CN.md
@@ -16,14 +16,16 @@ Docker（GPU版本需要在GPU机器上安装nvidia-docker）
 参考[该文档](DOCKER_IMAGES_CN.md)获取镜像：
+以CPU编译镜像为例
 ```shell
-docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
 ```
 ### 创建容器并进入
 ```bash
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-devel
 docker exec -it test bash
 ```
@@ -37,15 +39,19 @@ docker exec -it test bash
 ## GPU 版本
+```shell
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
+```
 ### 创建容器并进入
 ```bash
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
 nvidia-docker exec -it test bash
 ```
 或者
 ```bash
-docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker run --gpus all -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
 docker exec -it test bash
 ```
@@ -55,7 +61,7 @@ docker exec -it test bash
 镜像里自带对应镜像tag版本的`paddle_serving_server_gpu`，`paddle_serving_client`，`paddle_serving_app`，如果用户不需要更改版本，可以直接使用，适用于没有外网服务的环境。
-如果需要更换版本，请参照首页的指导，下载对应版本的pip包。
+如果需要更换版本，请参照首页的指导，下载对应版本的pip包。[最新安装包合集](LATEST_PACKAGES.md)
 ## 注意事项

--- a/doc/SAVE.md
+++ b/doc/SAVE.md
@@ -2,7 +2,74 @@
 ([简体中文](./SAVE_CN.md)|English)
-## Save from training or prediction script 
+## Export from saved model files
+If you already have saved inference model files, you can convert them with the built-in Python module `paddle_serving_client.convert`.
+```shell
+python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
+```
+If you have saved model files using Paddle's `save_inference_model` API, you can also use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used by Paddle Serving.
+```python
+import paddle_serving_client.io as serving_io
+serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None )
+```
+The module's arguments are the same as those of the `inference_model_to_serving` API.
+| Argument | Type | Default | Description |
+|--------------|------|-----------|--------------------------------|
+| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
+| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
+| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
+| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
+| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
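The mapping between the arguments in the table and the command-line flags can be made explicit with a small wrapper. This is a sketch under the assumption that each argument maps to a `--<name>` flag of the same name, as the commands in this document suggest; `convert_command` is a hypothetical helper, not part of Paddle Serving.

```python
import sys


def convert_command(dirname, serving_server="serving_server",
                    serving_client="serving_client",
                    model_filename=None, params_filename=None):
    """Build the argument list for `python -m paddle_serving_client.convert`."""
    cmd = [
        sys.executable, "-m", "paddle_serving_client.convert",
        "--dirname", dirname,
        "--serving_server", serving_server,
        "--serving_client", serving_client,
    ]
    # Optional file names are passed only when set, so the module keeps
    # its defaults (for example `__model__` for the program file).
    if model_filename is not None:
        cmd += ["--model_filename", model_filename]
    if params_filename is not None:
        cmd += ["--params_filename", params_filename]
    return cmd
```

The list can be passed to `subprocess.check_call` in an environment where `paddle_serving_client` is installed.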
+**Demo: Convert From Dynamic Graph**
+PaddlePaddle 2.0 provides a new dynamic graph mode, so here we use imagenet ResNet50 dynamic graph as an example to teach how to export from a saved model and use it for real online inference scenarios.
+```
+wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar # model
+wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg # sample input (daisy)
+tar xf dygraph_res50.tar
+python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client
+```
+We can see that the `serving_server` and `serving_client` folders hold the server and client configuration of the model respectively
+Start the server (GPU)
+```
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_ids 0
+```
+Client (`test_client.py`)
+```
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+client = Client()
+client.load_client_config(
+    "serving_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
+print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
+```
+Run
+```
+python test_client.py
+```
+You should see the prediction run successfully. That completes serving a dynamic graph ResNet50 model; other dynamic graph models are used in a similar way.
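The client prints the full score vector, which is hard to read for an ImageNet model. This is a minimal sketch for picking the highest-scoring class indices from such a flat vector; `top_k` is a hypothetical helper, and whether the scores are probabilities or raw logits depends on the exported model and is not stated here.

```python
def top_k(scores, k=5):
    """Return the k (index, score) pairs with the highest scores, best first."""
    # Sort indices by score; works for lists and one-dimensional arrays alike.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, float(scores[i])) for i in order[:k]]
```

In `test_client.py` this could be applied to `fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1)`; mapping indices to label names requires a label file, which this example does not cover.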
+## Save from training or prediction script (Static Graph Mode) 
 Currently, Paddle Serving provides a `save_model` interface, which is similar to Paddle's `save_inference_model`.
 ``` python
 import paddle_serving_client.io as serving_io
@@ -31,22 +98,3 @@ for line in sys.stdin:
    fetch_map = client.predict(feed=feed, fetch=fetch)
    print("{} {}".format(fetch_map["prediction"][1], label[0]))
 ```
-## Export from saved model files
-If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving.
-```python
-import paddle_serving_client.io as serving_io
-serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None )
-```
-Or you can use a build-in python module called `paddle_serving_client.convert` to convert it.
-```python
-python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
-```
-Arguments are the same as `inference_model_to_serving` API.
-| Argument | Type | Default | Description |
-|--------------|------|-----------|--------------------------------|
-| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
-| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
-| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
-| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
-| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
--- a/doc/SAVE_CN.md
+++ b/doc/SAVE_CN.md
@@ -2,14 +2,79 @@
 (简体中文|[English](./SAVE.md))
-## 从训练或预测脚本中保存
+## 从已保存的模型文件中导出
+如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型，你可以使用Paddle Serving提供的名为`paddle_serving_client.convert`的内置模块进行转换。
+```python
+python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
+```
+也可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
+```python
+import paddle_serving_client.io as serving_io
+serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client",  model_filename=None, params_filename=None)
+```
+模块参数与`inference_model_to_serving`接口参数相同。
+| 参数 | 类型 | 默认值 | 描述 |
+|--------------|------|-----------|--------------------------------|
+| `dirname` | str | - | 需要转换的模型文件存储路径，Program结构文件和参数文件均保存在此目录。|
+| `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server |
+| `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client |
+| `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None，则使用 `__model__` 作为默认的文件名 |
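+| `params_filename` | str | None | Name of the file that stores all parameters of the model to convert. Specify it only when all parameters are saved in a single binary file; if the parameters are stored in separate files, set it to None. |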
+**示例：从动态图模型中导出**
+PaddlePaddle 2.0提供了全新的动态图模式，因此我们这里以imagenet ResNet50动态图为示例教学如何从已保存模型导出，并用于真实的在线预测场景。
+```
+wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar #model
+wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg #sample input (daisy)
+tar xf dygraph_res50.tar
+python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client
+```
+The `serving_server` and `serving_client` folders now hold the server-side and client-side configuration of the model, respectively.
+Start the server (GPU):
+```
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
+```
+Client code, saved as `test_client.py`:
+```
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+client = Client()
+client.load_client_config(
+    "serving_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
+print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
+```
+Run:
+```
+python test_client.py
+```
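+You should see that the prediction runs successfully. That covers inference with the dynamic graph ResNet50 model on Serving; other dynamic graph models are used in a similar way.
+## Save from a Training or Inference Script (Static Graph)
 Currently, Paddle Serving provides a `save_model` interface for users, which is similar to Paddle's `save_inference_model`.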
 ``` python
 import paddle_serving_client.io as serving_io
 serving_io.save_model("imdb_model", "imdb_client_conf",
                      {"words": data}, {"prediction": prediction},
-                      fluid.default_main_program())
+                      paddle.static.default_main_program())
 ```
 `imdb_model` is the server-side model with the serving configuration. `imdb_client_conf` is the client-side RPC configuration.
@@ -32,22 +97,3 @@ for line in sys.stdin:
    fetch_map = client.predict(feed=feed, fetch=fetch)
    print("{} {}".format(fetch_map["prediction"][1], label[0]))
 ```
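-## Export from Saved Model Files
-If you have already saved an inference model with Paddle's `save_inference_model` interface, you can use Paddle Serving's `inference_model_to_serving` interface to convert it into model files that Paddle Serving can use.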
-```python
-import paddle_serving_client.io as serving_io
-serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client",  model_filename=None, params_filename=None)
-```
-Or you can use the built-in module `paddle_serving_client.convert` provided by Paddle Serving to do the conversion.
-```python
-python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
-```
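-The module parameters are the same as those of the `inference_model_to_serving` interface.
-| Parameter | Type | Default | Description |
-|--------------|------|-----------|--------------------------------|
-| `dirname` | str | - | Path of the saved model files to convert. The Program file and parameter files are both stored in this directory. |
-| `serving_server` | str | `"serving_server"` | Output path of the converted model files and configuration files for the server. |
-| `serving_client` | str | `"serving_client"` | Output path of the converted configuration files for the client. |
-| `model_filename` | str | None | Name of the file that stores the Inference Program of the model to convert. If it is None, the default filename `__model__` is used. |
-| `params_filename` | str | None | Name of the file that stores all parameters of the model to convert. Specify it only when all parameters were saved in a single binary file; if parameters were saved in separate files, set it to None. |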
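The ResNet50 client in this change preprocesses the image with `CenterCrop(224)`, `Div(255)` and `Normalize(mean, std, True)` on a channel-first array. The sketch below illustrates the arithmetic those steps are expected to perform, using plain Python lists. It is an assumption about their behavior, not the `paddle_serving_app.reader` implementation, and the function names `center_crop_box` and `div_and_normalize` are hypothetical.

```python
from typing import List, Tuple

# Channel-first image: img[channel][row][col]
Image = List[List[List[float]]]


def center_crop_box(height: int, width: int, size: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered size x size crop."""
    top = (height - size) // 2
    left = (width - size) // 2
    return top, left, top + size, left + size


def div_and_normalize(img: Image, mean: List[float], std: List[float],
                      scale: float = 255.0) -> Image:
    """Divide pixels by `scale`, then apply (x - mean[c]) / std[c] per channel."""
    return [
        [[(px / scale - mean[c]) / std[c] for px in row] for row in channel]
        for c, channel in enumerate(img)
    ]


# A 3-channel, 1x1 "image" to show the per-channel arithmetic.
tiny = [[[255.0]], [[0.0]], [[127.5]]]
normalized = div_and_normalize(tiny, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(center_crop_box(256, 256, 224), normalized)
```

In the real client the same steps run on NumPy arrays through `Sequential`; the list-based version only makes the per-channel formula explicit.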
--- a/doc/TRAIN_TO_SERVICE.md
+++ b/doc/TRAIN_TO_SERVICE.md
-# An End-to-end Tutorial from Training to Inference Service Deployment
-([简体中文](./TRAIN_TO_SERVICE_CN.md)|English)
-Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of inference service through 9 steps.
-## Step1：Prepare for Running Environment
-Paddle Serving can be deployed on Linux environments. Currently the server supports deployment on CentOS 7, and [Docker deployment is recommended](RUN_IN_DOCKER.md). The RPC client supports deployment on CentOS 7 and Ubuntu 18. On other systems, or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the HTTP service.
-You can choose to install the CPU or GPU version of the server module according to your requirements and machine environment, and install the client module on the client machine. When you access the server over HTTP, there is no need to install the client module.
-```shell
-pip install paddle_serving_server #cpu version server side 
-pip install paddle_serving_server_gpu #gpu version server side
-pip install paddle_serving_client #client version
-```
-After this simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of the prediction service. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base. The data and dictionary files used in the example can be obtained by executing the get_data.sh script in the IMDB sample code.
-## Step2：Determine Tasks and Raw Data Format
-IMDB review sentiment analysis task is to classify the content of movie reviews to determine whether the review is a positive review or a negative review.
-First let's take a look at the raw data:
-```
-saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
-```
-This is a sample English review. The sample uses | as the separator: the review text comes before the separator and the label comes after it. 0 means a negative review and 1 means a positive review.
-## Step3：Define Reader, divide training set and test set
-The original text needs to be converted into numeric ids that the neural network can use. The imdb_reader.py script defines how the text is converted to ids: words are mapped to integers through the dictionary file imdb.vocab.
-<details>
-  <summary>imdb_reader.py</summary>
-```python
-import sys
-import os
-import paddle
-import re
-import paddle.fluid.incubate.data_generator as dg
-class IMDBDataset(dg.MultiSlotDataGenerator):
-    def load_resource(self, dictfile):
-        self._vocab = {}
-        wid = 0
-        with open(dictfile) as f:
-            for line in f:
-                self._vocab[line.strip()] = wid
-                wid += 1
-        self._unk_id = len(self._vocab)
-        self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
-        self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
-    def get_words_only(self, line):
-        sent = line.lower().replace("<br />", " ").strip()
-        words = [x for x in self._pattern.split(sent) if x and x != " "]
-        feas = [
-            self._vocab[x] if x in self._vocab else self._unk_id for x in words
-        ]
-        return feas
-    def get_words_and_label(self, line):
-        send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
-                                                              " ").strip()
-        label = [int(line.split('|')[-1])]
-        words = [x for x in self._pattern.split(send) if x and x != " "]
-        feas = [
-            self._vocab[x] if x in self._vocab else self._unk_id for x in words
-        ]
-        return feas, label
-    def infer_reader(self, infer_filelist, batch, buf_size):
-        def local_iter():
-            for fname in infer_filelist:
-                with open(fname, "r") as fin:
-                    for line in fin:
-                        feas, label = self.get_words_and_label(line)
-                        yield feas, label
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-    def generate_sample(self, line):
-        def memory_iter():
-            for i in range(1000):
-                yield self.return_value
-        def data_iter():
-            feas, label = self.get_words_and_label(line)
-            yield ("words", feas), ("label", label)
-        return data_iter
-```
-</details>
-The sample after mapping is similar to the following format:
-```
-257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
-```
-In this way, the neural network can be trained on the converted text as feature values.
-## Step4：Define CNN network for training and saving
-Next we use the [CNN Model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training. The network structure is defined in nets.py.
-<details>
-  <summary>nets.py</summary>
-```python
-import sys
-import time
-import numpy as np
-import paddle
-import paddle.fluid as fluid
-def cnn_net(data,
-            label,
-            dict_dim,
-            emb_dim=128,
-            hid_dim=128,
-            hid_dim2=96,
-            class_dim=2,
-            win_size=3):
-    """ conv net. """
-    emb = fluid.layers.embedding(
-        input=data, size=[dict_dim, emb_dim], is_sparse=True)
-    conv_3 = fluid.nets.sequence_conv_pool(
-        input=emb,
-        num_filters=hid_dim,
-        filter_size=win_size,
-        act="tanh",
-        pool_type="max")
-    fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
-    prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
-    cost = fluid.layers.cross_entropy(input=prediction, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-    acc = fluid.layers.accuracy(input=prediction, label=label)
-    return avg_cost, acc, prediction
-```
-</details>
-Use the training dataset for training. The training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used for serving deployment.
-<details>
-  <summary>local_train.py</summary>
-```python
-import os
-import sys
-import paddle
-import logging
-import paddle.fluid as fluid
-logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
-logger = logging.getLogger("fluid")
-logger.setLevel(logging.INFO)
-# load dict file
-def load_vocab(filename):
-    vocab = {}
-    with open(filename) as f:
-        wid = 0
-        for line in f:
-            vocab[line.strip()] = wid
-            wid += 1
-    vocab["<unk>"] = len(vocab)
-    return vocab
-if __name__ == "__main__":
-    from nets import cnn_net
-    model_name = "imdb_cnn"
-    vocab = load_vocab('imdb.vocab')
-    dict_dim = len(vocab)
-    #define model input
-    data = fluid.layers.data(
-        name="words", shape=[1], dtype="int64", lod_level=1)
-    label = fluid.layers.data(name="label", shape=[1], dtype="int64")
-    #define dataset，train_data is the dataset directory
-    dataset = fluid.DatasetFactory().create_dataset()
-    filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
-    dataset.set_use_var([data, label])
-    pipe_command = "python imdb_reader.py"
-    dataset.set_pipe_command(pipe_command)
-    dataset.set_batch_size(4)
-    dataset.set_filelist(filelist)
-    dataset.set_thread(10)
-    #define model
-    avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
-    optimizer = fluid.optimizer.SGD(learning_rate=0.001)
-    optimizer.minimize(avg_cost)
-    #execute training
-    exe = fluid.Executor(fluid.CPUPlace())
-    exe.run(fluid.default_startup_program())
-    epochs = 100
-    import paddle_serving_client.io as serving_io
-    for i in range(epochs):
-        exe.train_from_dataset(
-            program=fluid.default_main_program(), dataset=dataset, debug=False)
-        logger.info("TRAIN --> pass: {}".format(i))
-        if i == 64:
-            #At the end of training, use the model save interface in PaddleServing to save the models and configuration files required by Serving
-            serving_io.save_model("{}_model".format(model_name),
-                                  "{}_client_conf".format(model_name),
-                                  {"words": data}, {"prediction": prediction},
-                                  fluid.default_main_program())
-```
-</details>
-![Training process](./imdb_loss.png) As can be seen from the above figure, the loss of the model starts to converge after the 65th round. We save the model and configuration file after the 65th round of training is completed. The saved files are divided into imdb_cnn_client_conf and imdb_cnn_model folders. The former contains client-side configuration files, and the latter contains server-side configuration files and saved model files.
-The parameter list of the save_model function is as follows:
-| Parameter            | Meaning                                                        |
-| -------------------- | ------------------------------------------------------------ |
-| server_model_folder  |  Directory for server-side configuration files and model files |
-| client_config_folder | Directory for saving client configuration files              |
-| feed_var_dict        | The input of the inference model, dict type. The key can be customized and the value is an input variable in the model; each key corresponds to one variable. When using the prediction service, the input data uses the key as the input name. |
-| fetch_var_dict       | The output of the inference model, dict type. The key can be customized and the value is an output variable in the model; each key corresponds to one variable. When using the prediction service, use the key to get the returned data. |
-| main_program         | Model's program                                                |
-## Step5: Deploy RPC Prediction Service
-The Paddle Serving framework supports two types of prediction service: one communicates through RPC and the other through HTTP. The deployment and use of the RPC prediction service is introduced first; the HTTP prediction service is introduced starting at Step 8.
-```shell
-python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 #cpu prediction service
-python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 #gpu prediction service
-```
-The parameter --model in the command specifies the server-side model and configuration file directory previously saved, --port specifies the port of the prediction service. When deploying the gpu prediction service using the gpu version, you can use --gpu_ids to specify the gpu used.
-After executing one of the above commands, the RPC prediction service deployment of the IMDB sentiment analysis task is completed.
-## Step6: Reuse Reader, define remote RPC client
-Below we access the RPC prediction service through Python code, the script is test_client.py
-<details>
-  <summary>test_client.py</summary>
-```python
-from paddle_serving_client import Client
-from imdb_reader import IMDBDataset
-import sys
-client = Client()
-client.load_client_config(sys.argv[1])
-client.connect(["127.0.0.1:9292"])
-#The code of the data preprocessing part is reused here to convert the original text into a numeric id
-imdb_dataset = IMDBDataset()
-imdb_dataset.load_resource(sys.argv[2])
-for line in sys.stdin:
-    word_ids, label = imdb_dataset.get_words_and_label(line)
-    feed = {"words": word_ids}
-    fetch = ["prediction"]  # only "prediction" was exported by save_model in Step4
-    fetch_map = client.predict(feed=feed, fetch=fetch)
-    print("{} {}".format(fetch_map["prediction"][1], label[0]))
-```
-</details>
-The script reads data from standard input and prints the predicted probability that each sample belongs to class 1, together with its true label.
-## Step7: Call the RPC service to test the model effect
-Run the client implemented in the previous step against the prediction service as follows:
-```shell
-cat test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
-```
-Using the 2084 samples in the test_data/part-0 file for testing, the model prediction accuracy is 88.19%.
-**Note**: The effect of each model training may be slightly different, and the accuracy of predictions using the trained model will be close to the examples but may not be exactly the same.
-## Step8: Deploy HTTP Prediction Service
-When using the HTTP prediction service, the client does not need to install any modules of Paddle Serving, it only needs to be able to send HTTP requests. Of course, the HTTP method consumes more time in the communication phase than the RPC method.
-For the IMDB sentiment analysis task, the original text needs to be preprocessed before prediction. In the RPC prediction service, we put the preprocessing in the client's script, and in the HTTP prediction service, we put the preprocessing on the server. Paddle Serving's HTTP prediction service framework prepares data pre-processing and post-processing interfaces for this situation. We just need to rewrite it according to the needs of the task.
-Serving provides sample code, which is obtained by executing the imdb_web_service_demo.sh script in [IMDB Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
-Let's take a look at the script text_classify_service.py that starts the HTTP prediction service.
-<details>
-  <summary>text_classify_service.py</summary>
-```python
-from paddle_serving_server.web_service import WebService
-from imdb_reader import IMDBDataset
-import sys
-#extend class WebService
-class IMDBService(WebService):
-    def prepare_dict(self, args={}):
-        if len(args) == 0:
-            exit(-1)
-        self.dataset = IMDBDataset()
-        self.dataset.load_resource(args["dict_file_path"])
-    #rewrite preprocess() to implement data preprocessing, here we reuse reader script for training
-    def preprocess(self, feed={}, fetch=[]):
-        if "words" not in feed:
-            exit(-1)
-        res_feed = {}
-        res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
-        return res_feed, fetch
-#Here you need to use the name parameter to specify the name of the prediction service.
-imdb_service = IMDBService(name="imdb")
-imdb_service.load_model_config(sys.argv[1])
-imdb_service.prepare_server(
-    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
-imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
-imdb_service.run_server()
-```
-</details>
-run
-```shell
-python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
-```
-In the above command, the first parameter is the directory of the saved server-side model and configuration files. The second parameter is the working directory, where the prediction service saves some of its configuration files; the directory does not need to exist, but a name must be specified, and the prediction service creates it on its own. The third parameter is the port number, and the fourth parameter is the dictionary file.
-## Step9: Call the prediction service with plaintext data
-After starting the HTTP prediction service, you can make a prediction with a single command:
-```
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
-```
-When the inference process is normal, the prediction probability is returned, as shown below.
-```
-{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
-```
-**Note**: The effect of each model training may be slightly different, and the inferred probability value using the trained model may not be consistent with the example.
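The text-to-id mapping from Step3 can be tried without Paddle installed. The sketch below reuses the regular expression and the unknown-word rule from `imdb_reader.py`, but replaces `imdb.vocab` with a tiny in-memory vocabulary. The word list and the function names `build_vocab` and `words_and_label` are illustrative assumptions, not part of the example code.

```python
import re
from typing import Dict, List, Tuple

# Same delimiter pattern as imdb_reader.py; the capturing group keeps
# punctuation as separate tokens.
_PATTERN = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')


def build_vocab(words: List[str]) -> Dict[str, int]:
    """Map each word to its position, like reading imdb.vocab line by line."""
    return {word: idx for idx, word in enumerate(words)}


def words_and_label(line: str, vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Split 'text | label' into word ids and a one-element label list."""
    unk_id = len(vocab)  # unknown words share the id after the last entry
    text = "|".join(line.split("|")[:-1]).lower().replace("<br />", " ").strip()
    label = [int(line.split("|")[-1])]
    tokens = [tok for tok in _PATTERN.split(text) if tok and tok != " "]
    return [vocab.get(tok, unk_id) for tok in tokens], label


vocab = build_vocab(["the", "story", "is", "boring", "!"])
ids, label = words_and_label("The story is extremely boring! | 0", vocab)
print(ids, label)
```

Here "extremely" is not in the toy vocabulary, so it receives the unknown id 5, which mirrors how 89527 recurs in the mapped sample shown in Step3.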
--- a/doc/TRAIN_TO_SERVICE_CN.md
+++ b/doc/TRAIN_TO_SERVICE_CN.md
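-# An End-to-end Tutorial from Training to Inference Service Deployment
-(简体中文|[English](./TRAIN_TO_SERVICE.md))
-Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of the inference service in 9 steps.
-## Step1: Prepare the Running Environment
-Paddle Serving can be deployed on Linux. Currently the server supports deployment on CentOS 7, and [Docker deployment](RUN_IN_DOCKER_CN.md) is recommended. The RPC client can be deployed on CentOS 7 and Ubuntu 18. On other systems, or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the HTTP service.
-You can choose to install the CPU or GPU version of the server module according to your requirements and machine environment, and install the client module on the client machine. When you access the server over HTTP, the client machine does not need the client module.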
-```shell
-pip install paddle_serving_server #cpu version server side
-pip install paddle_serving_server_gpu #gpu version server side
-pip install paddle_serving_client #client side
-```
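-After this simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of the prediction service. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base. The data and dictionary files used in the example can be obtained by executing the get_data.sh script in the IMDB sample code.
-## Step2: Determine the Task and Raw Data Format
-The IMDB review sentiment analysis task is a binary classification of movie reviews that determines whether a review is positive or negative.
-First let's take a look at the raw data: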
-```
-saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
-```
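-This is a sample English review. The sample uses | as the separator: the review text comes before the separator and the label comes after it. 0 means a negative sample, that is, a negative review, and 1 means a positive sample, that is, a positive review.
-## Step3: Define the Reader, Divide Training Set and Test Set
-The original text needs to be converted into numeric ids that the neural network can use. The imdb_reader.py script defines how the text is converted to ids: words are mapped to integers through the dictionary file imdb.vocab.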
-<details>
-  <summary>imdb_reader.py</summary>
-```python
-import sys
-import os
-import paddle
-import re
-import paddle.fluid.incubate.data_generator as dg
-class IMDBDataset(dg.MultiSlotDataGenerator):
-    def load_resource(self, dictfile):
-        self._vocab = {}
-        wid = 0
-        with open(dictfile) as f:
-            for line in f:
-                self._vocab[line.strip()] = wid
-                wid += 1
-        self._unk_id = len(self._vocab)
-        self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
-        self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
-    def get_words_only(self, line):
-        sent = line.lower().replace("<br />", " ").strip()
-        words = [x for x in self._pattern.split(sent) if x and x != " "]
-        feas = [
-            self._vocab[x] if x in self._vocab else self._unk_id for x in words
-        ]
-        return feas
-    def get_words_and_label(self, line):
-        send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
-                                                              " ").strip()
-        label = [int(line.split('|')[-1])]
-        words = [x for x in self._pattern.split(send) if x and x != " "]
-        feas = [
-            self._vocab[x] if x in self._vocab else self._unk_id for x in words
-        ]
-        return feas, label
-    def infer_reader(self, infer_filelist, batch, buf_size):
-        def local_iter():
-            for fname in infer_filelist:
-                with open(fname, "r") as fin:
-                    for line in fin:
-                        feas, label = self.get_words_and_label(line)
-                        yield feas, label
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-    def generate_sample(self, line):
-        def memory_iter():
-            for i in range(1000):
-                yield self.return_value
-        def data_iter():
-            feas, label = self.get_words_and_label(line)
-            yield ("words", feas), ("label", label)
-        return data_iter
-```
-</details>
-The sample after mapping looks similar to the following format:
-```
-257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
-```
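-In this way, the neural network can be trained on the converted text as feature values.
-## Step4: Define a CNN Network for Training and Saving
-Next we use the [CNN model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training. The network structure is defined in the nets.py script.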
-<details>
-  <summary>nets.py</summary>
-```python
-import sys
-import time
-import numpy as np
-import paddle
-import paddle.fluid as fluid
-def cnn_net(data,
-            label,
-            dict_dim,
-            emb_dim=128,
-            hid_dim=128,
-            hid_dim2=96,
-            class_dim=2,
-            win_size=3):
-    """ conv net. """
-    emb = fluid.layers.embedding(
-        input=data, size=[dict_dim, emb_dim], is_sparse=True)
-    conv_3 = fluid.nets.sequence_conv_pool(
-        input=emb,
-        num_filters=hid_dim,
-        filter_size=win_size,
-        act="tanh",
-        pool_type="max")
-    fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
-    prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
-    cost = fluid.layers.cross_entropy(input=prediction, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-    acc = fluid.layers.accuracy(input=prediction, label=label)
-    return avg_cost, acc, prediction
-```
-</details>
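-Use the training samples for training. The training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used to deploy the prediction service.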
-<details>
-  <summary>local_train.py</summary>
-```python
-import os
-import sys
-import paddle
-import logging
-import paddle.fluid as fluid
-logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
-logger = logging.getLogger("fluid")
-logger.setLevel(logging.INFO)
-# load dict file
-def load_vocab(filename):
-    vocab = {}
-    with open(filename) as f:
-        wid = 0
-        for line in f:
-            vocab[line.strip()] = wid
-            wid += 1
-    vocab["<unk>"] = len(vocab)
-    return vocab
-if __name__ == "__main__":
-    from nets import cnn_net
-    model_name = "imdb_cnn"
-    vocab = load_vocab('imdb.vocab')
-    dict_dim = len(vocab)
-    #define model input
-    data = fluid.layers.data(
-        name="words", shape=[1], dtype="int64", lod_level=1)
-    label = fluid.layers.data(name="label", shape=[1], dtype="int64")
-    #define dataset, train_data is the training data directory
-    dataset = fluid.DatasetFactory().create_dataset()
-    filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
-    dataset.set_use_var([data, label])
-    pipe_command = "python imdb_reader.py"
-    dataset.set_pipe_command(pipe_command)
-    dataset.set_batch_size(4)
-    dataset.set_filelist(filelist)
-    dataset.set_thread(10)
-    #define model
-    avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
-    optimizer = fluid.optimizer.SGD(learning_rate=0.001)
-    optimizer.minimize(avg_cost)
-    #execute training
-    exe = fluid.Executor(fluid.CPUPlace())
-    exe.run(fluid.default_startup_program())
-    epochs = 100
-    import paddle_serving_client.io as serving_io
-    for i in range(epochs):
-        exe.train_from_dataset(
-            program=fluid.default_main_program(), dataset=dataset, debug=False)
-        logger.info("TRAIN --> pass: {}".format(i))
-        if i == 64:
-            #at the end of training, use the model save interface in PaddleServing to save the models and configuration files required by Serving
-            serving_io.save_model("{}_model".format(model_name),
-                                  "{}_client_conf".format(model_name),
-                                  {"words": data}, {"prediction": prediction},
-                                  fluid.default_main_program())
-```
-</details>
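-![Training process](./imdb_loss.png) As the figure above shows, the loss of the model starts to converge after the 65th round, so we save the model and configuration files after the 65th round of training is completed. The saved files are divided into the imdb_cnn_client_conf and imdb_cnn_model folders. The former contains the client-side configuration files, and the latter contains the server-side configuration files and the saved model files.
-The parameter list of the save_model function is as follows:
-| Parameter            | Meaning                                                      |
-| -------------------- | ------------------------------------------------------------ |
-| server_model_folder  | Directory for saving server-side configuration files and model files |
-| client_config_folder | Directory for saving client-side configuration files         |
-| feed_var_dict        | The input of the inference model, dict type. The key can be customized and the value is an input variable in the model; each key corresponds to one variable. When using the prediction service, the input data uses the key as the input name. |
-| fetch_var_dict       | The output of the inference model, dict type. The key can be customized and the value is an output variable in the model; each key corresponds to one variable. When using the prediction service, use the key to get the returned data. |
-| main_program         | The model's program                                          |
-## Step5: Deploy the RPC Prediction Service
-The Paddle Serving framework supports two types of prediction service: one communicates through RPC and the other through HTTP. The deployment and use of the RPC prediction service is introduced first; the HTTP prediction service is introduced starting at Step8.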
-```shell
-python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 #cpu prediction service
-python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 #gpu prediction service
-```
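-The --model parameter in the command specifies the directory of the server-side model and configuration files saved earlier, and --port specifies the port of the prediction service. When deploying the GPU prediction service with the GPU version, you can use --gpu_ids to specify the GPU to use.
-After executing one of the above commands, the RPC prediction service for the IMDB sentiment analysis task is deployed.
-## Step6: Reuse the Reader, Define a Remote RPC Client
-Below we access the RPC prediction service through Python code. The script is test_client.py.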
-<details>
-  <summary>test_client.py</summary>
-```python
-from paddle_serving_client import Client
-from imdb_reader import IMDBDataset
-import sys
-client = Client()
-client.load_client_config(sys.argv[1])
-client.connect(["127.0.0.1:9292"])
-#the data preprocessing code is reused here to convert the original text into numeric ids
-imdb_dataset = IMDBDataset()
-imdb_dataset.load_resource(sys.argv[2])
-for line in sys.stdin:
-    word_ids, label = imdb_dataset.get_words_and_label(line)
-    feed = {"words": word_ids}
-    fetch = ["prediction"]  # only "prediction" was exported by save_model in Step4
-    fetch_map = client.predict(feed=feed, fetch=fetch)
-    print("{} {}".format(fetch_map["prediction"][1], label[0]))
-```
-</details>
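-The script reads data from standard input and prints the predicted probability that each sample belongs to class 1, together with its true label.
-## Step7: Call the RPC Service to Test the Model Effect
-Run the client implemented in the previous step against the prediction service as follows: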
-```shell
-cat test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
-```
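-Using the 2084 samples in the test_data/part-0 file for testing, the model prediction accuracy is 88.19%.
-**Note**: The result of each model training may be slightly different, so the accuracy of predictions made with your trained model will be close to the example but may not be exactly the same.
-## Step8: Deploy the HTTP Prediction Service
-When using the HTTP prediction service, the client does not need to install any Paddle Serving module; it only needs to be able to send HTTP requests. Note that HTTP communication takes more time in the communication phase than RPC.
-For the IMDB sentiment analysis task, the original text needs to be preprocessed before prediction. In the RPC prediction service we put the preprocessing in the client script, while in the HTTP prediction service we put the preprocessing on the server. Paddle Serving's HTTP prediction service framework provides data preprocessing and postprocessing interfaces for this situation, and we only need to override them according to the needs of the task.
-Serving provides sample code, which can be obtained by executing the imdb_web_service_demo.sh script in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
-Let's take a look at text_classify_service.py, the script that starts the HTTP prediction service.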
-<details>
-  <summary>text_classify_service.py</summary>
-```python
-from paddle_serving_server.web_service import WebService
-from imdb_reader import IMDBDataset
-import sys
-#extend the WebService class of the framework
-class IMDBService(WebService):
-    def prepare_dict(self, args={}):
-        if len(args) == 0:
-            exit(-1)
-        self.dataset = IMDBDataset()
-        self.dataset.load_resource(args["dict_file_path"])
-    # Override preprocess to implement data preprocessing; this also reuses the reader script used in training
-    def preprocess(self, feed={}, fetch=[]):
-        if "words" not in feed:
-            exit(-1)
-        res_feed = {}
-        res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
-        return res_feed, fetch
-# The name parameter specifies the name of the inference service
-imdb_service = IMDBService(name="imdb")
-imdb_service.load_model_config(sys.argv[1])
-imdb_service.prepare_server(
-    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
-imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
-imdb_service.run_server()
-```
-</details>
-Start command
-```shell
-python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
-```
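-In the command above, argument 1 is the saved server-side model and configuration directory; argument 2 is the working directory, which stores configuration files used while the service runs (it need not exist, but a name must be given and the service creates it); argument 3 is the port number; argument 4 is the dictionary file.
-## Step9: Call the inference service with plain-text data
-Once the HTTP inference service is running, a single command performs a prediction: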
-```
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
-```
-When the prediction succeeds, the predicted probabilities are returned, for example:
-```
-{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
-```
-**Note**: Training results may vary slightly between runs, so the probabilities predicted by your trained model may differ from the example.
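
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `predict` are hypothetical (they are not part of Paddle Serving); the URL, payload shape and response shape are taken from the curl example above.

```python
import json
import urllib.request

# URL copied from the curl example; adjust host, port and service name as needed.
SERVICE_URL = "http://127.0.0.1:9292/imdb/prediction"


def build_request(text, fetch=("prediction",), url=SERVICE_URL):
    """Build (but do not send) the POST request used by the curl example."""
    payload = {"feed": [{"words": text}], "fetch": list(fetch)}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, timeout=5.0):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only inspect the request here; call predict() once the service is up.
    req = build_request("i am very sad | 0")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With the HTTP service from Step8 running, `predict("i am very sad | 0")` should return a dictionary shaped like the sample response above.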
--- a/doc/WINDOWS_TUTORIAL.md
+++ b/doc/WINDOWS_TUTORIAL.md
@@ -117,9 +117,9 @@ Please refer to [Docker Desktop](https://www.docker.com/products/docker-desktop)
 After installation, start the Docker Linux engine and download the relevant image. In the Serving directory, run:
 ```
-docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
 # No port is exposed here; use -p to map ports as needed
-docker run --rm -dit --name serving_devel -v $PWD:/Serving hub.baidubce.com/paddlepaddle/serving:latest-devel
+docker run --rm -dit --name serving_devel -v $PWD:/Serving registry.baidubce.com/paddlepaddle/serving:latest-devel
 docker exec -it serving_devel bash
 cd /Serving
 ```

--- a/doc/WINDOWS_TUTORIAL_CN.md
+++ b/doc/WINDOWS_TUTORIAL_CN.md
@@ -117,9 +117,9 @@ python your_client.py
 After installation, start the Docker Linux engine and download the relevant image. In the Serving directory, run:
 ```
-docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
+docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
 # No port is exposed here; use -p to map ports as needed
-docker run --rm -dit --name serving_devel -v $PWD:/Serving hub.baidubce.com/paddlepaddle/serving:latest-devel 
+docker run --rm -dit --name serving_devel -v $PWD:/Serving registry.baidubce.com/paddlepaddle/serving:latest-devel 
 docker exec -it serving_devel bash
 cd /Serving
 ```

--- a/doc/DESIGN.md
+++ b/doc/DESIGN.md
--- a/doc/DESIGN_CN.md
+++ b/doc/DESIGN_CN.md
--- a/java/README.md
+++ b/java/README.md
@@ -7,8 +7,8 @@
 To make Java development easier, we provide a Java Docker image that contains the compiled Serving project. To get the image and enter the development environment, run:
 ```
-docker pull hub.baidubce.com/paddlepaddle/serving:0.4.1-java
+docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-java
-docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.1-java
+docker run --rm -dit --name java_serving registry.baidubce.com/paddlepaddle/serving:0.5.0-java
 docker exec -it java_serving bash
 cd Serving/java
 ```

--- a/java/README_CN.md
+++ b/java/README_CN.md
@@ -7,8 +7,8 @@
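 To make Java development easier, we provide a Java Docker image that contains the compiled Serving project. To get the image and enter the development environment, run: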
 ```
-docker pull hub.baidubce.com/paddlepaddle/serving:0.4.1-java
+docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-java
-docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.1-java
+docker run --rm -dit --name java_serving registry.baidubce.com/paddlepaddle/serving:0.5.0-java
 docker exec -it java_serving bash
 cd Serving/java
 ```

--- a/python/examples/README.md
+++ b/python/examples/README.md
+## Examples
+### Support `--use_trt`
+The following models support `--use_trt`, which enables TensorRT to accelerate inference on CUDA 10.1 or higher.
+- imagenet ResNet50/ResNet101
+- detection faster_rcnn/yolov3/pp-yolo/ttf-net
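
As a sketch of how the flag is passed, the helper below assembles the GPU serving launch command used in the detection examples and optionally appends `--use_trt`. The function name `build_serve_command` is hypothetical, the model directory and port are placeholders, and the command is only built here, not executed.

```python
import sys


def build_serve_command(model_dir, port, gpu_ids="0", use_trt=False):
    """Return the argv list for launching the GPU serving module."""
    cmd = [
        sys.executable, "-m", "paddle_serving_server_gpu.serve",
        "--model", model_dir,
        "--port", str(port),
        "--gpu_ids", str(gpu_ids),
    ]
    if use_trt:
        # Only the models listed above are stated to support TensorRT.
        cmd.append("--use_trt")
    return cmd


if __name__ == "__main__":
    # Print the command without the interpreter path for readability.
    print(" ".join(build_serve_command("serving_server", 9494, use_trt=True)[1:]))
```

The returned list can be handed to `subprocess.run` on a machine where the GPU serving package is installed.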
--- a/python/examples/README_CN.md
+++ b/python/examples/README_CN.md
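+## Serving Model Examples
+### Models that support TensorRT (`--use_trt`)
+The following models support TensorRT and can enable `--use_trt` to accelerate online inference; other models cannot enable it.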
+- imagenet ResNet50/ResNet101
+- detection faster_rcnn/yolov3/pp-yolo/ttf-net
--- a/python/examples/bert/README.md
+++ b/python/examples/bert/README.md
@@ -11,14 +11,16 @@ This example use model [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubd
 Install paddlehub first
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```
 run 
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```
+**PaddleHub only supports Python 3.5+**
 The 128 in the command above is max_seq_len in the BERT model, i.e. the sample length after preprocessing.
 The config file and model files for the server side are saved in the folder bert_seq128_model.
 The config file generated for the client side is saved in the folder bert_seq128_client.
@@ -28,8 +30,9 @@ You can also download the above model from BOS(max_seq_len=128). After decompres
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-If your model is bert_chinese_L-12_H-768_A-12_model, replace the 'bert_seq128_model' field in the following commands with 'bert_chinese_L-12_H-768_A-12_model', and replace 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client'.
 ### Getting Dict and Sample Dataset
@@ -64,7 +67,7 @@ the client reads data from data-c.txt and send prediction request, the predictio
 ### HTTP Inference Service
 To start the CPU HTTP inference service, run
 ```
- python bert_web_service.py bert_seq128_model/ 9292 #launch gpu inference service
+ python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
 ```
 Or, to start the GPU HTTP inference service, run

--- a/python/examples/bert/README_CN.md
+++ b/python/examples/bert/README_CN.md
@@ -10,11 +10,11 @@
 This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
 Install paddlehub first
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```
 Run
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```
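 The argument 128 is max_seq_len in the BERT model, i.e. the sample length after preprocessing.
 The server-side config file and model files are generated and saved in the bert_seq128_model folder.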
@@ -25,9 +25,9 @@ python prepare_model.py 128
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-If you use the bert_chinese_L-12_H-768_A-12_model model, replace bert_seq128_model in the commands below with bert_chinese_L-12_H-768_A-12_model, and bert_seq128_client with bert_chinese_L-12_H-768_A-12_client.
 ### Getting Dict and Sample Dataset
@@ -67,7 +67,7 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien
 ### HTTP Inference Service
 To start the CPU HTTP inference service, run
 ```
-python bert_web_service.py bert_seq128_model/ 9292 #launch gpu inference service
+python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
 ```

--- a/python/examples/criteo_ctr/README.md
+++ b/python/examples/criteo_ctr/README.md
@@ -14,7 +14,7 @@ tar xf criteo_ctr_demo_model.tar.gz
 mv models/ctr_client_conf .
 mv models/ctr_serving_model .
 ```
-the directories like serving_server_model and serving_client_config will appear.
+The directories `ctr_serving_model` and `ctr_client_conf` will appear.
 ### Start RPC Inference Service
@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
 ### RPC Infer
 ```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
+python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
 ```
 The latency is displayed at the end.
--- a/python/examples/criteo_ctr/README_CN.md
+++ b/python/examples/criteo_ctr/README_CN.md
@@ -14,7 +14,7 @@ tar xf criteo_ctr_demo_model.tar.gz
 mv models/ctr_client_conf .
 mv models/ctr_serving_model .
 ```
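-The serving_server_model and serving_client_config folders will appear in the current directory.
+The `ctr_serving_model` and `ctr_client_conf` folders will appear in the current directory.
 ### Start RPC Inference Service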
@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
 ### RPC Infer
 ```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
+python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
 ```
 When prediction finishes, the time it took is printed.
--- a/python/examples/criteo_ctr/criteo_reader.py
+++ b/python/examples/criteo_ctr/criteo_reader.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-import sys
-import paddle.fluid.incubate.data_generator as dg
-class CriteoDataset(dg.MultiSlotDataGenerator):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-        return dense_feature, sparse_feature, [int(features[0])]
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-        return data_iter
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/criteo_ctr/test_client.py
+++ b/python/examples/criteo_ctr/test_client.py
@@ -14,43 +14,63 @@
 # pylint: disable=doc-string-missing
 from paddle_serving_client import Client
-import paddle
 import sys
 import os
 import time
-import criteo_reader as criteo
 from paddle_serving_client.metric import auc
 import numpy as np
 import sys
+class CriteoReader(object):
+    def __init__(self, sparse_feature_dim):
+        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+        self.cont_max_ = [
+            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
+        ]
+        self.cont_diff_ = [
+            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
+        ]
+        self.hash_dim_ = sparse_feature_dim
+        # here, training data are lines with line_index < train_idx_
+        self.train_idx_ = 41256555
+        self.continuous_range_ = range(1, 14)
+        self.categorical_range_ = range(14, 40)
+    def process_line(self, line):
+        features = line.rstrip('\n').split('\t')
+        dense_feature = []
+        sparse_feature = []
+        for idx in self.continuous_range_:
+            if features[idx] == '':
+                dense_feature.append(0.0)
+            else:
+                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
+                                     self.cont_diff_[idx - 1])
+        for idx in self.categorical_range_:
+            sparse_feature.append(
+                [hash(str(idx) + features[idx]) % self.hash_dim_])
+        return sparse_feature
 py_version = sys.version_info[0]
 client = Client()
 client.load_client_config(sys.argv[1])
 client.connect(["127.0.0.1:9292"])
+reader = CriteoReader(1000001)
 batch = 1
 buf_size = 100
-dataset = criteo.CriteoDataset()
-dataset.setup(1000001)
-test_filelists = [
-    "{}/part-%d".format(sys.argv[2]) % x
-    for x in range(len(os.listdir(sys.argv[2])))
-]
-reader = dataset.infer_reader(test_filelists[len(test_filelists) - 40:], batch,
-                              buf_size)
 label_list = []
 prob_list = []
 start = time.time()
-for ei in range(1000):
+f = open(sys.argv[2], 'r')
-    if py_version == 2:
+for ei in range(10):
-        data = reader().next()
+    data = reader.process_line(f.readline())
-    else:
-        data = reader().__next__()
    feed_dict = {}
    for i in range(1, 27):
-        feed_dict["sparse_{}".format(i - 1)] = np.array(data[0][i]).reshape(-1)
+        feed_dict["sparse_{}".format(i - 1)] = np.array(data[i-1]).reshape(-1)
-        feed_dict["sparse_{}.lod".format(i - 1)] = [0, len(data[0][i])]
+        feed_dict["sparse_{}.lod".format(i - 1)] = [0, len(data[i-1])]
    fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
+    print(fetch_map)
 end = time.time()
-print(end - start)
+f.close()
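
To make the new client's data flow concrete, here is a self-contained sketch of how one raw Criteo line becomes the feed dictionary. It mirrors `CriteoReader.process_line` and the feed loop in the diff above, but the function names `hash_sparse_features` and `build_feed` and the sample line are illustrative assumptions, and no server is contacted. It uses plain lists where the client uses `np.array(...).reshape(-1)`. Note that Python 3 randomizes `hash()` for strings per process unless `PYTHONHASHSEED` is fixed, so the ids can change between runs.

```python
HASH_DIM = 1000001                 # same value the client passes to CriteoReader
CATEGORICAL_RANGE = range(14, 40)  # columns 14..39 hold the 26 categorical fields


def hash_sparse_features(line, hash_dim=HASH_DIM):
    """Hash each categorical column of a tab-separated line into [0, hash_dim)."""
    features = line.rstrip("\n").split("\t")
    return [[hash(str(idx) + features[idx]) % hash_dim]
            for idx in CATEGORICAL_RANGE]


def build_feed(sparse_feature):
    """Build the feed dict: one id list plus a LoD entry per sparse slot."""
    feed = {}
    for i, ids in enumerate(sparse_feature):
        feed["sparse_{}".format(i)] = list(ids)
        feed["sparse_{}.lod".format(i)] = [0, len(ids)]
    return feed


if __name__ == "__main__":
    # Made-up line: label, 13 dense columns, 26 categorical columns.
    sample = "\t".join(["0"] + ["1"] * 13 + ["cat%d" % k for k in range(26)])
    feed = build_feed(hash_sparse_features(sample))
    print(len(feed), feed["sparse_0.lod"])
```

Each of the 26 sparse slots carries exactly one hashed id, which is why every `.lod` entry is `[0, 1]` for a single sample.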
--- a/python/examples/detection/README.md
+++ b/python/examples/detection/README.md
@@ -12,6 +12,7 @@ Paddle Detection provides a large number of [Model Zoo](https://github.com/Paddl
 ### Serving example
 Several examples of PaddleDetection models used in Serving are given in this folder
+All examples support TensorRT.
 - [Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
 - [PPYOLO](./ppyolo_r50vd_dcn_1x_coco)

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
@@ -13,6 +13,9 @@ tar xf faster_rcnn_r50_fpn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
@@ -11,8 +11,9 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 ### Start the service
 ```
 tar xf faster_rcnn_r50_fpn_1x_coco.tar
-python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, enable the `--use_trt` option.
 ### Perform prediction
 ```

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
@@ -13,6 +13,8 @@ tar xf ppyolo_r50vd_dcn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
@@ -11,9 +11,11 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 ### Start the service
 ```
 tar xf ppyolo_r50vd_dcn_1x_coco.tar
-python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, enable the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
@@ -12,6 +12,7 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 tar xf ttfnet_darknet53_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
@@ -11,9 +11,11 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 ### Start the service
 ```
 tar xf ttfnet_darknet53_1x_coco.tar
-python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, enable the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README.md
@@ -13,6 +13,8 @@ tar xf yolov3_darknet53_270e_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
@@ -11,9 +11,11 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 ### Start the service
 ```
 tar xf yolov3_darknet53_270e_coco.tar
-python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT; for faster inference, enable the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/args.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/args.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-import argparse
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
-    parser.add_argument(
-        '--train_data_path',
-        type=str,
-        default='./data/raw/train.txt',
-        help="The path of training dataset")
-    parser.add_argument(
-        '--sparse_only',
-        type=bool,
-        default=False,
-        help="Whether we use sparse features only")
-    parser.add_argument(
-        '--test_data_path',
-        type=str,
-        default='./data/raw/valid.txt',
-        help="The path of testing dataset")
-    parser.add_argument(
-        '--batch_size',
-        type=int,
-        default=1000,
-        help="The size of mini-batch (default:1000)")
-    parser.add_argument(
-        '--embedding_size',
-        type=int,
-        default=10,
-        help="The size for embedding layer (default:10)")
-    parser.add_argument(
-        '--num_passes',
-        type=int,
-        default=10,
-        help="The number of passes to train (default: 10)")
-    parser.add_argument(
-        '--model_output_dir',
-        type=str,
-        default='models',
-        help='The path for model to store (default: models)')
-    parser.add_argument(
-        '--sparse_feature_dim',
-        type=int,
-        default=1000001,
-        help='sparse feature hashing space for index processing')
-    parser.add_argument(
-        '--is_local',
-        type=int,
-        default=1,
-        help='Local train or distributed train (default: 1)')
-    parser.add_argument(
-        '--cloud_train',
-        type=int,
-        default=0,
-        help='Local train or distributed train on paddlecloud (default: 0)')
-    parser.add_argument(
-        '--async_mode',
-        action='store_true',
-        default=False,
-        help='Whether start pserver in async mode to support ASGD')
-    parser.add_argument(
-        '--no_split_var',
-        action='store_true',
-        default=False,
-        help='Whether split variables into blocks when update_method is pserver')
-    parser.add_argument(
-        '--role',
-        type=str,
-        default='pserver',  # trainer or pserver
-        help='The role of this process, trainer or pserver (default: pserver)')
-    parser.add_argument(
-        '--endpoints',
-        type=str,
-        default='127.0.0.1:6000',
-        help='The pserver endpoints, like: 127.0.0.1:6000,127.0.0.1:6001')
-    parser.add_argument(
-        '--current_endpoint',
-        type=str,
-        default='127.0.0.1:6000',
-        help='The endpoint of the current process (default: 127.0.0.1:6000)')
-    parser.add_argument(
-        '--trainer_id',
-        type=int,
-        default=0,
-        help='The id of the current trainer (default: 0)')
-    parser.add_argument(
-        '--trainers',
-        type=int,
-        default=1,
-        help='The number of trainers (default: 1)')
-    return parser.parse_args()
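
One detail in the parser above is worth a note: with `type=bool`, argparse calls `bool()` on the raw string, so any non-empty value, including `False`, parses as `True`. The sketch below shows the behavior and one possible alternative. The `str2bool` and `make_parser` helpers are hypothetical and not part of the original file.

```python
import argparse


def str2bool(value):
    """Parse common textual booleans; raise an argparse error otherwise."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def make_parser(bool_type):
    """Build a parser with only --sparse_only, using the given converter."""
    parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
    parser.add_argument("--sparse_only", type=bool_type, default=False,
                        help="Whether we use sparse features only")
    return parser


if __name__ == "__main__":
    argv = ["--sparse_only", "False"]
    # bool("False") is True because the string is non-empty.
    print(make_parser(bool).parse_args(argv).sparse_only)
    # The explicit converter reads the text and yields False.
    print(make_parser(str2bool).parse_args(argv).sparse_only)
```

An `action='store_true'` flag, as the same parser already uses for `--async_mode`, is another way to avoid the problem.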
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/clean.sh
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/clean.sh
-ps -ef | grep cube | awk {'print $2'} | xargs kill -9
-rm -rf cube/cube_data cube/data cube/log* cube/nohup* cube/output/ cube/donefile cube/input cube/monitor cube/cube-builder.INFO
-ps -ef | grep test | awk {'print $2'} | xargs kill -9
-ps -ef | grep serving | awk {'print $2'} | xargs kill -9
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/criteo.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/criteo.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import sys
-class CriteoDataset(object):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-        return dense_feature, sparse_feature, [int(features[0])]
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-        return data_iter
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/criteo_reader.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/criteo_reader.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-import sys
-import paddle.fluid.incubate.data_generator as dg
-class CriteoDataset(dg.MultiSlotDataGenerator):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-        return dense_feature, sparse_feature, [int(features[0])]
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-        return data_iter
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/conf/cube.conf
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/conf/cube.conf
-[{
-    "dict_name": "test_dict",
-    "shard": 1,
-    "dup": 1,
-    "timeout": 200,
-    "retry": 3,
-    "backup_request": 100,
-    "type": "ipport_list",
-    "load_balancer": "rr",
-    "nodes": [{
-        "ipport_list": "list://127.0.0.1:8027"
-    }]
-}]
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/conf/gflags.conf
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/conf/gflags.conf
--port=8027
--dict_split=1
--in_mem=true
--log_dir=./log/
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/keys
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube/keys
-1
-2
-3
-4
-5
-6
-7
-8
-9
-10
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/local_train.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/local_train.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-from __future__ import print_function
-from args import parse_args
-import os
-import paddle.fluid as fluid
-import sys
-from network_conf import dnn_model
-dense_feature_dim = 13
-def train():
-    args = parse_args()
-    sparse_only = args.sparse_only
-    if not os.path.isdir(args.model_output_dir):
-        os.mkdir(args.model_output_dir)
-    dense_input = fluid.layers.data(
-        name="dense_input", shape=[dense_feature_dim], dtype='float32')
-    sparse_input_ids = [
-        fluid.layers.data(
-            name="C" + str(i), shape=[1], lod_level=1, dtype="int64")
-        for i in range(1, 27)
-    ]
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    #nn_input = None if sparse_only else dense_input
-    nn_input = dense_input
-    predict_y, loss, auc_var, batch_auc_var, infer_vars = dnn_model(
-        nn_input, sparse_input_ids, label, args.embedding_size,
-        args.sparse_feature_dim)
-    optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
-    optimizer.minimize(loss)
-    exe = fluid.Executor(fluid.CPUPlace())
-    exe.run(fluid.default_startup_program())
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_use_var([dense_input] + sparse_input_ids + [label])
-    python_executable = "python"
-    pipe_command = "{} criteo_reader.py {}".format(python_executable,
-                                                   args.sparse_feature_dim)
-    dataset.set_pipe_command(pipe_command)
-    dataset.set_batch_size(128)
-    thread_num = 10
-    dataset.set_thread(thread_num)
-    whole_filelist = [
-        "raw_data/part-%d" % x for x in range(len(os.listdir("raw_data")))
-    ]
-    print(whole_filelist)
-    dataset.set_filelist(whole_filelist[:100])
-    dataset.load_into_memory()
-    fluid.layers.Print(auc_var)
-    epochs = 1
-    for i in range(epochs):
-        exe.train_from_dataset(
-            program=fluid.default_main_program(), dataset=dataset, debug=True)
-        print("epoch {} finished".format(i))
-    import paddle_serving_client.io as server_io
-    feed_var_dict = {}
-    feed_var_dict['dense_input'] = dense_input
-    for i, sparse in enumerate(sparse_input_ids):
-        feed_var_dict["embedding_{}.tmp_0".format(i)] = sparse
-    fetch_var_dict = {"prob": predict_y}
-    feed_kv_dict = {}
-    feed_kv_dict['dense_input'] = dense_input
-    for i, emb in enumerate(infer_vars):
-        feed_kv_dict["embedding_{}.tmp_0".format(i)] = emb
-    fetch_var_dict = {"prob": predict_y}
-    server_io.save_model("ctr_serving_model", "ctr_client_conf", feed_var_dict,
-                         fetch_var_dict, fluid.default_main_program())
-    server_io.save_model("ctr_serving_model_kv", "ctr_client_conf_kv",
-                         feed_kv_dict, fetch_var_dict,
-                         fluid.default_main_program())
-if __name__ == '__main__':
-    train()
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/network_conf.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/network_conf.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-import paddle.fluid as fluid
-import math
-def dnn_model(dense_input, sparse_inputs, label, embedding_size,
-              sparse_feature_dim):
-    def embedding_layer(input):
-        emb = fluid.layers.embedding(
-            input=input,
-            is_sparse=True,
-            is_distributed=False,
-            size=[sparse_feature_dim, embedding_size],
-            param_attr=fluid.ParamAttr(
-                name="SparseFeatFactors",
-                initializer=fluid.initializer.Uniform()))
-        x = fluid.layers.sequence_pool(input=emb, pool_type='sum')
-        return emb, x
-    def mlp_input_tensor(emb_sums, dense_tensor):
-        #if isinstance(dense_tensor, fluid.Variable):
-        #    return fluid.layers.concat(emb_sums, axis=1)
-        #else:
-        return fluid.layers.concat(emb_sums + [dense_tensor], axis=1)
-    def mlp(mlp_input):
-        fc1 = fluid.layers.fc(input=mlp_input,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(mlp_input.shape[1]))))
-        fc2 = fluid.layers.fc(input=fc1,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc1.shape[1]))))
-        fc3 = fluid.layers.fc(input=fc2,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc2.shape[1]))))
-        pre = fluid.layers.fc(input=fc3,
-                              size=2,
-                              act='softmax',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc3.shape[1]))))
-        return pre
-    emb_pair_sums = list(map(embedding_layer, sparse_inputs))
-    emb_sums = [x[1] for x in emb_pair_sums]
-    infer_vars = [x[0] for x in emb_pair_sums]
-    mlp_in = mlp_input_tensor(emb_sums, dense_input)
-    predict = mlp(mlp_in)
-    cost = fluid.layers.cross_entropy(input=predict, label=label)
-    avg_cost = fluid.layers.reduce_sum(cost)
-    accuracy = fluid.layers.accuracy(input=predict, label=label)
-    auc_var, batch_auc_var, auc_states = \
-        fluid.layers.auc(input=predict, label=label, num_thresholds=2 ** 12, slide_steps=20)
-    return predict, avg_cost, auc_var, batch_auc_var, infer_vars
--- a/python/examples/imdb/local_train.py
+++ b/python/examples/imdb/local_train.py
@@ -21,7 +21,7 @@ import paddle.fluid as fluid
 logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
 logger = logging.getLogger("fluid")
 logger.setLevel(logging.INFO)
+paddle.enable_static()
 def load_vocab(filename):
    vocab = {}

--- a/python/examples/ocr/rec_debugger_server.py
+++ b/python/examples/ocr/rec_debugger_server.py
@@ -22,7 +22,10 @@ from paddle_serving_client import Client
 from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
 from paddle_serving_app.reader import Div, Normalize, Transpose
 from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
-from paddle_serving_server_gpu.web_service import WebService
+if sys.argv[1] == 'gpu':
+    from paddle_serving_server_gpu.web_service import WebService
+elif sys.argv[1] == 'cpu':
+    from paddle_serving_server.web_service import WebService
 import time
 import re
 import base64
@@ -65,8 +68,12 @@ class OCRService(WebService):
 ocr_service = OCRService(name="ocr")
 ocr_service.load_model_config("ocr_rec_model")
-ocr_service.set_gpus("0")
-ocr_service.init_rec()
-ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
+if sys.argv[1] == 'gpu':
+    ocr_service.set_gpus("0")
+    ocr_service.init_rec()
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
+elif sys.argv[1] == 'cpu':
+    ocr_service.init_rec()
+    ocr_service.prepare_server(workdir="workdir", port=9292, device="cpu")
 ocr_service.run_debugger_service()
 ocr_service.run_web_service()
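The hunks above pick the CPU or GPU package from `sys.argv[1]`. As written, running the script without an argument raises `IndexError`, and any value other than `gpu` or `cpu` leaves `WebService` unbound. A minimal sketch of stricter argument handling follows; `parse_device` and `web_service_module` are hypothetical helpers for illustration and are not part of this PR.

```python
import argparse


def parse_device(argv):
    """Return 'gpu' or 'cpu'; argparse exits with a usage error otherwise."""
    parser = argparse.ArgumentParser(description="OCR recognition debugger server")
    parser.add_argument(
        "device",
        choices=["gpu", "cpu"],
        help="which paddle_serving_server package to import")
    return parser.parse_args(argv).device


def web_service_module(device):
    """Map the device name to the module path used in the diff above."""
    if device == "gpu":
        return "paddle_serving_server_gpu.web_service"
    return "paddle_serving_server.web_service"


if __name__ == "__main__":
    import sys
    print(web_service_module(parse_device(sys.argv[1:])))
```

With this shape, an unsupported value fails immediately with a usage message instead of a later `NameError`.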
--- a/python/examples/pipeline/imagenet/README_CN.md
+++ b/python/examples/pipeline/imagenet/README_CN.md
 # Imagenet Pipeline WebService
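-Here the Uci service is used as an example to introduce how to use Pipeline WebService.
+Here the Imagenet service is used as an example to introduce how to use Pipeline WebService.
 ## Get the model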
 ```
@@ -10,10 +10,11 @@ sh get_model.sh
 ## Start the service
 ```
-python web_service.py &>log.txt &
+python resnet50_web_service.py &>log.txt &
 ```
 ## Test
 ```
-curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
+python pipeline_rpc_client.py
 ```
--- a/python/examples/pipeline/imagenet/pipeline_rpc_client.py
+++ b/python/examples/pipeline/imagenet/pipeline_rpc_client.py
@@ -11,7 +11,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from paddle_serving_server_gpu.pipeline import PipelineClient
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 import requests
 import json
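The `try`/`except ImportError` hunk above prefers the GPU package and falls back to the CPU package when the GPU one is not installed. The same idea can be sketched generically; `import_first_available` is a hypothetical helper for illustration, not something this PR or Paddle Serving provides.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that is installed."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:
            # Remember the failure and try the next candidate.
            last_error = err
    raise ImportError(
        "none of {} could be imported".format(list(module_names))) from last_error


# Usage mirroring the diff above (needs one of the two packages installed):
# pipeline = import_first_available([
#     "paddle_serving_server_gpu.pipeline",
#     "paddle_serving_server.pipeline",
# ])
# PipelineClient = pipeline.PipelineClient
```

Catching only `ImportError` keeps unrelated failures inside the preferred module visible rather than silently switching packages.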

--- a/python/examples/pipeline/ocr/pipeline_http_client.py
+++ b/python/examples/pipeline/ocr/pipeline_http_client.py
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from paddle_serving_server.pipeline import PipelineClient
+# from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 import requests
 import json

--- a/python/examples/pipeline/ocr/pipeline_rpc_client.py
+++ b/python/examples/pipeline/ocr/pipeline_rpc_client.py
@@ -11,7 +11,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from paddle_serving_server_gpu.pipeline import PipelineClient
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 import requests
 import json

--- a/python/examples/xpu/fit_a_line_xpu/README.md
+++ b/python/examples/xpu/fit_a_line_xpu/README.md
+# Fit a line prediction example
+([简体中文](./README_CN.md)|English)
+## Get data
+```shell
+sh get_data.sh
+```
+## RPC service
+### Start server
+```shell
+python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
+```
+### Client prediction
+`test_client.py` uses the `paddlepaddle` package, so you may need to install it first (`pip install paddlepaddle`).
+``` shell
+python test_client.py uci_housing_client/serving_client_conf.prototxt
+```
+## HTTP service
+### Start server
+Start a web service with default web service hosting modules:
+``` shell
+python test_server.py
+```
+### Client prediction
+``` shell
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
--- a/python/examples/xpu/fit_a_line_xpu/README_CN.md
+++ b/python/examples/xpu/fit_a_line_xpu/README_CN.md
+# Linear regression prediction service example
+(Simplified Chinese|[English](./README.md))
+## Get data
+```shell
+sh get_data.sh
+```
+## RPC service
+### Start server
+``` shell
+python test_server.py uci_housing_model/
+```
+You can also start the default RPC service with the following one-line command:
+```shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
+```
+### Client prediction
+`test_client.py` uses the `paddlepaddle` package, which needs to be installed (`pip install paddlepaddle`).
+``` shell
+python test_client.py uci_housing_client/serving_client_conf.prototxt
+```
+## HTTP service
+### Start server
+Start the default web service with the following one-line command:
+``` shell
+python test_server.py
+```
+### Client prediction
+``` shell
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
--- a/python/examples/xpu/fit_a_line_xpu/benchmark.py
+++ b/python/examples/xpu/fit_a_line_xpu/benchmark.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+from paddle_serving_client import Client
+from paddle_serving_client.utils import MultiThreadRunner
+from paddle_serving_client.utils import benchmark_args
+import time
+import paddle
+import sys
+import requests
+args = benchmark_args()
+def single_func(idx, resource):
+    if args.request == "rpc":
+        client = Client()
+        client.load_client_config(args.model)
+        client.connect([args.endpoint])
+        train_reader = paddle.batch(
+            paddle.reader.shuffle(
+                paddle.dataset.uci_housing.train(), buf_size=500),
+            batch_size=1)
+        start = time.time()
+        for data in train_reader():
+            fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["price"])
+        end = time.time()
+        return [[end - start]]
+    elif args.request == "http":
+        train_reader = paddle.batch(
+            paddle.reader.shuffle(
+                paddle.dataset.uci_housing.train(), buf_size=500),
+            batch_size=1)
+        start = time.time()
+        for data in train_reader():
+            r = requests.post(
+                'http://{}/uci/prediction'.format(args.endpoint),
+                data={"x": data[0]})
+        end = time.time()
+        return [[end - start]]
+multi_thread_runner = MultiThreadRunner()
+result = multi_thread_runner.run(single_func, args.thread, {})
+print(result)
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/get_data.sh
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/get_data.sh
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/data/ctr_prediction/ctr_data.tar.gz
-tar -zxvf ctr_data.tar.gz
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
--- a/python/examples/xpu/fit_a_line_xpu/local_train.py
+++ b/python/examples/xpu/fit_a_line_xpu/local_train.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+import sys
+import paddle
+import paddle.fluid as fluid
+paddle.enable_static()
+train_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.train(), buf_size=500),
+    batch_size=16)
+test_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.test(), buf_size=500),
+    batch_size=16)
+x = fluid.data(name='x', shape=[None, 13], dtype='float32')
+y = fluid.data(name='y', shape=[None, 1], dtype='float32')
+y_predict = fluid.layers.fc(input=x, size=1, act=None)
+cost = fluid.layers.square_error_cost(input=y_predict, label=y)
+avg_loss = fluid.layers.mean(cost)
+sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
+sgd_optimizer.minimize(avg_loss)
+place = fluid.CPUPlace()
+feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+import paddle_serving_client.io as serving_io
+for pass_id in range(30):
+    for data_train in train_reader():
+        avg_loss_value, = exe.run(fluid.default_main_program(),
+                                  feed=feeder.feed(data_train),
+                                  fetch_list=[avg_loss])
+serving_io.save_model("uci_housing_model", "uci_housing_client", {"x": x},
+                      {"price": y_predict}, fluid.default_main_program())
--- a/python/examples/fit_a_line/test_numpy_input_client.py
+++ b/python/examples/fit_a_line/test_numpy_input_client.py
@@ -14,8 +14,8 @@
 # pylint: disable=doc-string-missing
 from paddle_serving_client import Client
-import numpy as np
 import sys
+import numpy as np
 client = Client()
 client.load_client_config(sys.argv[1])
@@ -28,6 +28,8 @@ test_reader = paddle.batch(
    batch_size=1)
 for data in test_reader():
+    new_data = np.zeros((1, 1, 13)).astype("float32")
+    new_data[0] = data[0][0]
    fetch_map = client.predict(
-        feed={"x": np.array(data[0][0])}, fetch=["price"])
+        feed={"x": new_data}, fetch=["price"], batch=True)
-    print("{} {}".format(fetch_map["price"][0][0], data[0][1][0]))
+    print(fetch_map)
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server_quant.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server_quant.py
@@ -11,31 +11,32 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# pylint: disable=doc-string-missing
-import os
-import sys
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import MultiLangServer as Server
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_dist_kv_infer_op = op_maker.create('general_dist_kv_quant_infer')
-response_op = op_maker.create('general_response')
-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_dist_kv_infer_op)
-op_seq_maker.add_op(response_op)
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.set_num_threads(4)
-server.load_model_config(sys.argv[1], sys.argv[2])
-server.prepare_server(
-    workdir="work_dir1",
-    port=9292,
-    device="cpu",
-    cube_conf="./cube/conf/cube.conf")
-server.run_server()
+from paddle_serving_client import Client
+from paddle_serving_client.utils import MultiThreadRunner
+import paddle
+import numpy as np
+def single_func(idx, resource):
+    client = Client()
+    client.load_client_config(
+        "./uci_housing_client/serving_client_conf.prototxt")
+    client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
+    x = [
+        0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
+        0.6283, 0.4919, 0.1856, 0.0795, -0.0332
+    ]
+    x = np.array(x)
+    for i in range(1000):
+        fetch_map = client.predict(feed={"x": x}, fetch=["price"])
+        if fetch_map is None:
+            return [[None]]
+    return [[0]]
+multi_thread_runner = MultiThreadRunner()
+thread_num = 4
+result = multi_thread_runner.run(single_func, thread_num, {})
+if None in result[0]:
+    exit(1)
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server.py
@@ -13,29 +13,24 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing
-import os
-import sys
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import MultiLangServer as Server
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer')
-response_op = op_maker.create('general_response')
-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_dist_kv_infer_op)
-op_seq_maker.add_op(response_op)
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.set_num_threads(4)
-server.load_model_config(sys.argv[1], sys.argv[2])
-server.prepare_server(
-    workdir="work_dir1",
-    port=9292,
-    device="cpu",
-    cube_conf="./cube/conf/cube.conf")
-server.run_server()
+from paddle_serving_server_gpu.web_service import WebService
+import numpy as np
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9393, use_lite=True, use_xpu=True, ir_optim=True)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
--- a/python/examples/xpu/resnet_v2_50_xpu/README.md
+++ b/python/examples/xpu/resnet_v2_50_xpu/README.md
+# Image Classification
+## Get Model
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+## RPC Service
+### Start Service
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
+```
+### Client Prediction
+```
+python resnet50_v2_client.py
+```
--- a/python/examples/xpu/resnet_v2_50_xpu/README_CN.md
+++ b/python/examples/xpu/resnet_v2_50_xpu/README_CN.md
+# Image Classification
+## Get Model
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+## RPC Service
+### Start Service
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
+```
+### Client Prediction
+```
+python resnet50_v2_client.py
+```
--- a/python/examples/xpu/resnet_v2_50_xpu/daisy.jpg
+++ b/python/examples/xpu/resnet_v2_50_xpu/daisy.jpg
--- a/python/examples/xpu/resnet_v2_50_xpu/localpredict.py
+++ b/python/examples/xpu/resnet_v2_50_xpu/localpredict.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+from paddle_serving_app.local_predict import LocalPredictor
+import sys
+predictor = LocalPredictor()
+predictor.load_model_config(sys.argv[1], use_lite=True, use_xpu=True, ir_optim=True)
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = predictor.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube_quant_prepare.sh
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube_quant_prepare.sh
@@ -11,12 +11,22 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# pylint: disable=doc-string-missing
-#! /bin/bash
-mkdir -p cube_model
-mkdir -p cube/data
-./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8
-./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1  -only_build=false
-mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
-cd cube && ./cube
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+client = Client()
+client.load_client_config(
+    "resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
--- a/python/paddle_serving_server/serve.py
+++ b/python/paddle_serving_server/serve.py
@@ -152,8 +152,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "w") as f:
+            with open(args.model + "/key", "wb") as f:
                f.write(key)
            return True
@@ -161,8 +161,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "r") as f:
+            with open(args.model + "/key", "rb") as f:
                cur_key = f.read()
            return (key == cur_key)
@@ -203,7 +203,7 @@ class MainService(BaseHTTPRequestHandler):
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
-        self.wfile.write(json.dumps(response))
+        self.wfile.write(json.dumps(response).encode())
 if __name__ == "__main__":
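The three hunks above are Python 3 bytes/str fixes: `base64.b64decode` returns `bytes`, so the key file is opened in binary mode (`"wb"`/`"rb"`), and `wfile.write` expects bytes, so the JSON string is encoded before writing. A minimal standalone sketch of the same pattern follows. The helper names `save_key`, `check_key` and `encode_response` are hypothetical, and a `model_dir` argument stands in for `args.model`.

```python
import base64
import json
import os
import tempfile


def save_key(model_dir, post_data):
    """Decode the base64 key in post_data and store it as raw bytes."""
    if "key" not in post_data:
        return False
    key = base64.b64decode(post_data["key"].encode())  # result is bytes
    with open(os.path.join(model_dir, "key"), "wb") as f:  # binary mode for bytes
        f.write(key)
    return True


def check_key(model_dir, post_data):
    """Compare the posted key with the stored one, bytes against bytes."""
    if "key" not in post_data:
        return False
    key = base64.b64decode(post_data["key"].encode())
    with open(os.path.join(model_dir, "key"), "rb") as f:
        cur_key = f.read()
    return key == cur_key


def encode_response(response):
    """Serialize a response dict to the bytes a socket file object expects."""
    return json.dumps(response).encode()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    payload = {"key": base64.b64encode(b"example-key").decode()}
    print(save_key(workdir, payload), check_key(workdir, payload))
    print(encode_response({"result": True}))
```

Reading the stored key in text mode would produce a `str` that never compares equal to the decoded `bytes`, which is why both the write and the read switch to binary mode together.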

--- a/python/paddle_serving_server_gpu/serve.py
+++ b/python/paddle_serving_server_gpu/serve.py
@@ -155,8 +155,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "w") as f:
+            with open(args.model + "/key", "wb") as f:
                f.write(key)
            return True
@@ -164,8 +164,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "r") as f:
+            with open(args.model + "/key", "rb") as f:
                cur_key = f.read()
            return (key == cur_key)
@@ -206,7 +206,7 @@ class MainService(BaseHTTPRequestHandler):
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
-        self.wfile.write(json.dumps(response))
+        self.wfile.write(json.dumps(response).encode())
 if __name__ == "__main__":

--- a/python/setup.py.app.in
+++ b/python/setup.py.app.in
@@ -33,7 +33,7 @@ if '${PACK}' == 'ON':
 REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'sentencepiece<=0.1.83', 'opencv-python<=4.2.0.32', 'pillow',
-    'pyclipper'
+    'pyclipper', 'shapely'
 ]
 packages=['paddle_serving_app',

--- a/tools/Dockerfile
+++ b/tools/Dockerfile
-FROM centos:7.3.1611
-RUN yum -y install wget && \
-    yum -y install epel-release && yum -y install patchelf && \
-    yum -y install gcc gcc-c++ make python-devel && \
-    yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false && \
-    yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false && \
-    yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false && \
-    yum -y install python3 python3-devel && \
-    yum clean all
-RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
-    python get-pip.py && rm get-pip.py && \
-    localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
-    echo "export LANG=en_US.utf8" >> /root/.bashrc && \
-    echo "export LANGUAGE=en_US.utf8" >> /root/.bashrc
+# An image for building paddle binaries
+# Use cuda devel base image for both cpu and gpu environment
+# When you modify it, please be aware of cudnn-runtime version
+FROM hub.baidubce.com/ctr/cuda:9.0-cudnn7-devel-ubuntu16.04
+MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
+# ENV variables
+ARG WITH_GPU
+ARG WITH_AVX
+ENV WITH_GPU=${WITH_GPU:-ON}
+ENV WITH_AVX=${WITH_AVX:-ON}
+ENV HOME /root
+# Add bash enhancements
+COPY tools/dockerfile/scripts/root/ /root/
+# Prepare packages for Python
+RUN apt-get update && \
+    apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
+    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
+    xz-utils tk-dev libffi-dev liblzma-dev
+RUN apt-get update && \
+    apt-get install -y --allow-downgrades --allow-change-held-packages \
+    patchelf git python-pip python-dev python-opencv openssh-server bison \
+    wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
+    curl sed grep graphviz libjpeg-dev zlib1g-dev  \
+    python-matplotlib \
+    automake locales clang-format swig  \
+    liblapack-dev liblapacke-dev libcurl4-openssl-dev \
+    net-tools libtool module-init-tools vim && \
+    apt-get clean -y
+RUN ln -s /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/libssl.so.10 && \
+    ln -s /usr/lib/x86_64-linux-gnu/libcrypto.so /usr/lib/libcrypto.so.10
+RUN wget https://github.com/koalaman/shellcheck/releases/download/v0.7.1/shellcheck-v0.7.1.linux.x86_64.tar.xz -O shellcheck-v0.7.1.linux.x86_64.tar.xz && \
+    tar -xf shellcheck-v0.7.1.linux.x86_64.tar.xz && cp  shellcheck-v0.7.1/shellcheck /usr/bin/shellcheck && \
+    rm -rf shellcheck-v0.7.1.linux.x86_64.tar.xz shellcheck-v0.7.1
+# Downgrade gcc&&g++
+WORKDIR /usr/bin
+COPY tools/dockerfile/build_scripts /build_scripts
+RUN bash /build_scripts/install_gcc.sh gcc82 && rm -rf /build_scripts
+RUN cp gcc gcc.bak && cp g++ g++.bak && rm gcc && rm g++
+RUN ln -s /usr/local/gcc-8.2/bin/gcc /usr/local/bin/gcc
+RUN ln -s /usr/local/gcc-8.2/bin/g++ /usr/local/bin/g++
+RUN ln -s /usr/local/gcc-8.2/bin/gcc /usr/bin/gcc
+RUN ln -s /usr/local/gcc-8.2/bin/g++ /usr/bin/g++
+ENV PATH=/usr/local/gcc-8.2/bin:$PATH
+# install cmake
+WORKDIR /home
+RUN wget -q https://cmake.org/files/v3.16/cmake-3.16.0-Linux-x86_64.tar.gz && tar -zxvf cmake-3.16.0-Linux-x86_64.tar.gz && rm cmake-3.16.0-Linux-x86_64.tar.gz
+ENV PATH=/home/cmake-3.16.0-Linux-x86_64/bin:$PATH
+# Install Python3.6
+RUN mkdir -p /root/python_build/ && wget -q https://www.sqlite.org/2018/sqlite-autoconf-3250300.tar.gz && \
+    tar -zxf sqlite-autoconf-3250300.tar.gz && cd sqlite-autoconf-3250300 && \
+    ./configure -prefix=/usr/local && make -j8 && make install && cd ../ && rm sqlite-autoconf-3250300.tar.gz && \
+    wget -q https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz && \
+    tar -xzf Python-3.6.0.tgz && cd Python-3.6.0 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null && ldconfig
+# Install Python3.7
+RUN wget -q https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tgz && \
+    tar -xzf Python-3.7.0.tgz && cd Python-3.7.0 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null && ldconfig
+# Install Python3.8
+RUN wget -q https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz && \
+    tar -xzf Python-3.8.0.tgz && cd Python-3.8.0 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null && ldconfig
+# Install Python3.5
+RUN wget -q https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz && \
+    tar -xzf Python-3.5.1.tgz && cd Python-3.5.1 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/python3.5.1 --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null && ldconfig
+ENV PATH=/usr/local/include/python3.6m/:/usr/local/python3.5.1/include:${PATH}
+ENV PATH=/usr/local/bin:/usr/local/python3.5.1/bin:${PATH}
+ENV LD_LIBRARY_PATH=/usr/local/lib:/usr/local/python3.5.1/lib:${LD_LIBRARY_PATH}
+ENV CPLUS_INCLUDE_PATH=/usr/local/python3.5.1/include/python3.5:/usr/local/include/python3.6m/:$CPLUS_INCLUDE_PATH
+RUN ln -sf /usr/local/python3.5.1/bin/python3.5 /usr/local/bin/python3.5 && \
+    ln -sf /usr/local/python3.5.1/bin/python3.5 /usr/bin/python3.5 && \
+    ln -sf /usr/local/python3.5.1/bin/pip3.5 /usr/local/bin/pip3.5 && \
+    ln -sf /usr/local/python3.5.1/bin/pip3.5 /usr/bin/pip3.5 && \
+    ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && \
+    ln -sf /usr/local/bin/python3.6 /usr/bin/python3 && \
+    ln -sf /usr/local/bin/pip3.6 /usr/local/bin/pip3 && \
+    ln -sf /usr/local/bin/pip3.6 /usr/bin/pip3
+RUN rm -r /root/python_build
+# Install Python2.7.15 to replace original python
+WORKDIR /home
+ENV version=2.7.15
+RUN wget https://www.python.org/ftp/python/$version/Python-$version.tgz && tar -xvf Python-$version.tgz
+WORKDIR /home/Python-$version
+RUN ./configure --enable-unicode=ucs4 --enable-shared CFLAGS=-fPIC --prefix=/usr/local/python2.7.15 && make && make install
+RUN echo "export PATH=/usr/local/python2.7.15/include:${PATH}" >> ~/.bashrc && echo "export PATH=/usr/local/python2.7.15/bin:${PATH}" >> ~/.bashrc && echo "export LD_LIBRARY_PATH=/usr/local/python2.7.15/lib:${LD_LIBRARY_PATH}" >> ~/.bashrc && echo "export CPLUS_INCLUDE_PATH=/usr/local/python2.7.15/include/python2.7:$CPLUS_INCLUDE_PATH" >> ~/.bashrc
+ENV PATH=/usr/local/python2.7.15/include:${PATH}
+ENV PATH=/usr/local/python2.7.15/bin:${PATH}
+ENV LD_LIBRARY_PATH=/usr/local/python2.7.15/lib:${LD_LIBRARY_PATH}
+ENV CPLUS_INCLUDE_PATH=/usr/local/python2.7.15/include/python2.7:$CPLUS_INCLUDE_PATH
+RUN mv /usr/bin/python /usr/bin/python.bak && ln -s /usr/local/python2.7.15/bin/python2.7 /usr/local/bin/python && ln -s /usr/local/python2.7.15/bin/python2.7 /usr/bin/python
+WORKDIR /home
+RUN wget https://files.pythonhosted.org/packages/b0/d1/8acb42f391cba52e35b131e442e80deffbb8d0676b93261d761b1f0ef8fb/setuptools-40.6.2.zip && apt-get -y install unzip && unzip setuptools-40.6.2.zip
+WORKDIR /home/setuptools-40.6.2
+RUN python setup.py build && python setup.py install
+WORKDIR /home
+RUN wget https://files.pythonhosted.org/packages/69/81/52b68d0a4de760a2f1979b0931ba7889202f302072cc7a0d614211bc7579/pip-18.0.tar.gz && tar -zxvf pip-18.0.tar.gz
+WORKDIR pip-18.0
+RUN python setup.py install && \
+  python3.8 setup.py install && \
+  python3.7 setup.py install && \
+  python3.6 setup.py install && \
+  python3.5 setup.py install 
+WORKDIR /home
+RUN rm Python-$version.tgz setuptools-40.6.2.zip pip-18.0.tar.gz && \
+    rm -r Python-$version setuptools-40.6.2 pip-18.0
+# Install Go and glide
+RUN wget -qO- https://dl.google.com/go/go1.14.linux-amd64.tar.gz | \
+    tar -xz -C /usr/local && \
+    mkdir /root/go && \
+    mkdir /root/go/bin && \
+    mkdir /root/go/src && \
+    echo "GOROOT=/usr/local/go" >> /root/.bashrc && \
+    echo "GOPATH=/root/go" >> /root/.bashrc && \
+    echo "PATH=/usr/local/go/bin:/root/go/bin:$PATH" >> /root/.bashrc
+ENV GOROOT=/usr/local/go GOPATH=/root/go
+# should not be in the same line with GOROOT definition, otherwise docker build could not find GOROOT.
+ENV PATH=/usr/local/go/bin:/root/go/bin:${PATH}
+# Install TensorRT
+# The following TensorRT.tar.gz is not the official default package; we make two minor changes:
+# 1. Remove the unnecessary files to make the library smaller. TensorRT.tar.gz now contains only include and lib,
+#    and its size is only one-third of the official one.
+# 2. Manually add ~IPluginFactory() to the IPluginFactory class in NvInfer.h; otherwise it does not work in Paddle.
+#    See https://github.com/PaddlePaddle/Paddle/issues/10129 for details.
+# Downgrade TensorRT
+COPY tools/dockerfile/build_scripts /build_scripts
+RUN bash /build_scripts/install_trt.sh 
+RUN rm -rf /build_scripts
+# git credential to skip password typing
+RUN git config --global credential.helper store
+# Fix locales to en_US.UTF-8
+RUN localedef -i en_US -f UTF-8 en_US.UTF-8
+RUN apt-get install libprotobuf-dev -y
+# Older versions of patchelf limited the size of the files being processed; this was fixed in the commit below.
+# https://github.com/NixOS/patchelf/commit/ba2695a8110abbc8cc6baf0eea819922ee5007fa
+# Install a newer version here.
+RUN wget -q https://paddle-ci.cdn.bcebos.com/patchelf_0.10-2_amd64.deb && \
+    dpkg -i patchelf_0.10-2_amd64.deb
+# Configure OpenSSH server, cf. https://docs.docker.com/engine/examples/running_ssh_service
+RUN mkdir /var/run/sshd && echo 'root:root' | chpasswd && sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config && sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
+CMD source ~/.bashrc
+# ccache 3.7.9
+RUN wget https://paddle-ci.gz.bcebos.com/ccache-3.7.9.tar.gz && \
+    tar xf ccache-3.7.9.tar.gz && mkdir /usr/local/ccache-3.7.9 && cd ccache-3.7.9 && \
+    ./configure --prefix=/usr/local/ccache-3.7.9 && \
+    make -j8 && make install && \
+    ln -s /usr/local/ccache-3.7.9/bin/ccache /usr/local/bin/ccache
+RUN python3.8 -m pip install --upgrade pip requests && \
+    python3.7 -m pip install --upgrade pip requests && \
+    python3.6 -m pip install --upgrade pip requests && \
+    python3.5 -m pip install --upgrade pip requests && \
+    python2.7 -m pip install --upgrade pip requests   
+RUN wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
+    tar xf centos_ssl.tar && rm -rf centos_ssl.tar && \
+    mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && \
+    ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && \
+    ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && \
+    ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && \
+    ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
+EXPOSE 22
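
The Dockerfile above builds five Python interpreters (2.7, 3.5, 3.6, 3.7, 3.8) and later calls each one by name to install pip. A quick sanity check inside the built image could look like the sketch below. The function name `check_interpreters` is hypothetical and not part of this PR; it only uses the shell builtin `command -v`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): report which of the given
# commands are reachable on PATH. Returns the number of missing commands.
check_interpreters() {
  local missing=0
  local py
  for py in "$@"; do
    if command -v "$py" >/dev/null 2>&1; then
      echo "found: $py"
    else
      echo "missing: $py"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: check the interpreter names used by the Dockerfile above.
check_interpreters python2.7 python3.5 python3.6 python3.7 python3.8 \
  || echo "some interpreters are missing"
```

Running this as a final `RUN` step would be one way to make a broken symlink or a bad `PATH` entry visible at build time, assuming that failing the build is the desired behaviour.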
--- a/tools/Dockerfile.cuda10.0-cudnn7
+++ b/tools/Dockerfile.cuda10.0-cudnn7
--- a/tools/Dockerfile.cuda10.0-cudnn7.devel
+++ b/tools/Dockerfile.cuda10.0-cudnn7.devel
--- a/tools/Dockerfile.cuda10.1-cudnn7
+++ b/tools/Dockerfile.cuda10.1-cudnn7
--- a/tools/Dockerfile.cuda10.1-cudnn7.devel
+++ b/tools/Dockerfile.cuda10.1-cudnn7.devel
--- a/tools/Dockerfile.cuda10.2-cudnn8
+++ b/tools/Dockerfile.cuda10.2-cudnn8
--- a/tools/Dockerfile.cuda10.2-cudnn8.devel
+++ b/tools/Dockerfile.cuda10.2-cudnn8.devel
--- a/tools/Dockerfile.cuda11-cudnn8
+++ b/tools/Dockerfile.cuda11-cudnn8
--- a/tools/Dockerfile.cuda11-cudnn8.devel
+++ b/tools/Dockerfile.cuda11-cudnn8.devel
--- a/tools/Dockerfile.cuda9.0-cudnn7
+++ b/tools/Dockerfile.cuda9.0-cudnn7
--- a/tools/Dockerfile.cuda9.0-cudnn7.devel
+++ b/tools/Dockerfile.cuda9.0-cudnn7.devel
--- a/tools/Dockerfile.devel
+++ b/tools/Dockerfile.devel
--- a/tools/dockerfile/build_scripts/build.sh
+++ b/tools/dockerfile/build_scripts/build.sh
--- a/tools/dockerfile/build_scripts/build_utils.sh
+++ b/tools/dockerfile/build_scripts/build_utils.sh
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_client.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_client.py
+#!/bin/bash
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 # 
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -11,39 +13,34 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# pylint: disable=doc-string-missing
-from paddle_serving_client import MultiLangClient as Client
+# Top-level build script called from Dockerfile
-import sys
-import os
-import criteo as criteo
-import time
-from paddle_serving_client.metric import auc
-import grpc
-client = Client()
+# Stop at any error, show all commands
-client.connect(["127.0.0.1:9292"])
+set -ex
-batch = 1
+if [ -f "/etc/redhat-release" ];then
-buf_size = 100
+  lib_so_5=/usr/lib64/libgfortran.so.5
-dataset = criteo.CriteoDataset()
+  lib_so_6=/usr/lib64/libstdc++.so.6
-dataset.setup(1000001)
+  lib_path=/usr/lib64
-test_filelists = ["{}/part-0".format(sys.argv[1])]
+else
-reader = dataset.infer_reader(test_filelists, batch, buf_size)
+  lib_so_5=/usr/lib/x86_64-linux-gnu/libgfortran.so.5
-label_list = []
+  lib_so_6=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
-prob_list = []
+  lib_path=/usr/lib/x86_64-linux-gnu
-start = time.time()
+fi
-for ei in range(10000):
-    data = reader().next()
-    feed_dict = {}
-    feed_dict['dense_input'] = data[0][0]
-    for i in range(1, 27):
-        feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][i]
-    fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
-    if fetch_map["serving_status_code"] == 0:
-        prob_list.append(fetch_map['prob'][0][1])
-        label_list.append(data[0][-1][0])
-print(auc(label_list, prob_list))
+if [ "$1" == "gcc82" ]; then
-end = time.time()
+  wget -q https://paddle-ci.gz.bcebos.com/gcc-8.2.0.tar.xz 
-print(end - start)
+  tar -xvf gcc-8.2.0.tar.xz && \
+  cd gcc-8.2.0 && \
+  unset LIBRARY_PATH CPATH C_INCLUDE_PATH PKG_CONFIG_PATH CPLUS_INCLUDE_PATH INCLUDE && \
+  ./contrib/download_prerequisites && \
+  cd .. && mkdir temp_gcc82 && cd temp_gcc82 && \
+  ../gcc-8.2.0/configure --prefix=/usr/local/gcc-8.2 --enable-threads=posix --disable-checking --disable-multilib && \
+  make -j8 && make install
+  cd .. && rm -rf temp_gcc82
+  cp ${lib_so_6} ${lib_so_6}.bak && rm -f ${lib_so_6} && \
+  ln -s /usr/local/gcc-8.2/lib64/libgfortran.so.5 ${lib_so_5} && \
+  ln -s /usr/local/gcc-8.2/lib64/libstdc++.so.6 ${lib_so_6} && \
+  cp /usr/local/gcc-8.2/lib64/libstdc++.so.6.0.25 ${lib_path}
+fi
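
The script above picks its library paths by testing for `/etc/redhat-release`. The same branch can be written as a small function that takes the marker file as an argument, which makes the logic testable on any host. This is only a sketch: the name `select_lib_path` and its optional parameter are assumptions, not something the PR defines.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): choose the system library
# directory the same way the build script does. The marker file defaults
# to /etc/redhat-release but can be overridden for testing.
select_lib_path() {
  local marker="${1:-/etc/redhat-release}"
  if [ -f "$marker" ]; then
    # RedHat-style layout
    echo "/usr/lib64"
  else
    # Debian/Ubuntu multiarch layout
    echo "/usr/lib/x86_64-linux-gnu"
  fi
}

# Example: print the directory chosen for the current machine.
select_lib_path
```

Both paths are copied from the script; only the function wrapper is new.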
--- a/tools/dockerfile/build_scripts/install_nccl2.sh
+++ b/tools/dockerfile/build_scripts/install_nccl2.sh
--- a/tools/dockerfile/build_scripts/install_trt.sh
+++ b/tools/dockerfile/build_scripts/install_trt.sh
--- a/tools/dockerfile/build_scripts/manylinux1-check.py
+++ b/tools/dockerfile/build_scripts/manylinux1-check.py
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube_prepare.sh
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/cube_prepare.sh
--- a/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server_gpu.py
+++ b/python/examples/grpc_impl_example/criteo_ctr_with_cube/test_server_gpu.py
--- a/tools/dockerfile/root/.bashrc
+++ b/tools/dockerfile/root/.bashrc
--- a/tools/dockerfile/root/.gitconfig
+++ b/tools/dockerfile/root/.gitconfig
--- a/tools/dockerfile/root/.scripts/git-completion.sh
+++ b/tools/dockerfile/root/.scripts/git-completion.sh
--- a/tools/dockerfile/root/.scripts/git-prompt.sh
+++ b/tools/dockerfile/root/.scripts/git-prompt.sh