If you want to customize your Serving based on source code, use the version with the suffix -devel.
**The cuda10.1-cudnn7-gcc54 image is not ready yet; if you need it, please build it from the Dockerfile.**
| Description | OS | TAG | Dockerfile |
| :---------: | :-: | :-: | :--------: |
If you need to develop and compile based on the source code, please use the version with the suffix -devel.
**In the TAG column, 0.7.0 can also be replaced with the corresponding version number, such as 0.5.0 or 0.4.1. Note, however, that some development environments were only introduced in a later release, so not every environment has a corresponding image for every version.**
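For example, pulling a pinned development image might look like the following sketch; the registry is the one used elsewhere in this document, and the exact tag for your environment should be verified against the TAG column above.

```bash
# Pull a pinned development image; swap 0.7.0 for the release you need.
docker pull registry.baidubce.com/paddlepaddle/serving:0.7.0-devel
```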
Running Images are lighter than Develop Images: they are made up of the Serving whl and bin, but without develop tools like cmake, in order to keep the image size down. If you want to know more, please check the document [Paddle Serving on Kubernetes](./Run_On_Kubernetes_CN.md).
**Tips:** If you want to use the CPU server and the GPU server (version>=0.5.0) at the same time, you should check the gcc version: only Cuda10.1/10.2/11 can run together with the CPU server, owing to their shared gcc version (8.2).
**Strongly recommend** you build **Paddle Serving** in Docker. For more images, please refer to [Docker Image List](Docker_Images_CN.md).
**Tip-1**: This project only supports <mark>**Python3.6/3.7/3.8**</mark>; all subsequent Python/Pip operations must select the matching Python version.
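Because several Python versions may coexist in an image, it is safest to route pip through the interpreter explicitly. A minimal sketch, assuming `python3.7` is the version you want:

```bash
# Verify the interpreter, then run all pip operations through it.
python3.7 --version
python3.7 -m pip install --upgrade pip
```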
## Client
...
...
Kunlun users who use arm-xpu or x86-xpu can download the corresponding wheel packages.
Paddle Serving provides a user-friendly programming framework for multi-model composed services.
The Server side is built on the <b>RPC Service</b> and the <b>graph execution engine</b>. The relationship between them is shown in the following figure.
The graph execution engine consists of OPs and Channels, and the connected OPs share one Channel.
- For cases where large data needs to be transferred between OPs, consider using RAM DB as external global storage and transferring the data by passing index keys through the Channel.
- The following illustration shows the design of the Channel in the graph execution engine: an input buffer and an output buffer align data between multiple OP inputs and multiple OP outputs, with a queue in the middle for buffering.
Here, we build a simple IMDB model ensemble example to show how to use Pipeline Serving; to try it yourself, see the sketch after this paragraph. The relevant code can be found in the `Serving/examples/Pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:
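A minimal sketch of running the example follows; the helper script and entry-point names (`get_data.sh`, `test_pipeline_server.py`, `test_pipeline_client.py`) are assumptions about this example's layout, so check the folder first.

```bash
cd Serving/examples/Pipeline/imdb_model_ensemble
sh get_data.sh                       # fetch the pre-trained models (assumed helper)
python3 test_pipeline_server.py &    # start the Pipeline server in the background
python3 test_pipeline_client.py      # send a test request
```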
One of the biggest benefits of Docker is portability: a containerized service can be deployed on multiple operating systems and mainstream cloud computing platforms. The Paddle Serving Docker image can be deployed on Linux, Mac and Windows.
## Requirements
Docker (GPU version requires nvidia-docker to be installed on the GPU machine)
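A quick sanity check of these requirements (the second command applies only to GPU machines with nvidia-docker installed):

```bash
docker --version
nvidia-docker version   # GPU machines only
```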
This document takes Python 3 as an example to show how to run Paddle Serving in Docker. Replace `python3` with the specific interpreter (3.6, 3.7 or 3.8) used in your environment if necessary.
## CPU
### Get docker image
Refer to [this document](Docker_Images_EN.md) for a docker image:
```bash
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it test bash
```
The `-p` option maps port `9292` of the container to port `9292` of the host.
### Install PaddleServing
Please refer to the instructions on the homepage to download the pip package of the corresponding version.
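As a sketch, installing the CPU packages inside the container might look like this; pin the versions that match your image tag:

```bash
# Install the CPU server together with the client and app utilities.
python3 -m pip install paddle-serving-server paddle-serving-client paddle-serving-app
```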
## GPU
The GPU version is basically the same as the CPU version, with only some differences in interface naming (GPU version requires nvidia-docker to be installed on the GPU machine).
### Get docker image
Refer to [this document](Docker_Images_EN.md) for a docker image; the following is an example using a `cuda10.2-cudnn8` image:
```bash
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
```
or
```bash
docker run --gpus all -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
docker exec -it test bash
```
The `-p` option maps port `9292` of the container to port `9292` of the host.
### Install PaddleServing
The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` matching the image tag version. If you do not need to change the version, you can use them directly, which is convenient for environments without internet access.
If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version; see [LATEST_PACKAGES](./Latest_Packages_CN.md).
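For example, switching to a specific GPU build could look like the following; the `post102` suffix for CUDA 10.2 is an assumption to confirm against [LATEST_PACKAGES](./Latest_Packages_CN.md) for your CUDA version.

```bash
# Hypothetical pin: 0.7.0 GPU server built against CUDA 10.2.
python3 -m pip install paddle-serving-server-gpu==0.7.0.post102
```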
## Precautions
- Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](Compile_EN.md).
- If you use the Cuda9 or Cuda10 docker images, you cannot use the `paddle_serving_server` CPU version at the same time, due to the limitation of the gcc version. If you want to use both in one docker image, please choose the Cuda10.1, Cuda10.2, or Cuda11 images; a gcc check is sketched below.
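A quick way to check which gcc an image ships with before mixing packages in it (the image tag here is only an example):

```bash
docker run --rm registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel gcc --version
```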
The script converts the time-dot information in the log into a json format and saves it as a trace file.
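A sketch of invoking such a conversion script; the script name `timeline_trace.py` and the `profile`/`trace` file names are assumptions based on the profiling examples, so adjust them to your setup.

```bash
# Convert the profiling log into a json file that chrome://tracing/ can load.
python3 timeline_trace.py profile trace
```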
Specific operation: open the Chrome browser, enter `chrome://tracing/` in the address bar to jump to the tracing page, click the `load` button, and open the saved trace file to visualize the time information of each stage of the prediction service.
The data visualization output is shown below; it uses the [bert as service example](../C++/PaddleNLP/bert) GPU inference service. The server runs prediction on 4 GPUs, the client starts 4 `processes`, and the batch size is 1; the figure shows the timeline of each stage. Among them, `bert_pre` represents the data preprocessing stage of the client, and `client_infer` represents the stage in which the client sends prediction requests and receives the results. `process` represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
3. Note that the ip and port in PipelineClientExample.java (located at java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) must match the ip and port configured in the corresponding Pipeline server's config.yaml file (taking the IMDB model ensemble model as an example, located at python/examples/pipeline/imdb_model_ensemble/[config.yaml](../examples/Pipeline/imdb_model_ensemble/config.yml)); a quick check is sketched below.
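A quick, hypothetical way to double-check the server side before editing the Java client (the path is assumed relative to the repository root):

```bash
# Show the port settings the Pipeline server will use.
grep -n "port" Serving/examples/Pipeline/imdb_model_ensemble/config.yml
```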
- The image preprocessing method is more flexible than the above method; it can be composed from the following classes. [Example](../../examples/C++/PaddleClas/imagenet/resnet50_rpc_client.py)
- class Sequential
...
...
This tool is convenient for analyzing the proportion of time occupied by each stage of the prediction service.
Load the trace file generated in the previous step through the load button; you can then visualize the time information of each stage of the prediction service.
The next figure shows the timeline of a GPU prediction service using the [bert example](../../examples/C++/PaddleNLP/bert).
The server side starts the service with 4 GPU cards, the client side starts 4 processes to send requests, and the batch size is 1.
In the figure, bert_pre represents the data pre-processing stage of the client, and client_infer represents the stage from the client sending the prediction request to receiving the result.
The process in the figure represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
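To collect the raw timing data behind such a timeline, profiling can be switched on before starting the server and client. A minimal sketch, assuming the `FLAGS_profile_*` environment variables are the profiling switches:

```bash
# Enable profiling on both sides before launching them (assumed switches).
export FLAGS_profile_server=1   # server side: record op timestamps
export FLAGS_profile_client=1   # client side: record request timestamps
```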
...
...
The inference op of Paddle Serving is implemented based on the Paddle inference lib.
Before deploying the prediction service, you may need to check the input and output of the prediction service or check the resource consumption.
Therefore, a local prediction tool is built into the paddle_serving_app, which is used in the same way as sending a request to the server through the client.
Taking [fit_a_line prediction service](../../examples/C++/fit_a_line) as an example, the following code can be used to run local prediction.
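A minimal sketch of such a local prediction, assuming the `LocalPredictor` API from `paddle_serving_app` and a `uci_housing_model` directory downloaded from the fit_a_line example; the class name, method signature, and feed/fetch names are assumptions to verify against your installed version.

```bash
python3 - <<'EOF'
import numpy as np
from paddle_serving_app.local_predict import LocalPredictor  # assumed API

predictor = LocalPredictor()
predictor.load_model_config("uci_housing_model")  # path to the downloaded model

# One normalized 13-feature sample from the UCI housing dataset.
x = np.array([[0.0137, -0.1136, 0.2553, -0.0692, 0.0335, -0.0345, -0.0898,
               -0.0406, -0.0668, -0.0286, -0.0616, -0.0638, -0.0501]],
             dtype=np.float32)

fetch_map = predictor.predict(feed={"x": x}, fetch=["price"], batch=True)
print(fetch_map)
EOF
```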