Commit 83d2c120, authored by TeslaZhao

update docs

......@@ -176,8 +176,8 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
| Argument | Type | Default | Description |
| ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
| `thread` | int | `2` | Number of brpc service threads |
| `op_num` | int[]| `0` | Number of threads for each model in asynchronous mode |
| `op_max_batch` | int[]| `0` | Maximum batch size for each model in asynchronous mode |
| `runtime_thread_num` | int[]| `0` | Number of threads for each model in asynchronous mode |
| `batch_infer_size` | int[]| `0` | Maximum batch size for each model in asynchronous mode |
| `gpu_ids` | str[]| `"-1"` | Gpu card id for each model |
| `port` | int | `9292` | Exposed port of current service to users |
| `model` | str[]| `""` | Path of paddle model directory to be served |
......@@ -197,8 +197,8 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
In asynchronous mode, each model starts the number of threads you specify, and each thread holds one model instance. In other words, each model behaves like a thread pool of N threads that takes tasks from the pool's task queue and executes them.
In asynchronous mode, each RPC server thread is only responsible for putting requests into the task queue of the model's thread pool; after a task is executed, the completed task is removed from the task queue.
In the above table, the number of RPC server threads is specified by --thread, and the default value is 2.
--op_num specifies the number of threads in the thread pool of each model. The default value is 0, indicating that asynchronous mode is not used.
--op_max_batch specifies the maximum batch size for each model. The default value is 32. It only takes effect when --op_num is not 0.
--runtime_thread_num specifies the number of threads in the thread pool of each model. The default value is 0, indicating that asynchronous mode is not used.
--batch_infer_size specifies the maximum batch size for each model. The default value is 32. It only takes effect when --runtime_thread_num is not 0.
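For example, a single-model service can enable asynchronous mode as follows (a minimal sketch; the thread and batch values are illustrative):
```shell
# Illustrative values: 4 worker threads for the model, grouping up to 32 requests per inference batch
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --runtime_thread_num 4 --batch_infer_size 32
```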
#### When you want a model to use multiple GPU cards.
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0,1,2
#### When you want 2 models.
......@@ -206,7 +206,7 @@ python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_m
#### When you want 2 models, and want each of them use multiple GPU cards.
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2
#### When a service contains two models, each model needs multiple GPU cards, and asynchronous mode is enabled with a different concurrency for each model.
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --op_num 4 8
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --runtime_thread_num 4 8
</center>
```python
......@@ -267,6 +267,15 @@ output
{'err_no': 0, 'err_msg': '', 'key': ['res'], 'value': ["['土地整治与土壤修复研究中心', '华南农业大学1素图']"]}
```
<h3 align="center">Stop Serving/Pipeline service</h3>
**Method one**: press Ctrl+C to quit.
**Method two**: run the following command in the directory where the Serving/Pipeline service was started, or in the directory set by the SERVING_HOME environment variable (the file ProcessInfo.json exists in this directory):
```
python3 -m paddle_serving_server.serve stop
```
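A hypothetical workflow (the SERVING_HOME path is illustrative, and it is assumed that this variable controls where ProcessInfo.json is written):
```shell
# Keep ProcessInfo.json in a fixed location so the service can later be
# stopped from any shell that exports the same SERVING_HOME.
export SERVING_HOME=/home/work/serving_home
python3 -m paddle_serving_server.serve --model uci_housing_model --port 9292 &
# ... later, with the same SERVING_HOME exported:
python3 -m paddle_serving_server.serve stop
```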
<h2 align="center">Document</h2>
......
......@@ -110,12 +110,232 @@ Paddle Serving已全面Paddle训练模型,并实现多个Paddle模型套件服
</center>
For more model examples, see the repo; you can enter the environment as follows.
```
# Start the CPU Docker
docker pull registry.baidubce.com/paddlepaddle/serving:0.6.2-devel
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.6.2-devel bash
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# Start the GPU Docker
nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.6.2-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.6.2-cuda10.2-cudnn8-devel bash
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
Install the required pip dependencies
```
cd Serving
pip3 install -r python/requirements.txt
```
```shell
pip3 install paddle-serving-client==0.6.2
pip3 install paddle-serving-server==0.6.2 # CPU
pip3 install paddle-serving-app==0.6.2
pip3 install paddle-serving-server-gpu==0.6.2.post102 #GPU with CUDA10.2 + TensorRT7
# For other GPU environments, confirm your CUDA/TensorRT versions before choosing which command to run
pip3 install paddle-serving-server-gpu==0.6.2.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.2.post11 # GPU with CUDA11 + TensorRT7
```
You may need to use a mirror inside China (for example the Tsinghua mirror: add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
If you need packages built from the develop branch, get the download URLs from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with `pip install`. If you want to compile by yourself, please refer to the [Paddle Serving compilation document](./doc/COMPILE_CN.md).
The paddle-serving-server and paddle-serving-server-gpu packages support CentOS 6/7, Ubuntu 16/18, and Windows 10.
The paddle-serving-client and paddle-serving-app packages support Linux and Windows, and paddle-serving-client only supports Python 3.6/3.7/3.8.
**The latest 0.6.2 release no longer supports CUDA 9.0 or CUDA 10.0, and Python 2.7 and 3.5 are no longer supported.**
Installing PaddlePaddle 2.1.0 or later is recommended.
```
# For CPU environments, run
pip3 install paddlepaddle==2.1.0
# For GPU CUDA 10.2 environments, run
pip3 install paddlepaddle-gpu==2.1.0
```
**Note**: If your CUDA version is not 10.2, do not run the commands above directly. Refer to the [Paddle official documentation - multi-version whl list](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release)
and choose the URL for your GPU environment. For example, a Python 3.6 user with CUDA 10.1 should pick the URL at the intersection of `cp36-cp36m` and `cuda10.1-cudnn7-mkl-gcc8.2-avx-trt6.0.1.5` in the table, copy it, and run:
```
pip3 install https://paddle-wheel.bj.bcebos.com/with-trt/2.1.0-gpu-cuda10.1-cudnn7-mkl-gcc8.2/paddlepaddle_gpu-2.1.0.post101-cp36-cp36m-linux_x86_64.whl
```
Because the default `paddlepaddle-gpu==2.1.0` targets CUDA 10.2 and is not built with TensorRT, if you want to use TensorRT with `paddlepaddle-gpu`, find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl list above and download the wheel for your Python version. For more information, see [How to use TensorRT?](doc/TENSOR_RT_CN.md).
For other environments and Python versions, find the corresponding URL in the table and install it with pip.
**Windows 10 users**, please refer to [Paddle Serving on Windows](./doc/WINDOWS_TUTORIAL_CN.md).
<h2 align="center">快速开始示例</h2>
这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考前文的AiStudio教程。
<h3 align="center">波士顿房价预测</h3>
进入到Serving的git目录下,进入到`fit_a_line`例子
``` shell
cd Serving/python/examples/fit_a_line
sh get_data.sh
```
Paddle Serving provides users with both HTTP- and RPC-based services.
<h3 align="center">RPC Service</h3>
Users can also start an RPC service with `paddle_serving_server.serve`. Although this requires some development against Paddle Serving's Python client API, the RPC service is usually faster than the HTTP service. Note that `--name` is not specified here.
``` shell
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
<center>
| Argument | Type | Default | Description |
| ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
| `thread` | int | `2` | Number of brpc service threads |
| `runtime_thread_num` | int[]| `0` | Number of threads for each model in asynchronous mode |
| `batch_infer_size` | int[]| `32` | Maximum batch size for each model in asynchronous mode |
| `gpu_ids` | str[]| `"-1"` | Gpu card id for each model |
| `port` | int | `9292` | Exposed port of current service to users |
| `model` | str[]| `""` | Path of paddle model directory to be served |
| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference |
| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU |
| `precision` | str | FP32 | Precision Mode, support FP32, FP16, INT8 |
| `use_calib` | bool | False | Use TRT int8 calibration |
| `gpu_multi_stream` | bool | False | EnableGpuMultiStream to get larger QPS |
#### Notes on asynchronous mode
Asynchronous mode is suitable for (1) scenarios with a very large number of requests, and (2) multi-model pipelines where you want to set the concurrency of each model separately.
Asynchronous mode helps improve service throughput (QPS), but slightly increases the latency of a single request.
In asynchronous mode, each model starts the number of threads (N) you specify, and each thread holds one model instance; in other words, each model behaves like a thread pool of N threads that takes tasks from the pool's task queue and executes them.
In asynchronous mode, each RPC Server thread is only responsible for putting requests into the task queue of the model's thread pool; after a task is executed, the completed task is taken out of the queue.
In the table above, --thread 10 sets the number of RPC Server threads (default 2), and --runtime_thread_num sets the number N of threads in each model's thread pool (default 0, which means asynchronous mode is not used).
--batch_infer_size sets the maximum batch size of each model (default 32); it only takes effect when --runtime_thread_num is not 0.
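For example, a single-model service can enable asynchronous mode as follows (a minimal sketch; the thread and batch values are illustrative):
```shell
# Illustrative values: 4 worker threads for the model, grouping up to 32 requests per inference batch
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --runtime_thread_num 4 --batch_infer_size 32
```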
#### When one of your models needs to be deployed on multiple GPU cards
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0,1,2
#### When one service deploys two models
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292
#### When one service contains two models and each model needs multiple GPU cards
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2
#### When one service contains two models, each model needs multiple GPU cards, and asynchronous mode is enabled with a different concurrency for each model
python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --runtime_thread_num 4 8
<center class="half">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="280"/> <img src="https://github.com/PaddlePaddle/PaddleDetection/raw/release/2.3/docs/images/road554.png" width="160"/>
<img src="https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/recognition.gif" width="213"/>
</center>
``` python
# A user can visit the RPC service through the paddle_serving_client API
import numpy as np
from paddle_serving_client import Client
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
Here, the `client.predict` function takes two arguments. `feed` is a Python dict mapping the aliases of the model's input variables to their values, and `fetch` lists the prediction variables to be returned by the server. In this example, the tensor names assigned when the servable model was saved during training are `"x"` and `"price"`.
<h3 align="center">HTTP服务</h3>
用户也可以将数据格式处理逻辑放在服务器端进行,这样就可以直接用curl去访问服务,参考如下案例,在目录`python/examples/fit_a_line`.
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
Client request:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
Returned result:
```
{"result":{"price":[[18.901151657104492]]}}
```
<h3 align="center">Pipeline服务</h3>
Paddle Serving提供业界领先的多模型串联服务,强力支持各大公司实际运行的业务场景,参考 [OCR文字识别案例](python/examples/pipeline/ocr),在目录`python/examples/pipeline/ocr`
我们先获取两个模型
```
python3 -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
python3 -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
Then start the server program, which serves the two chained models as one unified service.
```
python3 web_service.py
```
Finally, send a request over HTTP:
```
python3 pipeline_http_client.py
```
RPC requests are also supported:
```
python3 pipeline_rpc_client.py
```
Output:
```
{'err_no': 0, 'err_msg': '', 'key': ['res'], 'value': ["['土地整治与土壤修复研究中心', '华南农业大学1素图']"]}
```
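For reference, `pipeline_rpc_client.py` is roughly equivalent to the following sketch (the RPC port 18090 and the feed key `image` are assumptions based on the example; check `config.yml` and the client script for the actual values):
```python
# Minimal sketch of an RPC pipeline client; port, feed key and image path are illustrative.
import base64
from paddle_serving_server.pipeline import PipelineClient

client = PipelineClient()
client.connect(['127.0.0.1:18090'])

# The OCR pipeline example feeds a base64-encoded image.
with open("imgs/1.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf8')

ret = client.predict(feed_dict={"image": image_data}, fetch=["res"])
print(ret)
```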
<h3 align="center">关闭Serving/Pipeline服务</h3>
**方式一** :Ctrl+C关停服务
**方式二** :在启动Serving/Pipeline服务路径或者环境变量SERVING_HOME路径下(该路径下存在文件ProcessInfo.json)
```
python3 -m paddle_serving_server.serve stop
```
<h2 align="center">文档</h2>
### 新手教程
- [怎样保存用于Paddle Serving的模型?](doc/SAVE_CN.md)
- [十分钟构建Bert-As-Service](doc/BERT_10_MINS_CN.md)
- [Paddle Serving示例合辑](python/examples)
- [如何在Paddle Serving处理常见数据类型](doc/PROCESS_DATA.md)
- [如何在Serving上处理level of details(LOD)?](doc/LOD_CN.md)
### 开发者教程
- [如何开发一个新的Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [如何编译PaddleServing?](doc/COMPILE_CN.md)
- [如何开发Pipeline?](doc/PIPELINE_SERVING_CN.md)
- [如何在K8S集群上部署Paddle Serving?](doc/PADDLE_SERVING_ON_KUBERNETES.md)
- [如何在Paddle Serving上部署安全网关?](doc/SERVING_AUTH_DOCKER.md)
- [如何开发Pipeline?](doc/PIPELINE_SERVING_CN.md)
- [如何使用uWSGI部署Web Service](doc/UWSGI_DEPLOY_CN.md)
- [如何实现模型文件热加载](doc/HOT_LOADING_IN_SERVING_CN.md)
- [如何使用TensorRT?](doc/TENSOR_RT_CN.md)
### 关于Paddle Serving性能
- [如何测试Paddle Serving性能?](python/examples/util/)
- [如何优化性能?](doc/PERFORMANCE_OPTIM_CN.md)
- [在一张GPU上启动多个预测服务](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [GPU版Benchmarks](doc/BENCHMARKING_GPU.md)
### 设计文档
- [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md)
## Community
......
......@@ -37,7 +37,7 @@ If you want to customize your Serving based on source code, use the version with
| GPU (cuda10.1-cudnn7-tensorRT6-gcc54) development | Ubuntu16 | latest-cuda10.1-cudnn7-gcc54-devel(not ready) | [Dockerfile.cuda10.1-cudnn7-gcc54.devel](../tools/Dockerfile.cuda10.1-cudnn7-gcc54.devel) |
| GPU (cuda10.1-cudnn7-tensorRT6) development | Ubuntu16 | latest-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
| GPU (cuda10.2-cudnn8-tensorRT7) development | Ubuntu16 | latest-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
| GPU (cuda11-cudnn8-tensorRT7) development | Ubuntu18 | latest-cuda11-cudnn8-devel | [Dockerfile.cuda11-cudnn8.devel](../tools/Dockerfile.cuda11-cudnn8.devel) |
| GPU (cuda11.2-cudnn8-tensorRT7) development | Ubuntu18 | latest-cuda11.2-cudnn8-devel | [Dockerfile.cuda11.2-cudnn8.devel](../tools/Dockerfile.cuda11.2-cudnn8.devel) |
**Java Client:**
```
......
......@@ -48,7 +48,7 @@ python3.6 -m paddle_serving_server.serve --model uci_housing_model --thread 10 -
For a Python HttpClient usage example, see [`python/examples/fit_a_line/test_httpclient.py`](../python/examples/fit_a_line/test_httpclient.py); for the interface, see [`python/paddle_serving_client/httpclient.py`](../python/paddle_serving_client/httpclient.py).
For a Java HttpClient usage example, see [`java/examples/src/main/java/PaddleServingClientExample.java`](../java/examples/src/main/java/PaddleServingClientExample.java); for the interface, see [`java/src/main/java/io/paddle/serving/client/HttpClient.java`](../java/src/main/java/io/paddle/serving/client/HttpClient.java).
For a Java HttpClient usage example, see [`java/examples/src/main/java/PaddleServingClientExample.java`](../java/examples/src/main/java/PaddleServingClientExample.java); for the interface, see [`java/src/main/java/io/paddle/serving/client/Client.java`](../java/src/main/java/io/paddle/serving/client/Client.java).
If it does not meet your needs, you can also add functionality on top of it.
......
# Model Zoo
This page lists the pre-trained models currently supported by Paddle Serving and their download links.
If you would like to contribute a new model to Paddle Serving, please submit a [pull request](https://github.com/PaddlePaddle/Serving/pulls).
Special thanks to [Paddle wholechain](https://www.paddlepaddle.org.cn/wholechain) and [PaddleHub](https://www.paddlepaddle.org.cn/hub) for providing some of the pre-trained models.
| Model | Type | Deployment | Download | Server |
| --- | --- | --- | ---- | --- |
| resnet_v2_50_imagenet | PaddleClas | [单模型](../examples/PaddleClas/resnet_v2_50)</br>[多模型](../examples/pipeline/PaddleClas/ResNet_V2_50) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/resnet_v2_50_imagenet.tar.gz) | Pipeline Serving, C++ Serving|
| mobilenet_v2_imagenet | PaddleClas | [单模型](../examples/PaddleClas/mobilenet) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/mobilenet_v2_imagenet.tar.gz) |C++ Serving|
| resnet50_vd | PaddleClas | [单模型](../examples/PaddleClas/imagenet)</br>[多模型](../examples/pipeline/PaddleClas/ResNet50_vd) | [.tar.gz](https://paddle-serving.bj.bcebos.com/ResNet50_vd.tar) |Pipeline Serving, C++ Serving|
| ResNet50_vd_KL | PaddleClas | [多模型](../examples/pipeline/PaddleClas/ResNet50_vd_KL) | [.tar](https://paddle-serving.bj.bcebos.com/model/ResNet50_vd_KL.tar) |Pipeline Serving|
| DarkNet53 | PaddleClas | [多模型](../examples/pipeline/PaddleClas/DarkNet53) | [.tar](https://paddle-serving.bj.bcebos.com/model/DarkNet53.tar) |Pipeline Serving|
| MobileNetV1 | PaddleClas | [多模型](../examples/pipeline/PaddleClas/MobileNetV1) | [.tar](https://paddle-serving.bj.bcebos.com/model/MobileNetV1.tar) |Pipeline Serving|
| MobileNetV2 | PaddleClas | [多模型](../examples/pipeline/PaddleClas/MobileNetV2) | [.tar](https://paddle-serving.bj.bcebos.com/model/MobileNetV2.tar) |Pipeline Serving|
| MobileNetV3_large_x1_0 | PaddleClas | [多模型](../examples/pipeline/PaddleClas/MobileNetV3_large_x1_0) | [.tar](https://paddle-serving.bj.bcebos.com/model/MobileNetV3_large_x1_0.tar) |Pipeline Serving|
| ResNet50_vd_FPGM | PaddleClas | [多模型](../examples/pipeline/PaddleClas/ResNet50_vd_FPGM) | [.tar](https://paddle-serving.bj.bcebos.com/model/ResNet50_vd_FPGM.tar) |Pipeline Serving|
| ResNet50_vd_PACT | PaddleClas | [多模型](../examples/pipeline/PaddleClas/ResNet50_vd_PACT) | [.tar](https://paddle-serving.bj.bcebos.com/model/ResNet50_vd_PACT.tar) |Pipeline Serving|
| ResNeXt101_vd_64x4d | PaddleClas | [多模型](../examples/pipeline/PaddleClas/ResNeXt101_vd_64x4d) | [.tar](https://paddle-serving.bj.bcebos.com/model/ResNeXt101_vd_64x4d.tar) |Pipeline Serving|
| HRNet_W18_C | PaddleClas | [多模型](../examples/pipeline/PaddleClas/HRNet_W18_C) | [.tar](https://paddle-serving.bj.bcebos.com/model/HRNet_W18_C.tar) |Pipeline Serving|
| ShuffleNetV2_x1_0 | PaddleClas | [多模型](../examples/pipeline/PaddleClas/ShuffleNetV2_x1_0) | [.tar](https://paddle-serving.bj.bcebos.com/model/ShuffleNetV2_x1_0.tar) |Pipeline Serving|
| bert_chinese_L-12_H-768_A-12 | PaddleNLP | [单模型](../examples/PaddleNLP/bert)</br>[多模型](../examples/pipeline/bert) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz) |Pipeline Serving, C++ Serving|
| senta_bilstm | PaddleNLP | [单模型](../examples/PaddleNLP/senta) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/senta_bilstm.tar.gz) |C++ Serving|
| lac | PaddleNLP | [单模型](../examples/PaddleNLP/lac) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac.tar.gz) | C++ Serving|
| transformer | PaddleNLP | [多模型](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/machine_translation/transformer/deploy/serving/README.md) | [model](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/machine_translation/transformer) | Pipeline Serving|
| criteo_ctr | PaddleRec | [单模型](../examples/PaddleRec/criteo_ctr) | [.tar.gz](https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz) | C++ Serving |
| criteo_ctr_with_cube | PaddleRec | [单模型](../examples/PaddleRec/criteo_ctr_with_cube) | [.tar.gz](https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz) |C++ Serving|
| wide&deep | PaddleRec | [单模型](https://github.com/PaddlePaddle/PaddleRec/blob/release/2.1.0/doc/serving.md) | [model](https://github.com/PaddlePaddle/PaddleRec/blob/release/2.1.0/models/rank/wide_deep/README.md) |C++ Serving|
| blazeface | PaddleDetection | [单模型](../examples/PaddleDetection/blazeface) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ObjectDetection/blazeface.tar.gz) |C++ Serving|
| cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco | PaddleDetection | [单模型](../examples/PaddleDetection/cascade_rcnn) | [.tar.gz](https://paddle-serving.bj.bcebos.com/pddet_demo/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco_serving.tar.gz) |C++ Serving|
| yolov4 | PaddleDetection | [单模型](../examples/PaddleDetection/yolov4) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ObjectDetection/yolov4.tar.gz) |C++ Serving|
| faster_rcnn_hrnetv2p_w18_1x | PaddleDetection | [单模型](../examples/PaddleDetection/faster_rcnn_hrnetv2p_w18_1x) | [.tar.gz](https://paddle-serving.bj.bcebos.com/pddet_demo/faster_rcnn_hrnetv2p_w18_1x.tar.gz) |C++ Serving|
| fcos_dcn_r50_fpn_1x_coco | PaddleDetection | [单模型](../examples/PaddleDetection/fcos_dcn_r50_fpn_1x_coco) | [.tar.gz](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/fcos_dcn_r50_fpn_1x_coco.tar) |C++ Serving|
| ssd_vgg16_300_240e_voc | PaddleDetection | [单模型](../examples/PaddleDetection/ssd_vgg16_300_240e_voc) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/ssd_vgg16_300_240e_voc.tar) |C++ Serving |
| yolov3_darknet53_270e_coco | PaddleDetection | [单模型](../examples/PaddleDetection/yolov3_darknet53_270e_coco)</br>[多模型](../examples/pipeline/PaddleDetection/yolov3) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/yolov3_darknet53_270e_coco.tar) |Pipeline Serving, C++ Serving |
| faster_rcnn_r50_fpn_1x_coco | PaddleDetection | [单模型](../examples/PaddleDetection/faster_rcnn_r50_fpn_1x_coco)</br>[多模型](../examples/pipeline/PaddleDetection/faster_rcnn) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar) |Pipeline Serving, C++ Serving |
| ppyolo_r50vd_dcn_1x_coco | PaddleDetection | [单模型](../examples/PaddleDetection/ppyolo_r50vd_dcn_1x_coco) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/ppyolo_r50vd_dcn_1x_coco.tar) |C++ Serving |
| ppyolo_mbv3_large_coco | PaddleDetection | [多模型](../examples/pipeline/PaddleDetection/ppyolo_mbv3) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/ppyolo_mbv3_large_coco.tar) |Pipeline Serving |
| ttfnet_darknet53_1x_coco | PaddleDetection | [单模型](../examples/PaddleDetection/ttfnet_darknet53_1x_coco) | [.tar](https://paddle-serving.bj.bcebos.com/pddet_demo/ttfnet_darknet53_1x_coco.tar) |C++ Serving |
| YOLOv3-DarkNet | PaddleDetection | [单模型](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/deploy/serving) | [.pdparams](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams)</br>[.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/yolov3/yolov3_darknet53_270e_coco.yml) |C++ Serving |
| ocr_rec | PaddleOCR | [单模型](../examples/PaddleOCR/ocr_rec_det)</br>[多模型](../examples/pipeline/ocr) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/OCR/ocr_rec.tar.gz) |Pipeline Serving, C++ Serving |
| ocr_det | PaddleOCR | [单模型](../examples/PaddleOCR/ocr_rec_det)</br>[多模型](../examples/pipeline/ocr) | [.tar.gz](https://paddle-serving.bj.bcebos.com/ocr/ocr_det.tar.gz) |Pipeline Serving, C++ Serving |
| ch_ppocr_mobile_v2.0_det | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar)</br>[.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/configs/det/ch_ppocr_v2.0/ch_det_mv3_db_v2.0.yml) |Pipeline Serving |
| ch_ppocr_server_v2.0_det | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar)</br>[.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml) |Pipeline Serving |
| ch_ppocr_mobile_v2.0_rec | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar)</br>[.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) |Pipeline Serving |
| ch_ppocr_server_v2.0_rec | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar)</br>[.yml](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/configs/rec/ch_ppocr_v2.0/rec_chinese_common_train_v2.0.yml) |Pipeline Serving |
| ch_ppocr_mobile_v2.0 | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://github.com/PaddlePaddle/PaddleOCR) |Pipeline Serving |
| ch_ppocr_server_v2.0 | PaddleOCR | [多模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/deploy/pdserving/README.md) | [model](https://github.com/PaddlePaddle/PaddleOCR) |Pipeline Serving |
| deeplabv3 | PaddleSeg | [单模型](../examples/PaddleSeg/deeplabv3) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageSegmentation/deeplabv3.tar.gz) | C++ Serving |
| unet | PaddleSeg | [单模型](../examples/PaddleSeg/unet_for_image_seg) | [.tar.gz](https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageSegmentation/unet.tar.gz) |C++ Serving |
- Notes
  - All multi-model deployment examples are under the pipeline folder
  - Single-model examples use C++ Serving; multi-model examples use Pipeline Serving
  - See the [examples](../examples) for details
  - For more models, see [wholechain](https://www.paddlepaddle.org.cn/wholechain)
......@@ -4,11 +4,11 @@
### Introduction
PaddleDetection, PaddlePaddle's object detection development kit, is designed to help developers complete the whole process of building, training, optimizing, and deploying detection models faster and better. For details, see [Github](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph)
PaddleDetection, PaddlePaddle's object detection development kit, is designed to help developers complete the whole process of building, training, optimizing, and deploying detection models faster and better. For details, see [Github](https://github.com/PaddlePaddle/PaddleDetection/tree/master)
This article mainly introduces how to deploy PaddleDetection's dynamic graph models on Serving.
PaddleDetection provides a large [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/docs/MODEL_ZOO_cn.md); with the export tool, these models can be served by Paddle Serving. For the export tutorial, please refer to the [Paddle Detection Export Model Tutorial (Simplified Chinese)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/deploy/EXPORT_MODEL.md).
PaddleDetection provides a large [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/MODEL_ZOO_cn.md); with the export tool, these models can be served by Paddle Serving. For the export tutorial, please refer to the [Paddle Detection Export Model Tutorial (Simplified Chinese)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/EXPORT_MODEL.md).
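After exporting an inference model, it can typically be converted into Serving's server and client configurations with `paddle_serving_client.convert` (a hedged sketch; the directory and file names below are illustrative and depend on how the model was exported):
```shell
# Illustrative only: convert an exported PaddleDetection inference model into a Serving model.
python3 -m paddle_serving_client.convert --dirname ./output_inference/yolov3_darknet53_270e_coco \
    --model_filename model.pdmodel --params_filename model.pdiparams \
    --serving_server serving_server --serving_client serving_client
```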
### Serving example
Several examples of PaddleDetection models used in Serving are given in this folder
......
......@@ -4,13 +4,13 @@
### Introduction
PaddleDetection, PaddlePaddle's object detection development kit, is designed to help developers complete the whole process of building, training, optimizing, and deploying detection models faster and better. For details, see [Github](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph)
PaddleDetection, PaddlePaddle's object detection development kit, is designed to help developers complete the whole process of building, training, optimizing, and deploying detection models faster and better. For details, see [Github](https://github.com/PaddlePaddle/PaddleDetection/tree/master)
This article mainly introduces how to deploy PaddleDetection's dynamic graph models on Serving.
### Export the model
PaddleDetection provides a large [model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/docs/MODEL_ZOO_cn.md); combined with the export tool, these models can be served by Paddle Serving. For the export tutorial, see the [Paddle Detection model export tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/deploy/EXPORT_MODEL.md)
PaddleDetection provides a large [model zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/MODEL_ZOO_cn.md); combined with the export tool, these models can be served by Paddle Serving. For the export tutorial, see the [Paddle Detection model export tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/master/deploy/EXPORT_MODEL.md)
### Serving examples
This folder provides several examples of PaddleDetection models used with Serving.
......
dag:
#op资源类型, True, 为线程模型;False,为进程模型
is_thread_op: false
#使用性能分析, True,生成Timeline性能数据,对性能有一定影响;False为不使用
tracer:
interval_s: 30
#http端口, rpc_port和http_port不允许同时为空。当rpc_port可用且http_port为空时,不自动生成http_port
http_port: 18082
op:
faster_rcnn:
#并发数,is_thread_op=True时,为线程并发;否则为进程并发
concurrency: 2
local_service_conf:
#client类型,包括brpc, grpc和local_predictor.local_predictor不启动Serving服务,进程内预测
client_type: local_predictor
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 1
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: '2'
#Fetch结果列表,以bert_seq128_model中fetch_var的alias_name为准, 如果没有设置则全部返回
fetch_list:
- save_infer_model/scale_0.tmp_1
#模型路径
model_config: serving_server/
#rpc端口, rpc_port和http_port不允许同时为空。当rpc_port为空且http_port不为空时,会自动将rpc_port设置为http_port+1
rpc_port: 9998
#worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程,每个进程内构建grpcSever和DAG
#当build_dag_each_worker=False时,框架会设置主线程grpc线程池的max_workers=worker_num
worker_num: 20
# PPYOLO model on Pipeline Paddle Serving
(简体中文|[English](./README_CN.md))
### Get the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/ppyolo_mbv3_large_coco.tar
```
### Start the service
```
tar xf ppyolo_mbv3_large_coco.tar
python3 web_service.py
```
### Run prediction
```
python3 pipeline_http_client.py
```
dag:
#op资源类型, True, 为线程模型;False,为进程模型
is_thread_op: false
#使用性能分析, True,生成Timeline性能数据,对性能有一定影响;False为不使用
tracer:
interval_s: 30
#http端口, rpc_port和http_port不允许同时为空。当rpc_port可用且http_port为空时,不自动生成http_port
http_port: 18082
op:
ppyolo_mbv3:
#并发数,is_thread_op=True时,为线程并发;否则为进程并发
concurrency: 10
local_service_conf:
#client类型,包括brpc, grpc和local_predictor.local_predictor不启动Serving服务,进程内预测
client_type: local_predictor
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 1
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: '2'
#Fetch结果列表,以bert_seq128_model中fetch_var的alias_name为准, 如果没有设置则全部返回
fetch_list:
- save_infer_model/scale_0.tmp_1
#模型路径
model_config: serving_server/
#rpc端口, rpc_port和http_port不允许同时为空。当rpc_port为空且http_port不为空时,会自动将rpc_port设置为http_port+1
rpc_port: 9998
#worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程,每个进程内构建grpcSever和DAG
#当build_dag_each_worker=False时,框架会设置主线程grpc线程池的max_workers=worker_num
worker_num: 20
# YOLOv3 model on Pipeline Paddle Serving
(简体中文|[English](./README.md))
### Get the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/yolov3_darknet53_270e_coco.tar
```
### Start the WebService
```
tar xf yolov3_darknet53_270e_coco.tar
python3 web_service.py
```
### Run prediction
```
python3 pipeline_http_client.py
```
dag:
#op资源类型, True, 为线程模型;False,为进程模型
is_thread_op: false
#使用性能分析, True,生成Timeline性能数据,对性能有一定影响;False为不使用
tracer:
interval_s: 30
#http端口, rpc_port和http_port不允许同时为空。当rpc_port可用且http_port为空时,不自动生成http_port
http_port: 18082
op:
yolov3:
#并发数,is_thread_op=True时,为线程并发;否则为进程并发
concurrency: 10
local_service_conf:
#client类型,包括brpc, grpc和local_predictor.local_predictor不启动Serving服务,进程内预测
client_type: local_predictor
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 1
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: '2'
#Fetch结果列表,以bert_seq128_model中fetch_var的alias_name为准, 如果没有设置则全部返回
fetch_list:
- save_infer_model/scale_0.tmp_1
#模型路径
model_config: serving_server/
#rpc端口, rpc_port和http_port不允许同时为空。当rpc_port为空且http_port不为空时,会自动将rpc_port设置为http_port+1
rpc_port: 9998
#worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程,每个进程内构建grpcSever和DAG
#当build_dag_each_worker=False时,框架会设置主线程grpc线程池的max_workers=worker_num
worker_num: 20
#worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程,每个进程内构建grpcSever和DAG
##当build_dag_each_worker=False时,框架会设置主线程grpc线程池的max_workers=worker_num
worker_num: 20
#build_dag_each_worker, False,框架在进程内创建一条DAG;True,框架会每个进程内创建多个独立的DAG
build_dag_each_worker: false
dag:
#op资源类型, True, 为线程模型;False,为进程模型
is_thread_op: false
#使用性能分析, True,生成Timeline性能数据,对性能有一定影响;False为不使用
tracer:
interval_s: 10
#http端口, rpc_port和http_port不允许同时为空。当rpc_port可用且http_port为空时,不自动生成http_port
http_port: 18082
#rpc端口, rpc_port和http_port不允许同时为空。当rpc_port为空且http_port不为空时,会自动将rpc_port设置为http_port+1
rpc_port: 9998
op:
bert:
#并发数,is_thread_op=True时,为线程并发;否则为进程并发
concurrency: 2
#当op配置没有server_endpoints时,从local_service_conf读取本地服务配置
local_service_conf:
#client类型,包括brpc, grpc和local_predictor.local_predictor不启动Serving服务,进程内预测
client_type: local_predictor
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 1
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: '2'
#Fetch结果列表,以bert_seq128_model中fetch_var的alias_name为准, 如果没有设置则全部返回
fetch_list:
#bert模型路径
model_config: bert_seq128_model/
......@@ -38,6 +38,9 @@ op:
#Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["concat_1.tmp_0"]
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: ""
......@@ -71,6 +74,8 @@ op:
#Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0
#计算硬件ID,当devices为""或不写时为CPU预测;当devices为"0", "0,1,2"时为GPU预测,表示使用的GPU卡
devices: ""
......
......@@ -67,7 +67,7 @@ Preprocessing for Chinese word segmentation task.
- words(str): Original text input.
- crf_decode(np.array): CRF decoding result predicted by the model.
[example](../examples/lac/lac_web_service.py)
[example](../examples/lac/lac_http_client.py)
- class SentaReader
......
......@@ -60,7 +60,7 @@ paddle_serving_app针对CV和NLP领域的模型任务,提供了多种常见的
- words(str): Original text input.
- crf_decode(np.array): CRF decoding result from the model prediction.
[Example](../examples/lac/lac_web_service.py)
[Example](../examples/lac/lac_http_client.py)
- class SentaReader
......
......@@ -109,7 +109,12 @@ def is_gpu_mode(unformatted_gpus):
def serve_args():
parser = argparse.ArgumentParser("serve")
parser.add_argument("server", type=str, default="start",nargs="?", help="stop or start PaddleServing")
parser.add_argument(
"server",
type=str,
default="start",
nargs="?",
help="stop or start PaddleServing")
parser.add_argument(
"--thread",
type=int,
......@@ -123,9 +128,13 @@ def serve_args():
parser.add_argument(
"--gpu_ids", type=str, default="", nargs="+", help="gpu ids")
parser.add_argument(
"--op_num", type=int, default=0, nargs="+", help="Number of each op")
"--runtime_thread_num",
type=int,
default=0,
nargs="+",
help="Number of each op")
parser.add_argument(
"--op_max_batch",
"--batch_infer_size",
type=int,
default=32,
nargs="+",
......@@ -251,11 +260,11 @@ def start_gpu_card_model(gpu_mode, port, args): # pylint: disable=doc-string-mi
if args.gpu_multi_stream and device == "gpu":
server.set_gpu_multi_stream()
if args.op_num:
server.set_op_num(args.op_num)
if args.runtime_thread_num:
server.set_runtime_thread_num(args.runtime_thread_num)
if args.op_max_batch:
server.set_op_max_batch(args.op_max_batch)
if args.batch_infer_size:
server.set_batch_infer_size(args.batch_infer_size)
if args.use_lite:
server.set_lite()
......@@ -370,7 +379,7 @@ class MainService(BaseHTTPRequestHandler):
self.wfile.write(json.dumps(response).encode())
def stop_serving(command : str, port : int = None):
def stop_serving(command: str, port: int=None):
'''
Stop PaddleServing by port.
......@@ -400,7 +409,7 @@ def stop_serving(command : str, port : int = None):
start_time = info["start_time"]
if port is not None:
if port in storedPort:
kill_stop_process_by_pid(command ,pid)
kill_stop_process_by_pid(command, pid)
infoList.remove(info)
if len(infoList):
with open(filepath, "w") as fp:
......@@ -410,17 +419,18 @@ def stop_serving(command : str, port : int = None):
return True
else:
if lastInfo == info:
raise ValueError(
"Please confirm the port [%s] you specified is correct." %
port)
raise ValueError(
"Please confirm the port [%s] you specified is correct."
% port)
else:
pass
else:
kill_stop_process_by_pid(command ,pid)
kill_stop_process_by_pid(command, pid)
if lastInfo == info:
os.remove(filepath)
return True
if __name__ == "__main__":
# args.device is not used at all.
# just keep the interface.
......@@ -436,7 +446,7 @@ if __name__ == "__main__":
os._exit(0)
else:
os._exit(-1)
for single_model_config in args.model:
if os.path.isdir(single_model_config):
pass
......
......@@ -82,8 +82,8 @@ class Server(object):
self.mkl_flag = False
self.device = "cpu"
self.gpuid = []
self.op_num = [0]
self.op_max_batch = [32]
self.runtime_thread_num = [0]
self.batch_infer_size = [32]
self.use_trt = False
self.gpu_multi_stream = False
self.use_lite = False
......@@ -171,11 +171,11 @@ class Server(object):
def set_gpuid(self, gpuid):
self.gpuid = format_gpu_to_strlist(gpuid)
def set_op_num(self, op_num):
self.op_num = op_num
def set_runtime_thread_num(self, runtime_thread_num):
self.runtime_thread_num = runtime_thread_num
def set_op_max_batch(self, op_max_batch):
self.op_max_batch = op_max_batch
def set_batch_infer_size(self, batch_infer_size):
self.batch_infer_size = batch_infer_size
def set_trt(self):
self.use_trt = True
......@@ -205,15 +205,15 @@ class Server(object):
else:
self.gpuid = ["-1"]
if isinstance(self.op_num, int):
self.op_num = [self.op_num]
if len(self.op_num) == 0:
self.op_num.append(0)
if isinstance(self.runtime_thread_num, int):
self.runtime_thread_num = [self.runtime_thread_num]
if len(self.runtime_thread_num) == 0:
self.runtime_thread_num.append(0)
if isinstance(self.op_max_batch, int):
self.op_max_batch = [self.op_max_batch]
if len(self.op_max_batch) == 0:
self.op_max_batch.append(32)
if isinstance(self.batch_infer_size, int):
self.batch_infer_size = [self.batch_infer_size]
if len(self.batch_infer_size) == 0:
self.batch_infer_size.append(32)
index = 0
......@@ -224,9 +224,10 @@ class Server(object):
engine.reloadable_meta = model_config_path + "/fluid_time_file"
os.system("touch {}".format(engine.reloadable_meta))
engine.reloadable_type = "timestamp_ne"
engine.runtime_thread_num = self.op_num[index % len(self.op_num)]
engine.batch_infer_size = self.op_max_batch[index %
len(self.op_max_batch)]
engine.runtime_thread_num = self.runtime_thread_num[index % len(
self.runtime_thread_num)]
engine.batch_infer_size = self.batch_infer_size[index % len(
self.batch_infer_size)]
engine.enable_overrun = False
engine.allow_split_request = True
......
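Besides the command-line entry point, the renamed setters can also be used when building a server programmatically. The following is a hedged sketch based on the classic OpMaker/OpSeqMaker flow from older Paddle Serving examples; the op sequence, model path, and values are illustrative:
```python
# Illustrative sketch: build a server in Python and enable asynchronous mode
# via the renamed setters shown in the diff above.
from paddle_serving_server import OpMaker, OpSeqMaker, Server

op_maker = OpMaker()
read_op = op_maker.create('general_reader')
infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')

op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)

server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(10)        # brpc service threads
server.set_runtime_thread_num(4)  # threads per model (asynchronous mode)
server.set_batch_infer_size(32)   # max batch size per model
server.load_model_config("uci_housing_model")
server.prepare_server(workdir="workdir", port=9292, device="cpu")
server.run_server()
```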
......@@ -91,6 +91,7 @@ def dump_pid_file(portList, model):
dump_pid_file([9494, 10082], 'serve')
'''
pid = os.getpid()
gid = os.getpgid(pid)
pidInfoList = []
filepath = os.path.join(CONF_HOME, "ProcessInfo.json")
if os.path.exists(filepath):
......@@ -105,7 +106,7 @@ def dump_pid_file(portList, model):
pidInfoList.remove(info)
with open(filepath, "w") as fp:
info ={"pid": pid, "port" : portList, "model" : str(model), "start_time" : time.time()}
info ={"pid": gid, "port" : portList, "model" : str(model), "start_time" : time.time()}
pidInfoList.append(info)
json.dump(pidInfoList, fp)
......
......@@ -133,8 +133,8 @@ class WebService(object):
use_calib=False,
use_trt=False,
gpu_multi_stream=False,
op_num=None,
op_max_batch=None):
runtime_thread_num=None,
batch_infer_size=None):
device = "cpu"
server = Server()
......@@ -187,11 +187,11 @@ class WebService(object):
if gpu_multi_stream and device == "gpu":
server.set_gpu_multi_stream()
if op_num:
server.set_op_num(op_num)
if runtime_thread_num:
server.set_runtime_thread_num(runtime_thread_num)
if op_max_batch:
server.set_op_max_batch(op_max_batch)
if batch_infer_size:
server.set_batch_infer_size(batch_infer_size)
if use_lite:
server.set_lite()
......@@ -225,8 +225,8 @@ class WebService(object):
use_calib=self.use_calib,
use_trt=self.use_trt,
gpu_multi_stream=self.gpu_multi_stream,
op_num=self.op_num,
op_max_batch=self.op_max_batch))
runtime_thread_num=self.runtime_thread_num,
batch_infer_size=self.batch_infer_size))
def prepare_server(self,
workdir,
......@@ -241,8 +241,8 @@ class WebService(object):
mem_optim=True,
use_trt=False,
gpu_multi_stream=False,
op_num=None,
op_max_batch=None,
runtime_thread_num=None,
batch_infer_size=None,
gpuid=None):
print("This API will be deprecated later. Please do not use it")
self.workdir = workdir
......@@ -259,9 +259,9 @@ class WebService(object):
self.port_list = []
self.use_trt = use_trt
self.gpu_multi_stream = gpu_multi_stream
self.op_num = op_num
self.op_max_batch = op_max_batch
self.runtime_thread_num = runtime_thread_num
self.batch_infer_size = batch_infer_size
# record port and pid info for stopping process
dump_pid_file([self.port], "web_service")
# if gpuid != None, we will use gpuid first.
......
......@@ -83,7 +83,7 @@ RUN ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && ln -sf /usr/local/
RUN rm -r /root/python_build
# Install Go and glide
RUN wget -qO- https://dl.google.com/go/go1.14.linux-amd64.tar.gz | \
RUN wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | \
tar -xz -C /usr/local && \
mkdir /root/go && \
mkdir /root/go/bin && \
......
......@@ -83,7 +83,7 @@ RUN ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && ln -sf /usr/local/
RUN rm -r /root/python_build
# Install Go and glide
RUN wget -qO- https://dl.google.com/go/go1.14.linux-amd64.tar.gz | \
RUN wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | \
tar -xz -C /usr/local && \
mkdir /root/go && \
mkdir /root/go/bin && \
......
......@@ -83,7 +83,7 @@ RUN ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && ln -sf /usr/local/
RUN rm -r /root/python_build
# Install Go and glide
RUN wget -qO- https://dl.google.com/go/go1.14.linux-amd64.tar.gz | \
RUN wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | \
tar -xz -C /usr/local && \
mkdir /root/go && \
mkdir /root/go/bin && \
......
# A image for building paddle binaries
# Use cuda devel base image for both cpu and gpu environment
# When you modify it, please be aware of cudnn-runtime version
FROM nvidia/cuda:11.2.0-cudnn8-devel-ubuntu16.04
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
# ENV variables
ARG WITH_GPU
ARG WITH_AVX
ENV WITH_GPU=${WITH_GPU:-ON}
ENV WITH_AVX=${WITH_AVX:-ON}
ENV HOME /root
# Add bash enhancements
COPY tools/dockerfiles/root/ /root/
# Prepare packages for Python
RUN apt-get update && \
apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev libffi-dev liblzma-dev
RUN apt-get update && \
apt-get install -y --allow-downgrades --allow-change-held-packages \
patchelf git python-pip python-dev python-opencv openssh-server bison \
wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
curl sed grep graphviz libjpeg-dev zlib1g-dev \
python-matplotlib unzip \
automake locales clang-format swig \
liblapack-dev liblapacke-dev libcurl4-openssl-dev \
net-tools libtool module-init-tools vim && \
apt-get clean -y
RUN ln -s /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/libssl.so.10 && \
ln -s /usr/lib/x86_64-linux-gnu/libcrypto.so /usr/lib/libcrypto.so.10
RUN wget https://github.com/koalaman/shellcheck/releases/download/v0.7.1/shellcheck-v0.7.1.linux.x86_64.tar.xz -O shellcheck-v0.7.1.linux.x86_64.tar.xz && \
tar -xf shellcheck-v0.7.1.linux.x86_64.tar.xz && cp shellcheck-v0.7.1/shellcheck /usr/bin/shellcheck && \
rm -rf shellcheck-v0.7.1.linux.x86_64.tar.xz shellcheck-v0.7.1
# Downgrade gcc&&g++
WORKDIR /usr/bin
COPY tools/dockerfiles/build_scripts /build_scripts
RUN bash /build_scripts/install_gcc.sh gcc82 && rm -rf /build_scripts
RUN cp gcc gcc.bak && cp g++ g++.bak && rm gcc && rm g++
RUN ln -s /usr/local/gcc-8.2/bin/gcc /usr/local/bin/gcc
RUN ln -s /usr/local/gcc-8.2/bin/g++ /usr/local/bin/g++
RUN ln -s /usr/local/gcc-8.2/bin/gcc /usr/bin/gcc
RUN ln -s /usr/local/gcc-8.2/bin/g++ /usr/bin/g++
ENV PATH=/usr/local/gcc-8.2/bin:$PATH
# install cmake
WORKDIR /home
RUN wget -q https://cmake.org/files/v3.16/cmake-3.16.0-Linux-x86_64.tar.gz && tar -zxvf cmake-3.16.0-Linux-x86_64.tar.gz && rm cmake-3.16.0-Linux-x86_64.tar.gz
ENV PATH=/home/cmake-3.16.0-Linux-x86_64/bin:$PATH
# Install Python3.6
RUN mkdir -p /root/python_build/ && wget -q https://www.sqlite.org/2018/sqlite-autoconf-3250300.tar.gz && \
tar -zxf sqlite-autoconf-3250300.tar.gz && cd sqlite-autoconf-3250300 && \
./configure -prefix=/usr/local && make -j8 && make install && cd ../ && rm sqlite-autoconf-3250300.tar.gz
RUN wget -q https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz && \
tar -xzf Python-3.6.0.tgz && cd Python-3.6.0 && \
CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
make -j8 > /dev/null && make altinstall > /dev/null && ldconfig && cd .. && rm -rf Python-3.6.0*
# Install Python3.7
RUN wget -q https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tgz && \
tar -xzf Python-3.7.0.tgz && cd Python-3.7.0 && \
CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
make -j8 > /dev/null && make altinstall > /dev/null && ldconfig && cd .. && rm -rf Python-3.7.0*
# Install Python3.8
RUN wget -q https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tgz && \
tar -xzf Python-3.8.0.tgz && cd Python-3.8.0 && \
CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
make -j8 > /dev/null && make altinstall > /dev/null && ldconfig && cd .. && rm -rf Python-3.8.0*
ENV LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
RUN ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && ln -sf /usr/local/bin/python3.6 /usr/bin/python3 && ln -sf /usr/local/bin/pip3.6 /usr/local/bin/pip3 && ln -sf /usr/local/bin/pip3.6 /usr/bin/pip3
RUN rm -r /root/python_build
# Install Go and glide
RUN wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | \
tar -xz -C /usr/local && \
mkdir /root/go && \
mkdir /root/go/bin && \
mkdir /root/go/src && \
echo "GOROOT=/usr/local/go" >> /root/.bashrc && \
echo "GOPATH=/root/go" >> /root/.bashrc && \
echo "PATH=/usr/local/go/bin:/root/go/bin:$PATH" >> /root/.bashrc
ENV GOROOT=/usr/local/go GOPATH=/root/go
# should not be in the same line with GOROOT definition, otherwise docker build could not find GOROOT.
ENV PATH=/usr/local/go/bin:/root/go/bin:${PATH}
# Install TensorRT
# following TensorRT.tar.gz is not the default official one, we do two miny changes:
# 1. Remove the unnecessary files to make the library small. TensorRT.tar.gz only contains include and lib now,
# and its size is only one-third of the official one.
# 2. Manually add ~IPluginFactory() in IPluginFactory class of NvInfer.h, otherwise, it couldn't work in paddle.
# See https://github.com/PaddlePaddle/Paddle/issues/10129 for details.
# Downgrade TensorRT
COPY tools/dockerfiles/build_scripts /build_scripts
RUN bash /build_scripts/install_trt.sh cuda11.2
RUN rm -rf /build_scripts
# git credential to skip password typing
RUN git config --global credential.helper store
# Fix locales to en_US.UTF-8
RUN localedef -i en_US -f UTF-8 en_US.UTF-8
RUN apt-get install libprotobuf-dev -y
# Older versions of patchelf limited the size of the files being processed and were fixed in this pr.
# https://github.com/NixOS/patchelf/commit/ba2695a8110abbc8cc6baf0eea819922ee5007fa
# So install a newer version here.
RUN wget -q https://paddle-ci.cdn.bcebos.com/patchelf_0.10-2_amd64.deb && \
dpkg -i patchelf_0.10-2_amd64.deb
# Configure OpenSSH server. c.f. https://docs.docker.com/engine/examples/running_ssh_service
RUN mkdir /var/run/sshd && echo 'root:root' | chpasswd && sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config && sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
CMD source ~/.bashrc
# ccache 3.7.9
RUN wget https://paddle-ci.gz.bcebos.com/ccache-3.7.9.tar.gz && \
tar xf ccache-3.7.9.tar.gz && mkdir /usr/local/ccache-3.7.9 && cd ccache-3.7.9 && \
./configure -prefix=/usr/local/ccache-3.7.9 && \
make -j8 && make install && \
ln -s /usr/local/ccache-3.7.9/bin/ccache /usr/local/bin/ccache
RUN python3.8 -m pip install --upgrade pip==21.1.1 requests && \
python3.7 -m pip install --upgrade pip==21.1.1 requests && \
python3.6 -m pip install --upgrade pip==21.1.1 requests
RUN wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
tar xf centos_ssl.tar && rm -rf centos_ssl.tar && \
mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k && mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k && \
ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 && \
ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 && \
ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so && \
ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
EXPOSE 22
......@@ -83,7 +83,7 @@ RUN ln -sf /usr/local/bin/python3.6 /usr/local/bin/python3 && ln -sf /usr/local/
RUN rm -r /root/python_build
# Install Go and glide
RUN wget -qO- https://dl.google.com/go/go1.14.linux-amd64.tar.gz | \
RUN wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | \
tar -xz -C /usr/local && \
mkdir /root/go && \
mkdir /root/go/bin && \
......