Commit aae7dafa authored by HexToString

pull develop

......@@ -27,7 +27,7 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro
|-----|------|-----|-----|------|------|
| LOW | HIGH | LOW | HIGH | C++ Serving | High-performance; recall and ranking services of large-scale online recommendation systems |
| HIGH | HIGH | HIGH | HIGH | Python Pipeline Serving | High-throughput, high-efficiency, asynchronous mode; suited to single-operator multi-model combination scenarios |
| HIGH | LOW | HIGH | LOW | Python webserver | High-throughput; low-traffic services or projects that require rapid iteration and model effect verification |
| HIGH | LOW | HIGH | LOW | Python webservice | High-throughput; low-traffic services or projects that require rapid iteration and model effect verification |
Performance index description:
1. Response time (ms): average response time of a single request, reported at the 50th, 90th, 95th, and 99th percentiles; lower is better.
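For instance, given a list of per-request latencies collected during a load test, these figures can be computed with numpy (a minimal sketch; the sample values are made up):

```python
import numpy as np

# Hypothetical per-request latencies in milliseconds collected from a load test.
latencies_ms = np.array([12.3, 15.1, 11.8, 40.2, 13.5, 90.7, 14.0, 16.2])

print("avg :", latencies_ms.mean())
for q in (50, 90, 95, 99):
    # np.percentile computes the q-th percentile; lower values are better.
    print(f"p{q} :", np.percentile(latencies_ms, q))
```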
......@@ -53,7 +53,7 @@ Paddle Serving takes into account a series of issues such as different operating
Cross-platform support means not depending on a particular operating system or hardware environment: an application developed on one operating system can still run on another. The design therefore has to consider not only cross-platform development languages and components, but also differences in how compilers behave on different systems.
Docker is an open source application container engine that allows developers to package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework; refer to the image list《[Docker Images](DOCKER_IMAGES.md)》and select an image according to your use case. We also provide Docker usage documentation《[How to run PaddleServing in Docker](RUN_IN_DOCKER.md)》. Currently, the Python webserver mode can be deployed and run natively on both Linux and Windows.《[Paddle Serving for Windows Users](WINDOWS_TUTORIAL.md)》
Docker is an open source application container engine that allows developers to package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework; refer to the image list《[Docker Images](DOCKER_IMAGES.md)》and select an image according to your use case. We also provide Docker usage documentation《[How to run PaddleServing in Docker](RUN_IN_DOCKER.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows.《[Paddle Serving for Windows Users](WINDOWS_TUTORIAL.md)》
> Client SDKs in multiple development languages
......@@ -141,7 +141,7 @@ The underlying communication of Paddle Serving is implemented with C++ as well a
----
## 4. Python Webserver Design
## 4. Python Webservice Design
### 4.1 Network Communication Mechanism
There are many open source frameworks for web services. Paddle Serving currently integrates the Flask framework, but this part is not visible to users. In the future, a better-performing web framework may be provided as the underlying HTTP service integration engine.
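As a rough sketch of the user-facing API (modeled on the fit_a_line XPU example elsewhere in this commit; the model directory, service name and port are placeholders), a webservice is defined by subclassing `WebService` and overriding `preprocess`, while the Flask layer stays hidden:

```python
# Minimal sketch only; model directory, service name and port are placeholders.
from paddle_serving_server.web_service import WebService


class DemoService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Convert the JSON feed into the tensors the model expects;
        # the third return value marks whether the feed is already batched.
        return feed, fetch, True


service = DemoService(name="demo")
service.load_model_config("serving_server")          # converted model directory
service.prepare_server(workdir="workdir", port=9393)
service.run_rpc_service()
service.run_web_service()
```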
......
......@@ -29,7 +29,7 @@ Paddle Serving provides many deployment features needed in large-scale scenarios: 1) model ...
|-----|------|-----|-----|------|------|
| LOW | HIGH | LOW | HIGH | C++ Serving | High-performance scenarios; recall and ranking services of large online recommendation systems |
| HIGH | HIGH | RELATIVELY HIGH | HIGH | Python Pipeline Serving | Balances throughput and efficiency; single-operator multi-model combination scenarios; asynchronous mode |
| HIGH | LOW | HIGH | LOW | Python webserver | High iteration-efficiency scenarios; small services or projects that require rapid iteration and model effect verification |
| HIGH | LOW | HIGH | LOW | Python webservice | High iteration-efficiency scenarios; small services or projects that require rapid iteration and model effect verification |
Performance index description:
......@@ -55,7 +55,7 @@ In its top-level design, Paddle Serving takes into account that different teams in industrial-grade scenarios will ...
> Cross-platform operation
Cross-platform support means not depending on a particular operating system or hardware environment: an application developed on one operating system can still run on another. The design therefore has to consider not only cross-platform development languages and components, but also differences in how compilers behave on different systems.
Docker is an open source application container engine that lets developers package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework; refer to the image list《[Docker Images](DOCKER_IMAGES_CN.md)》and select an image according to your use case. To make Docker easier to use, we provide the documentation《[How to run PaddleServing in Docker](RUN_IN_DOCKER_CN.md)》. Currently, the Python webserver mode can be deployed and run natively on both Linux and Windows.《[Paddle Serving for Windows Users](WINDOWS_TUTORIAL_CN.md)》
Docker is an open source application container engine that lets developers package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework; refer to the image list《[Docker Images](DOCKER_IMAGES_CN.md)》and select an image according to your use case. To make Docker easier to use, we provide the documentation《[How to run PaddleServing in Docker](RUN_IN_DOCKER_CN.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows.《[Paddle Serving for Windows Users](WINDOWS_TUTORIAL_CN.md)》
> Client SDKs in multiple development languages
......@@ -144,7 +144,7 @@ Paddle Serving uses a symmetric encryption algorithm to encrypt the model; when the service loads the model ...
Because the underlying communication of Paddle Serving is built on C++ components and the core framework is also written in C/C++, one way for users to define complex pre- and post-processing logic on the server side is to modify the underlying Paddle Serving framework and recompile the source code. Another way is to embed a lightweight web service on the server side and implement the more complex preprocessing logic there, so as to build a logically complete service. When traffic exceeds what the web service can handle, developers have a good reason to implement high-performance C++ preprocessing logic and embed it into Serving's native service library. For the relationship between web services and RPC services and how they can be combined, refer to the `用户类型` (user types) section below.
----
## 4. Python webserver Design and Usage
## 4. Python webservice Design and Usage
### 4.1 Network Framework
There are many open source web service frameworks. Paddle Serving currently integrates the Flask framework, but this part is not visible to users; in the future, a better-performing web framework may be provided as the underlying HTTP service integration engine.
......
......@@ -17,7 +17,7 @@ python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and precision mode
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Request the serving service with the client
```
......@@ -27,7 +27,7 @@ from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
......@@ -37,8 +37,8 @@ seq = Sequential([
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
## Reference
......
......@@ -16,7 +16,7 @@ python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the selected GPU id and the deployment precision
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Send requests with the client
```
......
......@@ -15,7 +15,6 @@ The Server side is built based on <b>RPC Service</b> and <b>graph execution engi
<img src='pipeline_serving-image1.png' height = "250" align="middle"/>
</center>
### 1. RPC Service
In order to meet the needs of different users, the RPC service starts one web server and one RPC server at the same time and can handle two types of requests: RESTful API and gRPC. The gRPC gateway receives RESTful API requests and forwards them to the gRPC server through a reverse proxy; gRPC requests are received directly by the gRPC server. Both types of requests are therefore processed by the gRPC Service in a unified manner, ensuring consistent processing logic.
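For illustration only, a RESTful call to the gateway might look like the sketch below; the port, URL path and the key/value payload convention are assumptions borrowed from the pipeline examples, not a fixed contract, and gRPC clients would instead talk to the gRPC port directly.

```python
# Hedged sketch of a RESTful request to the gRPC gateway; the address,
# URL path and key/value payload are illustrative assumptions.
import json
import requests

url = "http://127.0.0.1:18080/imdb/prediction"   # hypothetical gateway endpoint
payload = {"key": ["words"], "value": ["a sample sentence"]}
resp = requests.post(url, data=json.dumps(payload))
# The response mirrors the proto: err_no / err_msg plus key / value lists.
print(resp.json())
```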
......@@ -426,6 +425,8 @@ service PipelineService {
## ★ Classic examples
All pipeline examples are in the [examples/pipeline/](../python/examples/pipeline) directory.
Here, we build a simple imdb model ensemble example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder. The Server-side structure of the example is shown in the following figure:
......
......@@ -150,6 +150,7 @@ def __init__(name=None,
### 2. Secondary Development Interface for General OPs
The purpose of OP secondary development is to let business developers control the OP processing strategy.
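A hedged sketch of what such a customized OP can look like is shown below; the import path and the exact hook signatures and return conventions vary across Paddle Serving versions, so treat this as an outline rather than the authoritative interface.

```python
# Outline only: import path and hook signatures differ across versions
# (e.g. paddle_serving_server.pipeline vs paddle_serving_server_gpu.pipeline).
from paddle_serving_server.pipeline import Op


class MyOp(Op):
    def init_op(self):
        # One-time initialization, e.g. loading tokenizers or label maps.
        pass

    def preprocess(self, input_dicts, *args, **kwargs):
        # Turn the upstream OP outputs into the feed dict the model expects.
        (_, input_dict), = input_dicts.items()
        return input_dict

    def postprocess(self, input_dicts, fetch_dict, *args, **kwargs):
        # Rename or reshape model outputs before passing them downstream.
        return fetch_dict
```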
......@@ -429,6 +430,8 @@ service PipelineService {
## ★ Classic examples
All pipeline examples are in the [examples/pipeline/](../python/examples/pipeline) directory.
Here we build a simple imdb model ensemble example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder, and the Server-side structure of the example is shown in the following figure:
......
# resnet50 int8 example
(English|[简体中文](./README_CN.md))
## Obtain the quantized model through PaddleSlim tool
To train low-precision models, please refer to [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
## Deploy the quantized model from PaddleSlim using Paddle Serving with Nvidia TensorRT int8 mode
First, download the [Resnet50 int8 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to Paddle Serving's saved model format.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and precision mode
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Request the serving service with the client
```
python resnet50_client.py
```
## Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model Using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model Using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
# resnet50 int8 example
(Simplified Chinese|[English](./README.md))
## Generate the low-precision model with PaddleSlim quantization
See [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html) for details.
## Deploy the PaddleSlim int8 quantized model with TensorRT int8
First download the Resnet50 [PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to the deployment model format supported by Paddle Serving.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the selected GPU id and the deployment precision
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Send requests with the client
```
python resnet50_client.py
```
## Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -13,30 +13,20 @@
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
import paddle
import numpy as np
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
def single_func(idx, resource):
client = Client()
client.load_client_config(
"./uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
0.6283, 0.4919, 0.1856, 0.0795, -0.0332
]
x = np.array(x)
for i in range(1000):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map is None:
return [[None]]
return [[0]]
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
multi_thread_runner = MultiThreadRunner()
thread_num = 4
result = multi_thread_runner.run(single_func, thread_num, {})
if None in result[0]:
exit(1)
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.cdn.bcebos.com/PaddleLite/models_and_data_for_unittests/bert_base_chinese.tar.gz
tar zxvf bert_base_chinese.tar.gz
```
### convert model
```
python -m paddle_serving_client.convert --dirname infer_bert-base-chinese_ft_model_4000.pdparams
python3 -m paddle_serving_client.convert --dirname bert_base_chinese --model_filename bert_base_chinese/model.pdmodel --params_filename bert_base_chinese/model.pdiparams
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/bert.tar.gz
tar zxvf bert.tar.gz
```
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.
## RPC Service
### Start Service
```
pytyon bert_web_service.py serving_server 7703
python3 bert_web_service.py serving_server 7703
```
### Client Prediction
```
python bert_client.py
head data-c.txt | python3 bert_client.py --model serving_client/serving_client_conf.prototxt
```
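For reference, a hedged sketch of what a client script along these lines does: read sentences from stdin, preprocess them with the Chinese BERT reader from paddle_serving_app, and send them to the serving endpoint. The endpoint, reader options, tensor shapes and fetch variable name below are assumptions, not taken from bert_client.py itself.

```python
# Hedged sketch of a BERT RPC client; endpoint, reader options, tensor shapes
# and the fetch variable name are assumptions for illustration only.
import sys
import numpy as np
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader

reader = ChineseBertReader({"max_seq_len": 128})
client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:7703"])   # assumed endpoint

for line in sys.stdin:
    feed_dict = reader.process(line.strip())
    for key in feed_dict:
        # Flatten each field into the (seq_len, 1) shape the converted model expects.
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    fetch_map = client.predict(feed=feed_dict, fetch=["pooled_output"])
    print(fetch_map)
```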
wget https://paddle-serving.bj.bcebos.com/bert_example/data-c.txt --no-check-certificate
wget https://paddle-serving.bj.bcebos.com/bert_example/vocab.txt --no-check-certificate
This diff is collapsed.
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.cdn.bcebos.com/PaddleLite/models_and_data_for_unittests/ernie.tar.gz
tar zxvf ernie.tar.gz
```
### convert model
```
python3 -m paddle_serving_client.convert --dirname erine
python3 -m paddle_serving_client.convert --dirname ernie
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/ernie.tar.gz
tar zxvf ernie.tar.gz
```
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.
## RPC Service
......
......@@ -23,7 +23,7 @@ args = benchmark_args()
reader = ChineseErnieReader({"max_seq_len": 128})
fetch = ["save_infer_model/scale_0"]
endpoint_list = ['127.0.0.1:7704']
endpoint_list = ['127.0.0.1:12000']
client = Client()
client.load_client_config(args.model)
client.connect(endpoint_list)
......
wget https://paddle-serving.bj.bcebos.com/bert_example/data-c.txt --no-check-certificate
wget https://paddle-serving.bj.bcebos.com/bert_example/vocab.txt --no-check-certificate
This diff is collapsed.
......@@ -8,15 +8,10 @@
sh get_data.sh
```
## RPC service
### Start server
``` shell
python test_server.py uci_housing_model/
```
You can alse use the following code to start the RPC service
You can use the following code to start the RPC service
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
```
......@@ -29,19 +24,17 @@ The `paddlepaddle` package is used in `test_client.py`, and you may need to down
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## HTTP service
### Start server
Start a web service with the default web service hosting module:
``` shell
python test_server.py
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim --name uci
```
### Client prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
......@@ -32,8 +32,6 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## HTTP service
### Start the server
......@@ -41,11 +39,11 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt
Start the default web service with the following line of code:
``` shell
python test_server.py
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim --name uci
```
### Client prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server.web_service import WebService
import numpy as np
class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Batch the 13-dimensional "x" inputs into a single float32 tensor of
        # shape (batch, 1, 13) before it is fed to the uci housing model.
        is_batch = True
        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
        for i, ins in enumerate(feed):
            nums = np.array(ins["x"]).reshape(1, 1, 13)
            new_data[i] = nums
        feed = {"x": new_data}
        return feed, fetch, is_batch
uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(
workdir="workdir", port=9393, use_lite=True, use_xpu=True, ir_optim=True)
uci_service.run_rpc_service()
uci_service.run_web_service()
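A quick sketch of calling this service over HTTP, mirroring the curl command in the README (the port and service name come from the prepare_server call above):

```python
# Query the uci web service over HTTP, mirroring the curl example in the README.
import json
import requests

url = "http://127.0.0.1:9393/uci/prediction"
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post(url, headers={"Content-Type": "application/json"},
                     data=json.dumps(payload))
print(resp.json())
```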
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.bj.bcebos.com/PaddleLite/models_and_data_for_unittests/VGG19.tar.gz
tar zxvf VGG19.tar.gz
```
### convert model
```
python -m paddle_serving_client.convert --dirname VGG19
python3 -m paddle_serving_client.convert --dirname VGG19
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/vgg19.tar.gz
tar zxvf vgg19.tar.gz
```
## RPC Service
......@@ -10,7 +20,7 @@ python -m paddle_serving_client.convert --dirname VGG19
### Start Service
```
python -m paddle_serving_server.serve --model serving_server --port 7702 --use_lite --use_xpu --ir_optim
python3 -m paddle_serving_server.serve --model serving_server --port 7702 --use_lite --use_xpu --ir_optim
```
### Client Prediction
......
......@@ -386,8 +386,6 @@ class Server(object):
return
if not os.path.exists(self.server_path):
os.system("touch {}/{}.is_download".format(self.module_path,
folder_name))
print('First time run, downloading PaddleServing components ...')
r = os.system('wget ' + bin_url + ' --no-check-certificate')
......@@ -403,9 +401,10 @@ class Server(object):
tar = tarfile.open(tar_name)
tar.extractall()
tar.close()
open(download_flag, "a").close()
except:
if os.path.exists(exe_path):
os.remove(exe_path)
if os.path.exists(self.server_path):
os.remove(self.server_path)
raise SystemExit(
'Decompressing failed, please check your permission of {} or disk space left.'
.format(self.module_path))
......
......@@ -56,7 +56,6 @@ class PipelineServicer(pipeline_service_pb2_grpc.PipelineServiceServicer):
resp = pipeline_service_pb2.Response()
resp.err_no = channel.ChannelDataErrcode.NO_SERVICE.value
resp.err_msg = "Failed to inference: Service name error."
resp.result = ""
return resp
resp = self._dag_executor.call(request)
return resp
......
......@@ -148,7 +148,7 @@ function before_hook() {
setproxy
unsetproxy
cd ${build_path}/python
python3.6 -m pip install --upgrade pip
python3.6 -m pip install --upgrade pip==20.0.1
python3.6 -m pip install requests
python3.6 -m pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
python3.6 -m pip install numpy==1.16.4
......@@ -323,7 +323,7 @@ function bert_rpc_cpu() {
link_data ${data_dir}
sed -i 's/8860/8861/g' bert_client.py
python3.6 -m paddle_serving_server.serve --model bert_seq128_model/ --port 8861 > ${dir}server_log.txt 2>&1 &
check_result server 3
check_result server 5
cp data-c.txt.1 data-c.txt
head data-c.txt | python3.6 bert_client.py --model bert_seq128_client/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "bert_CPU_RPC server test completed"
......@@ -338,7 +338,7 @@ function pipeline_imagenet() {
data_dir=${data}imagenet/
link_data ${data_dir}
python3.6 resnet50_web_service.py > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 pipeline_rpc_client.py > ${dir}client_log.txt 2>&1
check_result client "pipeline_imagenet_GPU_RPC server test completed"
......@@ -355,7 +355,7 @@ function ResNet50_rpc() {
link_data ${data_dir}
sed -i 's/9696/8863/g' resnet50_rpc_client.py
python3.6 -m paddle_serving_server.serve --model ResNet50_vd_model --port 8863 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "ResNet50_GPU_RPC server test completed"
......@@ -372,7 +372,7 @@ function ResNet101_rpc() {
link_data ${data_dir}
sed -i "22cclient.connect(['127.0.0.1:8864'])" image_rpc_client.py
python3.6 -m paddle_serving_server.serve --model ResNet101_vd_model --port 8864 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 image_rpc_client.py ResNet101_vd_client_config/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "ResNet101_GPU_RPC server test completed"
......@@ -482,7 +482,7 @@ function cascade_rcnn_rpc() {
link_data ${data_dir}
sed -i "s/9292/8879/g" test_client.py
python3.6 -m paddle_serving_server.serve --model serving_server --port 8879 --gpu_ids 0 --thread 2 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 test_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -499,7 +499,7 @@ function deeplabv3_rpc() {
link_data ${data_dir}
sed -i "s/9494/8880/g" deeplabv3_client.py
python3.6 -m paddle_serving_server.serve --model deeplabv3_server --gpu_ids 0 --port 8880 --thread 2 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 10
nvidia-smi
python3.6 deeplabv3_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -516,7 +516,7 @@ function mobilenet_rpc() {
tar xf mobilenet_v2_imagenet.tar.gz
sed -i "s/9393/8881/g" mobilenet_tutorial.py
python3.6 -m paddle_serving_server.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 8881 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 mobilenet_tutorial.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -533,7 +533,7 @@ function unet_rpc() {
link_data ${data_dir}
sed -i "s/9494/8882/g" seg_client.py
python3.6 -m paddle_serving_server.serve --model unet_model --gpu_ids 0 --port 8882 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 seg_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -599,7 +599,7 @@ function criteo_ctr_rpc_gpu() {
link_data ${data_dir}
sed -i "s/8885/8886/g" test_client.py
python3.6 -m paddle_serving_server.serve --model ctr_serving_model/ --port 8886 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0 > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -617,7 +617,7 @@ function yolov4_rpc_gpu() {
sed -i "s/9393/8887/g" test_client.py
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 8887 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
nvidia-smi
check_result server 5
check_result server 8
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
nvidia-smi
check_result client "yolov4_GPU_RPC server test completed"
......@@ -634,7 +634,7 @@ function senta_rpc_cpu() {
sed -i "s/9393/8887/g" test_client.py
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 8887 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
nvidia-smi
check_result server 5
check_result server 8
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
nvidia-smi
check_result client "senta_GPU_RPC server test completed"
......@@ -724,7 +724,7 @@ function bert_http() {
cp vocab.txt.1 vocab.txt
export CUDA_VISIBLE_DEVICES=0
python3.6 bert_web_service.py bert_seq128_model/ 8878 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:8878/bert/prediction > ${dir}client_log.txt 2>&1
check_result client "bert_GPU_HTTP server test completed"
kill_server_process
......@@ -762,7 +762,7 @@ function grpc_yolov4() {
link_data ${data_dir}
echo -e "${GREEN_COLOR}grpc_impl_example_yolov4_GPU_gRPC server started${RES}"
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 10
echo -e "${GREEN_COLOR}grpc_impl_example_yolov4_GPU_gRPC client started${RES}"
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
check_result client "grpc_yolov4_GPU_GRPC server test completed"
......