Merge pull request #25 from PaddlePaddle/develop

Develop

e694e859 · TeslaZhao · GitHub · a8891833 · eee1f2a6 · e694e859
32 changed file
--- a/doc/BAIDU_KUNLUN_XPU_SERVING.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING.md
+# Paddle Serving Using Baidu Kunlun Chips
+(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
+
+Paddle Serving supports deployment on Baidu Kunlun chips. At present, pilot support covers ARM servers (such as Phytium FT-2000+/64) equipped with Baidu Kunlun chips. Deployment support for other heterogeneous hardware servers will be improved in the future.
+
+# Compilation and installation
+Refer to the [compile](COMPILE.md) document to set up the compilation environment.
+## Compilation
+* Compile the Serving Server
+```
+cd Serving
+mkdir -p server-build-arm && cd server-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DSERVER=ON ..
+make -j10
+```
+You can run `make install` to place the build output in the `./output` directory. To set the output path, add `-DCMAKE_INSTALL_PREFIX=./output` to the CMake command shown above.
+* Compile the Serving Client
+```
+mkdir -p client-build-arm && cd client-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DCLIENT=ON ..
+
+make -j10
+```
+* Compile the App
+```
+cd Serving 
+mkdir -p app-build-arm && cd app-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DAPP=ON ..
+
+make -j10
+```
+## Install the wheel package
+After the compilation steps above, a whl package is generated in `python/dist/` under each build directory.
+For example, after the server compilation step, the whl package is produced under the `server-build-arm/python/dist` directory, and you can run `pip install -U python/dist/*.whl` to install it.
+
+# Request parameters description
+To deploy a Serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, specify the following parameters during deployment.
+|parameter|description|notes|
+|:--|:--|:--|
+|use_lite|use the Paddle-Lite engine|uses the inference capability of Paddle-Lite|
+|use_xpu|use Baidu Kunlun for inference|must be used together with the use_lite option|
+|ir_optim|enable graph optimization|refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
+# Deployment examples
+## Download the model
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
+```
+## Start RPC service
+There are three main deployment methods:
+* deploy on an ARM server with Baidu Kunlun XPU, accelerated by Paddle-Lite and the XPU;
+* deploy on a standalone ARM server with Paddle-Lite;
+* deploy on a standalone ARM server without Paddle-Lite.
+    
+The first two deployment methods are recommended.
+
+Start the RPC service on an ARM server with Baidu Kunlun chips, accelerated by Paddle-Lite and the Baidu Kunlun XPU:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+```
+Start the RPC service on an ARM server, accelerated by Paddle-Lite:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+```
+Start the RPC service on an ARM server without Paddle-Lite acceleration:
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+```
+## Client request
+```
+from paddle_serving_client import Client
+import numpy as np
+client = Client()
+client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])
+data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
+        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
+print(fetch_map)
+```
+Some examples are provided below; other models can be adapted with reference to them.
+|sample name|sample links|
+|:-----|:--|
+|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
+|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
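
The three start commands in the Kunlun guide above differ only in their flags, and the parameter table notes that `use_xpu` has to be combined with `use_lite`. A minimal sketch of that rule follows. It assumes a hypothetical helper, `build_serve_command`, which is not part of Paddle Serving and only assembles the command line shown in the guide; the module name and flags are taken from the guide itself.

```python
def build_serve_command(model, port=9292, thread=6,
                        use_lite=False, use_xpu=False, ir_optim=False):
    """Assemble the serve command from the guide (hypothetical helper)."""
    # The parameter table says use_xpu has to be combined with use_lite.
    if use_xpu and not use_lite:
        raise ValueError("use_xpu needs to be used with use_lite")
    cmd = [
        "python3", "-m", "paddle_serving_server_gpu.serve",
        "--model", model,
        "--thread", str(thread),
        "--port", str(port),
    ]
    if use_lite:
        cmd.append("--use_lite")
    if use_xpu:
        cmd.append("--use_xpu")
    if ir_optim:
        cmd.append("--ir_optim")
    return cmd


if __name__ == "__main__":
    # ARM + Kunlun XPU, ARM + Paddle-Lite, plain ARM.
    for flags in (
        dict(use_lite=True, use_xpu=True, ir_optim=True),
        dict(use_lite=True, ir_optim=True),
        dict(),
    ):
        print(" ".join(build_serve_command("uci_housing_model", **flags)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without shell quoting concerns.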
--- a/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
+# Paddle Serving使用百度昆仑芯片部署
+(简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
+
+Paddle Serving支持使用百度昆仑芯片进行预测部署。目前试验性支持在百度昆仑芯片和arm服务器（如飞腾 FT-2000+/64）上进行部署，后续完善对其他异构硬件服务器部署能力。
+
+# 编译、安装
+基本环境配置可参考[该文档](COMPILE_CN.md)进行配置。
+## 编译
+* 编译server部分
+```
+cd Serving
+mkdir -p server-build-arm && cd server-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DSERVER=ON ..
+make -j10
+```
+可以执行`make install`把目标产出放在`./output`目录下，cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。
+* 编译client部分
+```
+mkdir -p client-build-arm && cd client-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DCLIENT=ON ..
+
+make -j10
+```
+* 编译app部分
+```
+cd Serving 
+mkdir -p app-build-arm && cd app-build-arm
+
+cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
+    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
+    -DPYTHON_EXECUTABLE=/usr/bin/python \
+    -DWITH_PYTHON=ON \
+    -DWITH_LITE=ON \
+    -DWITH_XPU=ON \
+    -DAPP=ON ..
+
+make -j10
+```
+## 安装wheel包
+以上编译步骤完成后，会在各自编译目录$build_dir/python/dist生成whl包，分别安装即可。例如server步骤，会在server-build-arm/python/dist目录下生成whl包, 使用命令```pip install -U xxx.whl```进行安装。
+
+# 请求参数说明
+为了支持arm+xpu服务部署，使用Paddle-Lite加速能力，请求时需使用以下参数。
+|参数|参数说明|备注|
+|:--|:--|:--|
+|use_lite|使用Paddle-Lite Engine|使用Paddle-Lite cpu预测能力|
+|use_xpu|使用Baidu Kunlun进行预测|该选项需要与use_lite配合使用|
+|ir_optim|开启Paddle-Lite计算子图优化|详细见[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
+# 部署使用示例
+## 下载模型
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
+```
+## 启动rpc服务
+主要有三种启动配置：
+* 使用arm cpu+xpu部署，使用Paddle-Lite xpu优化加速能力；
+* 单独使用arm cpu部署，使用Paddle-Lite优化加速能力；
+* 使用arm cpu部署，不使用Paddle-Lite加速。
+    
+推荐使用前两种部署方式。
+
+启动rpc服务，使用arm cpu+xpu部署，使用Paddle-Lite xpu优化加速能力
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+```
+启动rpc服务，使用arm cpu部署, 使用Paddle-Lite加速能力
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+```
+启动rpc服务，使用arm cpu部署, 不使用Paddle-Lite加速能力
+```
+python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+```
+## client调用
+```
+from paddle_serving_client import Client
+import numpy as np
+client = Client()
+client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])
+data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
+        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
+print(fetch_map)
+```
+以下提供部分样例，其他模型可参照进行修改。
+|示例名称|示例链接|
+|:-----|:--|
+|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
+|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
--- a/python/examples/bert/README.md
+++ b/python/examples/bert/README.md
@@ -11,14 +11,16 @@ This example use model [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubd

 Install paddlehub first
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```

 run 
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```

+**PaddleHub only supports Python 3.5+**
+
 the 128 in the command above means max_seq_len in BERT model, which is the length of sample after preprocessing.
 the config file and model file for server side are saved in the folder bert_seq128_model.
 the config file generated for client side is saved in the folder bert_seq128_client.
@@ -28,8 +30,9 @@ You can also download the above model from BOS(max_seq_len=128). After decompres
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-if your model is bert_chinese_L-12_H-768_A-12_model, replace the 'bert_seq128_model' field in the following command with 'bert_chinese_L-12_H-768_A-12_model',replace 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client'.

 ### Getting Dict and Sample Dataset


--- a/python/examples/bert/README_CN.md
+++ b/python/examples/bert/README_CN.md
@@ -10,11 +10,11 @@
 示例中采用[Paddlehub](https://github.com/PaddlePaddle/PaddleHub)中的[BERT中文模型](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)。
 请先安装paddlehub
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```
 执行
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```
 参数128表示BERT模型中的max_seq_len，即预处理后的样本长度。
 生成server端配置文件与模型文件，存放在bert_seq128_model文件夹。
@@ -25,9 +25,9 @@ python prepare_model.py 128
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-若使用bert_chinese_L-12_H-768_A-12_model模型，将下面命令中的bert_seq128_model字段替换为bert_chinese_L-12_H-768_A-12_model，bert_seq128_client字段替换为bert_chinese_L-12_H-768_A-12_client.
-


 ### 获取词典和样例数据

--- a/python/examples/criteo_ctr/README.md
+++ b/python/examples/criteo_ctr/README.md
@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
 ### RPC Infer

 ```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
+python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
 ```
 the latency will display in the end.
--- a/python/examples/criteo_ctr/README_CN.md
+++ b/python/examples/criteo_ctr/README_CN.md
@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
 ### 执行预测

 ```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
+python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
 ```
 预测完毕会输出预测过程的耗时。
--- a/python/examples/criteo_ctr/criteo_reader.py
+++ b/python/examples/criteo_ctr/criteo_reader.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import sys
-import paddle.fluid.incubate.data_generator as dg
-
-
-class CriteoDataset(dg.MultiSlotDataGenerator):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-
-        return dense_feature, sparse_feature, [int(features[0])]
-
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-
-        return data_iter
-
-
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/criteo_ctr/test_client.py
+++ b/python/examples/criteo_ctr/test_client.py
@@ -14,43 +14,63 @@
 # pylint: disable=doc-string-missing

 from paddle_serving_client import Client
-import paddle
 import sys
 import os
 import time
-import criteo_reader as criteo
 from paddle_serving_client.metric import auc
 import numpy as np
 import sys

+class CriteoReader(object):
+    def __init__(self, sparse_feature_dim):
+        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
+        self.cont_max_ = [
+            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
+        ]
+        self.cont_diff_ = [
+            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
+        ]
+        self.hash_dim_ = sparse_feature_dim
+        # here, training data are lines with line_index < train_idx_
+        self.train_idx_ = 41256555
+        self.continuous_range_ = range(1, 14)
+        self.categorical_range_ = range(14, 40)
+
+    def process_line(self, line):
+        features = line.rstrip('\n').split('\t')
+        dense_feature = []
+        sparse_feature = []
+        for idx in self.continuous_range_:
+            if features[idx] == '':
+                dense_feature.append(0.0)
+            else:
+                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
+                                     self.cont_diff_[idx - 1])
+        for idx in self.categorical_range_:
+            sparse_feature.append(
+                [hash(str(idx) + features[idx]) % self.hash_dim_])
+
+        return sparse_feature
+
 py_version = sys.version_info[0]

 client = Client()
 client.load_client_config(sys.argv[1])
 client.connect(["127.0.0.1:9292"])
-
+reader = CriteoReader(1000001)
 batch = 1
 buf_size = 100
-dataset = criteo.CriteoDataset()
-dataset.setup(1000001)
-test_filelists = [
-    "{}/part-%d".format(sys.argv[2]) % x
-    for x in range(len(os.listdir(sys.argv[2])))
-]
-reader = dataset.infer_reader(test_filelists[len(test_filelists) - 40:], batch,
-                              buf_size)
 label_list = []
 prob_list = []
 start = time.time()
-for ei in range(1000):
-    if py_version == 2:
-        data = reader().next()
-    else:
-        data = reader().__next__()
+f = open(sys.argv[2], 'r')
+for ei in range(10):
+    data = reader.process_line(f.readline())
    feed_dict = {}
    for i in range(1, 27):
-        feed_dict["sparse_{}".format(i - 1)] = np.array(data[0][i]).reshape(-1)
-        feed_dict["sparse_{}.lod".format(i - 1)] = [0, len(data[0][i])]
+        feed_dict["sparse_{}".format(i - 1)] = np.array(data[i-1]).reshape(-1)
+        feed_dict["sparse_{}.lod".format(i - 1)] = [0, len(data[i-1])]
    fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
+    print(fetch_map)
 end = time.time()
-print(end - start)
+f.close()
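
The rewritten `test_client.py` above reads one line at a time, hashes the 26 categorical columns, and feeds each one as `sparse_i` with a matching `sparse_i.lod` entry. The sketch below reproduces only that feed construction so it can be run without a server. `build_feed` and the synthetic sample line are made up for illustration, and plain lists stand in for the `np.array(...).reshape(-1)` values used by the client. One point worth keeping in mind: Python 3 salts `str` hashes per process unless `PYTHONHASHSEED` is fixed, so ids produced by the built-in `hash` can differ between runs. The hash function is therefore a parameter here.

```python
HASH_DIM = 1000001           # same value the client passes to CriteoReader
CATEGORICAL = range(14, 40)  # columns 14..39 hold the 26 categorical fields


def build_feed(line, hash_dim=HASH_DIM, hash_fn=hash):
    """Turn one tab-separated Criteo line into a feed dict (illustrative)."""
    features = line.rstrip("\n").split("\t")
    feed = {}
    for slot, idx in enumerate(CATEGORICAL):
        ids = [hash_fn(str(idx) + features[idx]) % hash_dim]
        feed["sparse_{}".format(slot)] = ids
        # One sequence holding len(ids) elements: offsets [0, len(ids)].
        feed["sparse_{}.lod".format(slot)] = [0, len(ids)]
    return feed


if __name__ == "__main__":
    # Synthetic line: label, 13 dense columns, 26 categorical columns.
    sample = "\t".join(["0"] + ["1"] * 13 + ["a{}".format(i) for i in range(26)])
    feed = build_feed(sample)
    print(len(feed), feed["sparse_0.lod"])
```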
--- a/python/examples/detection/README.md
+++ b/python/examples/detection/README.md
@@ -12,6 +12,7 @@ Paddle Detection provides a large number of [Model Zoo](https://github.com/Paddl

 ### Serving example
 Several examples of PaddleDetection models used in Serving are given in this folder
+All examples support TensorRT.

 -[Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
 -[PPYOLO](./ppyolo_r50vd_dcn_1x_coco)

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
@@ -13,6 +13,9 @@ tar xf faster_rcnn_r50_fpn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```

+This model supports TensorRT. For faster inference, add the `--use_trt` option.
+
+
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
@@ -13,6 +13,7 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 tar xf faster_rcnn_r50_fpn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```
+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。

 ### 执行预测
 ```

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
@@ -13,6 +13,8 @@ tar xf ppyolo_r50vd_dcn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```

+This model supports TensorRT. For faster inference, add the `--use_trt` option.
+
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf ppyolo_r50vd_dcn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```

+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
+
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
@@ -12,6 +12,7 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 tar xf ttfnet_darknet53_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT. For faster inference, add the `--use_trt` option.

 ### Perform prediction
 ```

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf ttfnet_darknet53_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```

+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
+
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README.md
@@ -13,6 +13,8 @@ tar xf yolov3_darknet53_270e_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```

+This model supports TensorRT. For faster inference, add the `--use_trt` option.
+
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf yolov3_darknet53_270e_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```

+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
+
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/pipeline/imagenet/README_CN.md
+++ b/python/examples/pipeline/imagenet/README_CN.md
 # Imagenet Pipeline WebService

-这里以 Uci 服务为例来介绍 Pipeline WebService 的使用。
+这里以 Imagenet 服务为例来介绍 Pipeline WebService 的使用。

 ## 获取模型
 ```
@@ -10,10 +10,11 @@ sh get_model.sh
 ## 启动服务

 ```
-python web_service.py &>log.txt &
+python resnet50_web_service.py &>log.txt &
 ```

 ## 测试
 ```
-curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
+python pipeline_rpc_client.py
 ```
+
--- a/python/examples/xpu/fit_a_line_xpu/README.md
+++ b/python/examples/xpu/fit_a_line_xpu/README.md
+# Fit a line prediction example
+
+([简体中文](./README_CN.md)|English)
+
+## Get data
+
+```shell
+sh get_data.sh
+```
+
+
+
+## RPC service
+
+### Start server
+
+```shell
+python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
+```
+
+### Client prediction
+
+`test_client.py` uses the `paddlepaddle` package, so you may need to install it first (`pip install paddlepaddle`).
+
+``` shell
+python test_client.py uci_housing_client/serving_client_conf.prototxt
+```
+
+
+
+## HTTP service
+
+### Start server
+
+Start a web service with default web service hosting modules:
+``` shell
+python test_server.py
+```
+
+### Client prediction
+
+``` shell
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
+```
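
The curl call in this README sends a JSON body with a `feed` list and a `fetch` list. As a rough illustration of that payload shape, the sketch below builds the same body in Python. `build_request_body` is a hypothetical helper, and the 13-feature check is an assumption drawn from the `(1, 1, 13)` reshape used in the example scripts, not a documented server contract.

```python
import json


def build_request_body(x, fetch=("price",)):
    """Serialize one sample into the JSON body used by the curl call."""
    # Assumption: the uci_housing model takes 13 features per sample.
    if len(x) != 13:
        raise ValueError("expected 13 features, got {}".format(len(x)))
    return json.dumps({"feed": [{"x": list(x)}], "fetch": list(fetch)})


if __name__ == "__main__":
    sample = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
              -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
    print(build_request_body(sample))
```

The resulting string can be passed to any HTTP client as the POST body, with the `Content-Type:application/json` header used in the curl command.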
--- a/python/examples/xpu/fit_a_line_xpu/README_CN.md
+++ b/python/examples/xpu/fit_a_line_xpu/README_CN.md
+# 线性回归预测服务示例
+
+(简体中文|[English](./README.md))
+
+## 获取数据
+
+```shell
+sh get_data.sh
+```
+
+
+
+## RPC服务
+
+### 开启服务端
+
+``` shell
+python test_server.py uci_housing_model/
+```
+
+也可以通过下面的一行代码开启默认RPC服务：
+
+```shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
+```
+
+### 客户端预测
+
+`test_client.py`中使用了`paddlepaddle`包，需要进行下载（`pip install paddlepaddle`）。
+
+``` shell
+python test_client.py uci_housing_client/serving_client_conf.prototxt
+```
+
+
+
+## HTTP服务
+
+### 开启服务端
+
+通过下面的一行代码开启默认web服务：
+
+``` shell
+python test_server.py
+```
+
+### 客户端预测
+
+``` shell
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
+```
--- a/python/examples/xpu/fit_a_line_xpu/benchmark.py
+++ b/python/examples/xpu/fit_a_line_xpu/benchmark.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+
+from paddle_serving_client import Client
+from paddle_serving_client.utils import MultiThreadRunner
+from paddle_serving_client.utils import benchmark_args
+import time
+import paddle
+import sys
+import requests
+
+args = benchmark_args()
+
+
+def single_func(idx, resource):
+    if args.request == "rpc":
+        client = Client()
+        client.load_client_config(args.model)
+        client.connect([args.endpoint])
+        train_reader = paddle.batch(
+            paddle.reader.shuffle(
+                paddle.dataset.uci_housing.train(), buf_size=500),
+            batch_size=1)
+        start = time.time()
+        for data in train_reader():
+            fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["price"])
+        end = time.time()
+        return [[end - start]]
+    elif args.request == "http":
+        train_reader = paddle.batch(
+            paddle.reader.shuffle(
+                paddle.dataset.uci_housing.train(), buf_size=500),
+            batch_size=1)
+        start = time.time()
+        for data in train_reader():
+            r = requests.post(
+                'http://{}/uci/prediction'.format(args.endpoint),
+                data={"x": data[0]})
+        end = time.time()
+        return [[end - start]]
+
+
+multi_thread_runner = MultiThreadRunner()
+result = multi_thread_runner.run(single_func, args.thread, {})
+print(result)
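
`benchmark.py` above has each worker time its own loop of requests and return the elapsed seconds. The sketch below shows the same per-thread timing pattern in a form that runs without a server. It is an approximation under stated assumptions: `ThreadPoolExecutor` from the standard library stands in for `MultiThreadRunner` (whose internals are not shown in this diff), `fake_predict` stands in for `client.predict`, and `run_benchmark` is a made-up name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_predict(sample):
    """Stand-in for client.predict so the sketch runs without a server."""
    return {"price": sum(sample)}


def single_func(idx, samples):
    """Time one worker's pass over the samples and return elapsed seconds."""
    start = time.perf_counter()
    for sample in samples:
        fake_predict(sample)
    end = time.perf_counter()
    return end - start


def run_benchmark(thread_num, samples):
    """Run thread_num workers concurrently and collect their elapsed times."""
    with ThreadPoolExecutor(max_workers=thread_num) as pool:
        futures = [pool.submit(single_func, i, samples)
                   for i in range(thread_num)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    elapsed = run_benchmark(4, [[0.1] * 13] * 100)
    print("per-thread seconds:", elapsed)
```

Collecting one elapsed value per worker makes it possible to report both the slowest thread and the average, which tends to be more informative than a single wall-clock figure.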
--- a/python/examples/xpu/fit_a_line_xpu/get_data.sh
+++ b/python/examples/xpu/fit_a_line_xpu/get_data.sh
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+tar -xzf uci_housing.tar.gz
--- a/python/examples/xpu/fit_a_line_xpu/local_train.py
+++ b/python/examples/xpu/fit_a_line_xpu/local_train.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+
+import sys
+import paddle
+import paddle.fluid as fluid
+paddle.enable_static()
+train_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.train(), buf_size=500),
+    batch_size=16)
+
+test_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.test(), buf_size=500),
+    batch_size=16)
+
+x = fluid.data(name='x', shape=[None, 13], dtype='float32')
+y = fluid.data(name='y', shape=[None, 1], dtype='float32')
+
+y_predict = fluid.layers.fc(input=x, size=1, act=None)
+cost = fluid.layers.square_error_cost(input=y_predict, label=y)
+avg_loss = fluid.layers.mean(cost)
+sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
+sgd_optimizer.minimize(avg_loss)
+
+place = fluid.CPUPlace()
+feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+
+import paddle_serving_client.io as serving_io
+
+for pass_id in range(30):
+    for data_train in train_reader():
+        avg_loss_value, = exe.run(fluid.default_main_program(),
+                                  feed=feeder.feed(data_train),
+                                  fetch_list=[avg_loss])
+
+serving_io.save_model("uci_housing_model", "uci_housing_client", {"x": x},
+                      {"price": y_predict}, fluid.default_main_program())
--- a/python/examples/xpu/fit_a_line_xpu/test_client.py
+++ b/python/examples/xpu/fit_a_line_xpu/test_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+
+from paddle_serving_client import Client
+import sys
+import numpy as np
+
+client = Client()
+client.load_client_config(sys.argv[1])
+client.connect(["127.0.0.1:9393"])
+
+import paddle
+test_reader = paddle.batch(
+    paddle.reader.shuffle(
+        paddle.dataset.uci_housing.test(), buf_size=500),
+    batch_size=1)
+
+for data in test_reader():
+    new_data = np.zeros((1, 1, 13)).astype("float32")
+    new_data[0] = data[0][0]
+    fetch_map = client.predict(
+        feed={"x": new_data}, fetch=["price"], batch=True)
+    print(fetch_map)
--- a/python/examples/xpu/fit_a_line_xpu/test_multi_process_client.py
+++ b/python/examples/xpu/fit_a_line_xpu/test_multi_process_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+from paddle_serving_client.utils import MultiThreadRunner
+import paddle
+import numpy as np
+
+
+def single_func(idx, resource):
+    client = Client()
+    client.load_client_config(
+        "./uci_housing_client/serving_client_conf.prototxt")
+    client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
+    x = [
+        0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
+        0.6283, 0.4919, 0.1856, 0.0795, -0.0332
+    ]
+    x = np.array(x)
+    for i in range(1000):
+        fetch_map = client.predict(feed={"x": x}, fetch=["price"])
+        if fetch_map is None:
+            return [[None]]
+    return [[0]]
+
+
+multi_thread_runner = MultiThreadRunner()
+thread_num = 4
+result = multi_thread_runner.run(single_func, thread_num, {})
+if None in result[0]:
+    exit(1)
--- a/python/examples/xpu/fit_a_line_xpu/test_server.py
+++ b/python/examples/xpu/fit_a_line_xpu/test_server.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+
+from paddle_serving_server_gpu.web_service import WebService
+import numpy as np
+
+
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        # Pack every request instance into one float32 batch of shape (N, 1, 13).
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            new_data[i] = np.array(ins["x"]).reshape(1, 13)
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+
+
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9393, use_lite=True, use_xpu=True, ir_optim=True)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
--- a/python/examples/xpu/resnet_v2_50_xpu/README.md
+++ b/python/examples/xpu/resnet_v2_50_xpu/README.md
+# Image Classification
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
+```
+
+### Client Prediction
+
+```
+python resnet50_client.py
+```
--- a/python/examples/xpu/resnet_v2_50_xpu/README_CN.md
+++ b/python/examples/xpu/resnet_v2_50_xpu/README_CN.md
+# Image Classification
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
+```
+
+### Client Prediction
+
+```
+python resnet50_client.py
+```
--- a/python/examples/xpu/resnet_v2_50_xpu/daisy.jpg
+++ b/python/examples/xpu/resnet_v2_50_xpu/daisy.jpg
--- a/python/examples/xpu/resnet_v2_50_xpu/localpredict.py
+++ b/python/examples/xpu/resnet_v2_50_xpu/localpredict.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+from paddle_serving_app.local_predict import LocalPredictor
+import sys
+
+predictor = LocalPredictor()
+predictor.load_model_config(sys.argv[1], use_lite=True, use_xpu=True, ir_optim=True)
+
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = predictor.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
--- a/python/examples/xpu/resnet_v2_50_xpu/resnet50_client.py
+++ b/python/examples/xpu/resnet_v2_50_xpu/resnet50_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+
+client = Client()
+client.load_client_config(
+    "resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
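Both `localpredict.py` and `resnet50_client.py` build the same preprocessing chain with `Sequential`, ending in `Div(255)` and `Normalize(mean, std, True)`. The sketch below illustrates only the chaining idea and the arithmetic of those last two steps on plain Python lists. The `Sequential`, `div`, and `normalize` names here are hypothetical re-implementations, not the `paddle_serving_app.reader` classes, and it assumes the trailing `True` means the input is channel-first.

```python
class Sequential:
    """Hypothetical minimal pipeline: feeds each step the previous step's output."""

    def __init__(self, steps):
        self.steps = steps

    def __call__(self, data):
        for step in self.steps:
            data = step(data)
        return data


def div(scale):
    # Divide every value of a channel-first image (a list of channels) by scale.
    return lambda img: [[v / scale for v in ch] for ch in img]


def normalize(mean, std):
    # Per channel c: (value - mean[c]) / std[c].
    return lambda img: [[(v - m) / s for v in ch]
                        for ch, m, s in zip(img, mean, std)]


seq = Sequential([
    div(255),
    normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# A tiny 3-channel "image" with two pixels per channel (white and black).
img = [[255, 0], [255, 0], [255, 0]]
out = seq(img)
print(out[0])
```

Each pixel is first scaled into the range 0 to 1 and then shifted and scaled by the per-channel mean and standard deviation, so a white pixel in the first channel becomes `(1 - 0.485) / 0.229`.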
--- a/python/paddle_serving_server/serve.py
+++ b/python/paddle_serving_server/serve.py
@@ -152,8 +152,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "w") as f:
+            key = base64.b64decode(post_data["key"].encode())
+            with open(args.model + "/key", "wb") as f:
                f.write(key)
            return True

@@ -161,8 +161,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "r") as f:
+            key = base64.b64decode(post_data["key"].encode())
+            with open(args.model + "/key", "rb") as f:
                cur_key = f.read()
            return (key == cur_key)

@@ -203,7 +203,7 @@ class MainService(BaseHTTPRequestHandler):
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
-        self.wfile.write(json.dumps(response))
+        self.wfile.write(json.dumps(response).encode())


 if __name__ == "__main__":
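The three `serve.py` hunks above all address the same Python 3 distinction between `str` and `bytes`: `base64.b64decode` returns `bytes`, so the key file has to be written and read in binary mode, and a binary stream such as `wfile` needs the JSON text encoded before it is written. A minimal standalone sketch of that round trip, using a temporary directory as a hypothetical stand-in for `args.model` and a made-up key value:

```python
import base64
import json
import os
import tempfile

# The JSON request body carries the key as base64 text (made-up value here).
post_data = {"key": base64.b64encode(b"\x00\x01secret\xff").decode()}

# b64decode returns bytes, so the key file must be opened in binary mode.
key = base64.b64decode(post_data["key"].encode())

with tempfile.TemporaryDirectory() as model_dir:  # hypothetical model directory
    key_path = os.path.join(model_dir, "key")
    with open(key_path, "wb") as f:
        f.write(key)
    with open(key_path, "rb") as f:
        cur_key = f.read()

# A binary-mode read returns bytes, so this compares bytes with bytes.
same = key == cur_key

# A binary stream accepts only bytes, so the JSON text is encoded first.
body = json.dumps({"status": same}).encode()
print(type(key).__name__, same, body)
```

Opening the file in text mode instead would raise a `TypeError` on `f.write(key)`, and comparing a text-mode read against the decoded key would always be `False`, because a `str` never compares equal to `bytes`.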