Merge pull request #1037 from wangjiawei04/v0.5.0

cherry-pick #1036 #1035 #1034 #1033

c1e9e00c · Jiawei Wang · GitHub · 7ecd1c1d · 07dce1d9 · c1e9e00c
20 changed files
--- a/doc/DESIGN_DOC.md
+++ b/doc/DESIGN_DOC.md
--- a/doc/DESIGN_DOC_CN.md
+++ b/doc/DESIGN_DOC_CN.md
--- a/doc/INFERENCE_TO_SERVING.md
+++ b/doc/INFERENCE_TO_SERVING.md
-# How to Convert Paddle Inference Model To Paddle Serving Format
-([简体中文](./INFERENCE_TO_SERVING_CN.md)|English)
-you can use a build-in python module called `paddle_serving_client.convert` to convert it.
-```python
-python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
-```
-Arguments are the same as `inference_model_to_serving` API.
-| Argument | Type | Default | Description |
-|--------------|------|-----------|--------------------------------|
-| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
-| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
-| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
-| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
-| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
--- a/doc/INFERENCE_TO_SERVING_CN.md
+++ b/doc/INFERENCE_TO_SERVING_CN.md
-# 如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型
-([English](./INFERENCE_TO_SERVING.md)|简体中文)
-你可以使用Paddle Serving提供的名为`paddle_serving_client.convert`的内置模块进行转换。
-```python
-python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
-```
-模块参数与`inference_model_to_serving`接口参数相同。
-| 参数 | 类型 | 默认值 | 描述 |
-|--------------|------|-----------|--------------------------------|
-| `dirname` | str | - | 需要转换的模型文件存储路径，Program结构文件和参数文件均保存在此目录。|
-| `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server |
-| `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client |
-| `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None，则使用 `__model__` 作为默认的文件名 |
-| `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保存在一个单独的>二进制文件中，它才需要被指定。如果模型参数是存储在各自分离的文件中，设置它的值为None |
--- a/doc/DESIGN.md
+++ b/doc/DESIGN.md
--- a/doc/DESIGN_CN.md
+++ b/doc/DESIGN_CN.md
--- a/python/examples/bert/README.md
+++ b/python/examples/bert/README.md
@@ -11,14 +11,16 @@ This example use model [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubd
 Install paddlehub first
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```
 run 
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```
+**PaddleHub only supports Python 3.5+**
 the 128 in the command above means max_seq_len in BERT model, which is the length of sample after preprocessing.
 the config file and model file for server side are saved in the folder bert_seq128_model.
 the config file generated for client side is saved in the folder bert_seq128_client.
@@ -28,8 +30,9 @@ You can also download the above model from BOS(max_seq_len=128). After decompres
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-if your model is bert_chinese_L-12_H-768_A-12_model, replace the 'bert_seq128_model' field in the following command with 'bert_chinese_L-12_H-768_A-12_model',replace 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client'.
 ### Getting Dict and Sample Dataset

--- a/python/examples/bert/README_CN.md
+++ b/python/examples/bert/README_CN.md
@@ -10,11 +10,11 @@
 示例中采用[Paddlehub](https://github.com/PaddlePaddle/PaddleHub)中的[BERT中文模型](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)。
 请先安装paddlehub
 ```
-pip install paddlehub
+pip3 install paddlehub
 ```
 执行
 ```
-python prepare_model.py 128
+python3 prepare_model.py 128
 ```
 参数128表示BERT模型中的max_seq_len，即预处理后的样本长度。
 生成server端配置文件与模型文件，存放在bert_seq128_model文件夹。
@@ -25,9 +25,9 @@ python prepare_model.py 128
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
-若使用bert_chinese_L-12_H-768_A-12_model模型，将下面命令中的bert_seq128_model字段替换为bert_chinese_L-12_H-768_A-12_model，bert_seq128_client字段替换为bert_chinese_L-12_H-768_A-12_client.
 ### 获取词典和样例数据

--- a/python/examples/detection/README.md
+++ b/python/examples/detection/README.md
@@ -12,6 +12,7 @@ Paddle Detection provides a large number of [Model Zoo](https://github.com/Paddl
 ### Serving example
 Several examples of PaddleDetection models used in Serving are given in this folder
+All examples support TensorRT.
 -[Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
 -[PPYOLO](./ppyolo_r50vd_dcn_1x_coco)

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README.md
@@ -13,6 +13,9 @@ tar xf faster_rcnn_r50_fpn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT. For faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
+++ b/python/examples/detection/faster_rcnn_r50_fpn_1x_coco/README_CN.md
@@ -13,6 +13,7 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 tar xf faster_rcnn_r50_fpn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```
+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
 ### 执行预测
 ```

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README.md
@@ -13,6 +13,8 @@ tar xf ppyolo_r50vd_dcn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT. For faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
+++ b/python/examples/detection/ppyolo_r50vd_dcn_1x_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf ppyolo_r50vd_dcn_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```
+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README.md
@@ -12,6 +12,7 @@ wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/
 tar xf ttfnet_darknet53_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT. For faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```

--- a/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
+++ b/python/examples/detection/ttfnet_darknet53_1x_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf ttfnet_darknet53_1x_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```
+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README.md
@@ -13,6 +13,8 @@ tar xf yolov3_darknet53_270e_coco.tar
 python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0
 ```
+This model supports TensorRT. For faster inference, add the `--use_trt` option.
 ### Perform prediction
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
+++ b/python/examples/detection/yolov3_darknet53_270e_coco/README_CN.md
@@ -14,6 +14,8 @@ tar xf yolov3_darknet53_270e_coco.tar
 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```
+该模型支持TensorRT，如果想要更快的预测速度，可以开启`--use_trt`选项。
 ### 执行预测
 ```
 python test_client.py 000000570688.jpg

--- a/python/examples/pipeline/imagenet/README_CN.md
+++ b/python/examples/pipeline/imagenet/README_CN.md
 # Imagenet Pipeline WebService
-这里以 Uci 服务为例来介绍 Pipeline WebService 的使用。
+这里以 Imagenet 服务为例来介绍 Pipeline WebService 的使用。
 ## 获取模型
 ```
@@ -10,10 +10,11 @@ sh get_model.sh
 ## 启动服务
 ```
-python web_service.py &>log.txt &
+python resnet50_web_service.py &>log.txt &
 ```
 ## 测试
 ```
-curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
+python pipeline_rpc_client.py
 ```
--- a/python/paddle_serving_server/serve.py
+++ b/python/paddle_serving_server/serve.py
@@ -152,8 +152,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "w") as f:
+            with open(args.model + "/key", "wb") as f:
                f.write(key)
            return True
@@ -161,8 +161,8 @@ class MainService(BaseHTTPRequestHandler):
        if "key" not in post_data:
            return False
        else:
-            key = base64.b64decode(post_data["key"])
+            key = base64.b64decode(post_data["key"].encode())
-            with open(args.model + "/key", "r") as f:
+            with open(args.model + "/key", "rb") as f:
                cur_key = f.read()
            return (key == cur_key)
@@ -203,7 +203,7 @@ class MainService(BaseHTTPRequestHandler):
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
-        self.wfile.write(json.dumps(response))
+        self.wfile.write(json.dumps(response).encode())
 if __name__ == "__main__":

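Note on the `serve.py` hunks above: all three changes concern bytes handling under Python 3. `base64.b64decode` returns `bytes`, so the key file has to be opened in binary mode (`"wb"` / `"rb"`), and `wfile.write` on an HTTP handler expects `bytes`, so the JSON string is encoded first. The sketch below reproduces that pattern outside the handler class. The helper names `save_key`, `check_key`, and `encode_response`, and the `model_dir` parameter, are hypothetical and stand in for the handler methods and `args.model`; this is an illustration of the pattern, not the project's code.

```python
import base64
import json
import os
import tempfile


def save_key(model_dir, post_data):
    """Decode the base64 key from the request and store it as raw bytes."""
    if "key" not in post_data:
        return False
    # b64decode returns bytes, so the file is opened in binary mode.
    key = base64.b64decode(post_data["key"].encode())
    with open(os.path.join(model_dir, "key"), "wb") as f:
        f.write(key)
    return True


def check_key(model_dir, post_data):
    """Compare the key in the request with the stored one, bytes to bytes."""
    if "key" not in post_data:
        return False
    key = base64.b64decode(post_data["key"].encode())
    with open(os.path.join(model_dir, "key"), "rb") as f:
        cur_key = f.read()
    return key == cur_key


def encode_response(response):
    """Serialize a response dict to bytes, as a socket-like writer expects."""
    return json.dumps(response).encode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        payload = {"key": base64.b64encode(b"\x00\x01secret").decode()}
        print(save_key(model_dir, payload))
        print(check_key(model_dir, payload))
        print(encode_response({"result": True}))
```

Reading the stored key in text mode would yield a `str`, and a `str` never compares equal to the decoded `bytes`, which is why the read side moves to `"rb"` together with the write side.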
--- a/python/pipeline/channel.py
+++ b/python/pipeline/channel.py
@@ -767,7 +767,7 @@ class ThreadChannel(Queue.PriorityQueue):
            while self._stop is False and self._consumer_cursors[
                    op_name] - self._base_cursor >= len(self._output_buf):
                try:
-                    channeldata = self.get(timeout=0)
+                    channeldata = self.get(timeout=0)[1]
                    self._output_buf.append(channeldata)
                    list_values = list(channeldata.values())
                    _LOGGER.debug(