Unverified commit fa19afc1, authored by Thomas Young, committed by GitHub

Merge pull request #1 from PaddlePaddle/develop

update from origin
@@ -24,7 +24,6 @@ You can get images in two ways:
```
## Image description
@@ -40,3 +39,13 @@ Runtime images cannot be used for compilation.
| GPU (cuda10.0-cudnn7) development | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
## Requirements for running CUDA containers
Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using.
The machine running the CUDA container **only requires the NVIDIA driver**, the CUDA toolkit doesn't have to be installed.
For the relationship between CUDA toolkit version, Driver version and GPU architecture, please refer to [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
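As a quick sanity check of the driver setup, one can run `nvidia-smi` inside a container; a minimal sketch, assuming Docker 19.03+ with the NVIDIA container toolkit installed (the image name and tag are illustrative, taken from the table above):
```shell
# should print the driver version and the visible GPUs
docker run --gpus all --rm registry.baidubce.com/paddlepaddle/serving:latest-cuda10.0-cudnn7-devel nvidia-smi
```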
@@ -22,7 +22,6 @@
```shell
docker build -t <image-name>:<image-tag> .
```
@@ -40,3 +39,13 @@
| GPU (cuda10.0-cudnn7) development image | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU development image (for building Ubuntu packages) | CentOS6 | <None> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) development image (for building Ubuntu packages) | CentOS6 | <None> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
## Requirements for running CUDA containers
Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using.
The machine running the CUDA container **only requires the NVIDIA driver**; the CUDA toolkit does not have to be installed.
For the relationship between CUDA toolkit version, driver version and GPU architecture, see the [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
@@ -4,6 +4,35 @@
## Basics
#### Q: What are the differences and relationships among Paddle Serving, Paddle Inference and PaddleHub Serving?
**A:** Paddle Serving is a remote service: the device that issues the prediction request (phone, browser, client, etc.) is not the hardware that actually runs the prediction. Paddle Inference is a library, suitable for embedding into a larger system to guarantee inference efficiency; Paddle Serving calls Paddle Inference to provide the remote service. PaddleHub Serving can be regarded as an example; it also uses Paddle Serving as the unified entry point for prediction services. For interaction from a web page, a remote service is usually called, which can be built with Paddle Serving's web service.
#### Q: Does paddle-serving support int32?
**A:** In the protobuf, the feed_type and fetch_type numbers map to data types as follows:

  0 - int64
  1 - float32
  2 - int32
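For illustration, this is how a hypothetical int32 feed variable could be declared in a client-side prototxt; the field layout follows serving_client_conf.prototxt, and the variable name and shape are made up:
```
feed_var {
  name: "ids"
  alias_name: "ids"
  is_lod_tensor: false
  feed_type: 2
  shape: 1
}
```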
#### Q: Does paddle-serving support multi-threaded calls on Windows and Linux?
**A:** The client can access the server from multiple threads.
#### Q: How do I change paddle-serving's message size limit?
**A:** On both the server and the client, FLAGS_max_body_size raises the payload limit; the unit is bytes and the default is 64 MB.
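For example, a sketch of raising the limit at server startup, assuming the `--max_body_size` flag of `paddle_serving_server.serve` (value in bytes; model path and port are illustrative):
```shell
# raise the body size limit to 512 MB
python -m paddle_serving_server.serve --model uci_housing_model --port 9292 --max_body_size 536870912
```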
#### Q: Which client languages does paddle-serving currently support?
**A:** Java, C++, Python
#### Q: Which protocols does paddle-serving currently support?
**A:** HTTP, RPC
## Compilation issues
@@ -12,6 +41,10 @@
**A:** Install the whl package you compiled via pip, and set the SERVING_BIN environment variable to the path of the compiled serving binary.
#### Q: With the Java client, mvn compile fails with "No compiler is provided in this environment. Perhaps you are running on a JRE rather than a JDK?"
**A:** The JDK is not installed, or JAVA_HOME is misconfigured (it must point to the JDK path; a common mistake is pointing it at the JRE, e.g. a correct value looks like JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64/"). For JDK installation see https://segmentfault.com/a/1190000015389941
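For example, a sketch of a correct configuration (the JDK path below is the one from the answer and will differ per machine):
```shell
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64
export PATH=$JAVA_HOME/bin:$PATH
mvn -v   # should now report a JDK, not a JRE
```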
## Deployment issues
@@ -46,7 +79,15 @@ InvalidArgumentError: Device id must be less than GPU count, but received id is:
**A:** Currently (0.4.0) only CentOS is supported; see the full list [here](https://github.com/PaddlePaddle/Serving/blob/develop/doc/DOCKER_IMAGES.md).
#### Q: The GCC version Python was built with does not match serving's
**A:** 1) Use the [GPU docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md#gpunvidia-docker) to avoid the environment problem.
  2) Change the gcc version of the Python installed in the anaconda virtual environment; see [this reference](https://www.jianshu.com/p/c498b3d86f77).
#### Q: Does paddle-serving support local offline installation?
**A:** Offline deployment is supported; prepare and install the relevant [dependencies](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md) in advance.
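For example, a sketch of an offline install, assuming the wheel files were downloaded into ./wheels on a machine with network access beforehand:
```shell
pip install --no-index --find-links=./wheels \
    paddle-serving-server paddle-serving-client paddle-serving-app
```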
## Inference issues
@@ -105,6 +146,19 @@ Client logs are printed directly to standard output.
Running 'export GLOG_v=3' before deploying the service produces more detailed log output.
#### Q: After paddle-serving starts successfully, where are the logs configured?
**A:** 1) The warning is printed by the glog component, stating that before glog initialization logs go to STDERR.
  2) Usually the service is started with GLOG_v set, which also selects the log level.
For example:
```
GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
```
#### Q: (With GLOG_v=2) the server logs look fine, but the client never gets a correct prediction result
**A:** The configuration files may be wrong; check them (is_load_tensor, fetch_type, etc.).
......
# Introduction to the gRPC interface
- [1. Comparison with the bRPC interface](#1-comparison-with-the-brpc-interface)
  - [1.1 Server side](#11-server-side)
  - [1.2 Client side](#12-client-side)
  - [1.3 Others](#13-others)
- [2. Example: linear regression prediction service](#2-example-linear-regression-prediction-service)
  - [Get data](#get-data)
  - [Start the gRPC server](#start-the-grpc-server)
  - [Client prediction](#client-prediction)
    - [Synchronous prediction](#synchronous-prediction)
    - [Asynchronous prediction](#asynchronous-prediction)
    - [Batch prediction](#batch-prediction)
    - [General pb prediction](#general-pb-prediction)
    - [Prediction timeout](#prediction-timeout)
    - [List input](#list-input)
- [3. More examples](#3-more-examples)
With the gRPC interface, clients can call the service from different languages on Windows/Linux/macOS. The gRPC interface is structured as follows:

![](https://github.com/PaddlePaddle/Serving/blob/develop/doc/grpc_impl.png)
## 1. Comparison with the bRPC interface
#### 1.1 Server side
* The gRPC server's `load_model_config` function adds a `client_config_path` parameter:
```python
def load_model_config(self, server_config_paths, client_config_path=None)
```
In some examples the configuration files of the bRPC server and the bRPC client differ (e.g. with cube local, the client's data first goes to cube and reaches the inference library only after cube has processed it); in that case the gRPC server must set the gRPC client configuration `client_config_path` manually.

**`client_config_path` defaults to `<server_config_path>/serving_server_conf.prototxt`.**
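For instance, when the two configs differ, the client-side config can be passed explicitly; a sketch with illustrative paths:
```python
server.load_model_config("serving_server_conf",
                         client_config_path="serving_client_conf/serving_client_conf.prototxt")
```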
#### 1.2 Client side
* The gRPC client drops the `load_client_config` step:
  The `connect` step fetches the corresponding prototxt over RPC (any endpoint will do).
* The gRPC client must set the timeout via RPC (the calling form is the same as for the bRPC client):
  Because the bRPC client cannot change its timeout after `connect`, when the gRPC server receives a timeout-change request it recreates its bRPC client instance to apply the new timeout, and the gRPC client sets the gRPC deadline as well.
  **Note: the timeout-setting interface and the inference interface must not be called at the same time (not thread safe); for performance reasons no lock is added for now.**
* The gRPC client's `predict` function adds `asyn` and `is_python` parameters:
```python
def predict(self, feed, fetch, need_variant_tag=False, asyn=False, is_python=True)
```
  1. `asyn` selects asynchronous calling. With `asyn=True` the call is asynchronous and returns a `MultiLangPredictFuture` object, whose `MultiLangPredictFuture.result()` blocks until the prediction is available; with `asyn=False` the call is synchronous.
  2. `is_python` selects the proto format. With `is_python=True`, data is transferred in numpy bytes format, currently usable from Python only; with `is_python=False` a plain data format is used, which is more general. Transferring in numpy bytes format takes much less time than the plain format (see [#654](https://github.com/PaddlePaddle/Serving/pull/654)).
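Putting the two parameters together, a sketch of a synchronous and an asynchronous call, assuming the multi-language client class `MultiLangClient` from `paddle_serving_client` and a fit_a_line-style model served on port 9393:
```python
import numpy as np
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])

x_np = np.random.rand(1, 13).astype("float32")  # illustrative input

# synchronous call: blocks until the result is available
fetch_map = client.predict(feed={"x": x_np}, fetch=["price"])

# asynchronous call: returns a MultiLangPredictFuture immediately
future = client.predict(feed={"x": x_np}, fetch=["price"], asyn=True)
fetch_map = future.result()  # block here to collect the prediction
```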
#### 1.3 Others
* Error handling: when the bRPC client inside the gRPC server fails to predict (returns `None`), the gRPC client also returns None. Other gRPC exceptions are caught inside the client, and a "status_code" field is added to the returned fetch_map to indicate whether the prediction succeeded (see the timeout example).
* Since gRPC only supports the pick_first and round_robin load-balancing policies, the ABTEST feature is not fully available yet.
* System compatibility:
  * [x] CentOS
  * [x] macOS
  * [x] Windows
* Client languages already supported:
  - Python
  - Java
  - Go
## 2. Example: linear regression prediction service
The following example implements a linear regression prediction service over gRPC; the full code is at this [link](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example/fit_a_line).
#### Get data
```shell
sh get_data.sh
```
#### Start the gRPC server
``` shell
python test_server.py uci_housing_model/
```
The default gRPC service can also be started with this one-liner:
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang
```
Note: the --use_multilang flag enables the multi-language client.
### Client prediction
#### Synchronous prediction
``` shell
python test_sync_client.py
```
#### Asynchronous prediction
``` shell
python test_asyn_client.py
```
#### Batch prediction
``` shell
python test_batch_client.py
```
#### General pb prediction
``` shell
python test_general_pb_client.py
```
#### Prediction timeout
``` shell
python test_timeout_client.py
```
#### List input
``` shell
python test_list_input_client.py
```
## 3. More examples
See the example files under [`python/examples/grpc_impl_example`](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example).
@@ -24,13 +24,13 @@ inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
    inference_model_dir, serving_server_dir, serving_client_dir)
```
If your model file and params file are stored separately, use the following API:
```
feed_var_names, fetch_var_names = inference_model_to_serving(
    inference_model_dir, serving_server_dir, serving_client_dir,
    model_filename="model", params_filename="params")
```
@@ -23,11 +23,11 @@ inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
    inference_model_dir, serving_server_dir, serving_client_dir)
```
If the model has a model description file `model_filename` and a parameter file `params_filename`, use:
```
feed_var_names, fetch_var_names = inference_model_to_serving(
    inference_model_dir, serving_server_dir, serving_client_dir,
    model_filename="model", params_filename="params")
```
@@ -75,7 +75,7 @@
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.13.1</version>
  <scope>test</scope>
</dependency>
<dependency>
......
@@ -23,7 +23,7 @@ args = benchmark_args()
reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]
endpoint_list = ['127.0.0.1:9292']
client = Client()
client.load_client_config(args.model)
client.connect(endpoint_list)
@@ -33,5 +33,5 @@ for line in sys.stdin:
    for key in feed_dict.keys():
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    #print(feed_dict)
    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
    print(result)
@@ -29,13 +29,14 @@ class BertService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        feed_res = []
        is_batch = False
        for ins in feed:
            feed_dict = self.reader.process(ins["words"].encode("utf-8"))
            for key in feed_dict.keys():
                feed_dict[key] = np.array(feed_dict[key]).reshape(
                    (len(feed_dict[key]), 1))
            feed_res.append(feed_dict)
        return feed_res, fetch, is_batch

bert_service = BertService(name="bert")
......
@@ -12,36 +12,29 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import *
import numpy as np

preprocess = Sequential([
    File2Image(), BGR2RGB(), Div(255.0),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
    Resize(800, 1333), Transpose((2, 0, 1)), PadStride(32)
])
postprocess = RCNNPostprocess("label_list.txt", "output")
client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(['127.0.0.1:9292'])

im = preprocess('000000570688.jpg')
fetch_map = client.predict(
    feed={
        "image": im,
        "im_info": np.array(list(im.shape[1:]) + [1.0]),
        "im_shape": np.array(list(im.shape[1:]) + [1.0])
    },
    fetch=["multiclass_nms_0.tmp_0"],
    batch=False)
fetch_map["image"] = '000000570688.jpg'
print(fetch_map)
postprocess(fetch_map)
print(fetch_map)
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from __future__ import unicode_literals, absolute_import
import os
import sys
import time
import json
import requests
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import ChineseBertReader
from paddle_serving_app.reader import *
import numpy as np
args = benchmark_args()
def single_func(idx, resource):
img = "./000000570688.jpg"
profile_flags = False
latency_flags = False
if os.getenv("FLAGS_profile_client"):
profile_flags = True
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
if args.request == "rpc":
preprocess = Sequential([
File2Image(), BGR2RGB(), Div(255.0),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
Resize(640, 640), Transpose((2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output")
client = Client()
client.load_client_config(args.model)
client.connect([resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time()
for i in range(turns):
if args.batch_size >= 1:
l_start = time.time()
feed_batch = []
b_start = time.time()
im = preprocess(img)
for bi in range(args.batch_size):
print("1111batch")
print(bi)
feed_batch.append({
"image": im,
"im_info": np.array(list(im.shape[1:]) + [1.0]),
"im_shape": np.array(list(im.shape[1:]) + [1.0])
})
# im = preprocess(img)
b_end = time.time()
if profile_flags:
sys.stderr.write(
"PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format(
os.getpid(),
int(round(b_start * 1000000)),
int(round(b_end * 1000000))))
#result = client.predict(feed=feed_batch, fetch=fetch)
fetch_map = client.predict(
feed=feed_batch, fetch=["multiclass_nms"])
fetch_map["image"] = img
postprocess(fetch_map)
l_end = time.time()
if latency_flags:
latency_list.append(l_end * 1000 - l_start * 1000)
else:
print("unsupport batch size {}".format(args.batch_size))
else:
raise ValueError("not implemented {} request".format(args.request))
end = time.time()
if latency_flags:
return [[end - start], latency_list]
else:
return [[end - start]]
if __name__ == '__main__':
multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:7777"]
turns = 10
start = time.time()
result = multi_thread_runner.run(
single_func, args.thread, {"endpoint": endpoint_list,
"turns": turns})
end = time.time()
total_cost = end - start
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("total cost: {}s".format(total_cost))
print("each thread cost: {}s. ".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[1])
rm profile_log*
export CUDA_VISIBLE_DEVICES=0
export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1
gpu_id=0
#save cpu and gpu utilization log
if [ -d utilization ];then
rm -rf utilization
else
mkdir utilization
fi
#start server
$PYTHONROOT/bin/python3 -m paddle_serving_server_gpu.serve --model $1 --port 7777 --thread 4 --gpu_ids 0 --ir_optim > elog 2>&1 &
sleep 5
#warm up
$PYTHONROOT/bin/python3 benchmark.py --thread 4 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo -e "import psutil\ncpu_utilization=psutil.cpu_percent(1,False)\nprint('CPU_UTILIZATION:', cpu_utilization)\n" > cpu_utilization.py
for thread_num in 1 4 8 16
do
for batch_size in 1
do
job_bt=`date '+%Y%m%d%H%M%S'`
nvidia-smi --id=0 --query-compute-apps=used_memory --format=csv -lms 100 > gpu_use.log 2>&1 &
nvidia-smi --id=0 --query-gpu=utilization.gpu --format=csv -lms 100 > gpu_utilization.log 2>&1 &
gpu_memory_pid=$!
$PYTHONROOT/bin/python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
kill ${gpu_memory_pid}
kill `ps -ef|grep used_memory|awk '{print $2}'`
echo "model_name:" $1
echo "thread_num:" $thread_num
echo "batch_size:" $batch_size
echo "=================Done===================="
echo "model_name:$1" >> profile_log_$1
echo "batch_size:$batch_size" >> profile_log_$1
$PYTHONROOT/bin/python3 cpu_utilization.py >> profile_log_$1
job_et=`date '+%Y%m%d%H%M%S'`
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "MAX_GPU_MEMORY:", max}' gpu_use.log >> profile_log_$1
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "GPU_UTILIZATION:", max}' gpu_utilization.log >> profile_log_$1
rm -rf gpu_use.log gpu_utilization.log
$PYTHONROOT/bin/python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
tail -n 8 profile >> profile_log_$1
echo "" >> profile_log_$1
done
done
#Divided log
awk 'BEGIN{RS="\n\n"}{i++}{print > "bert_log_"i}' profile_log_$1
mkdir bert_log && mv bert_log_* bert_log
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
background
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
@@ -38,7 +38,8 @@ start = time.time()
image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"
for i in range(10):
    img = seq(image_file)
    fetch_map = client.predict(
        feed={"image": img}, fetch=["score"], batch=False)
    prob = max(fetch_map["score"][0])
    label = label_dict[fetch_map["score"][0].tolist().index(prob)].strip(
    ).replace(",", "")
......
@@ -13,7 +13,7 @@
# limitations under the License.
import sys
from paddle_serving_client import Client
import numpy as np
from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize, Base64ToImage

if len(sys.argv) != 4:
@@ -44,12 +44,13 @@ class ImageService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        feed_batch = []
        is_batch = True
        for ins in feed:
            if "image" not in ins:
                raise ValueError("feed data error!")
            img = self.seq(ins["image"])
            feed_batch.append({"image": img[np.newaxis, :]})
        return feed_batch, fetch, is_batch

    def postprocess(self, feed=[], fetch=[], fetch_map={}):
        score_list = fetch_map["score"]
......
@@ -18,7 +18,7 @@ import sys
import time
import requests
import numpy as np
from paddle_serving_app.reader.imdb_reader import IMDBDataset
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import MultiThreadRunner, benchmark_args, show_latency
......
@@ -13,7 +13,7 @@
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
from paddle_serving_app.reader.imdb_reader import IMDBDataset
import sys
import numpy as np
......
@@ -14,7 +14,7 @@
# pylint: disable=doc-string-missing
from paddle_serving_server.web_service import WebService
from paddle_serving_app.reader.imdb_reader import IMDBDataset
import sys
import numpy as np
@@ -29,13 +29,14 @@ class IMDBService(WebService):
    def preprocess(self, feed={}, fetch=[]):
        feed_batch = []
        words_lod = [0]
        is_batch = True
        for ins in feed:
            words = self.dataset.get_words_only(ins["words"])
            words = np.array(words).reshape(len(words), 1)
            words_lod.append(words_lod[-1] + len(words))
            feed_batch.append(words)
        feed = {"words": np.concatenate(feed_batch), "words.lod": words_lod}
        return feed, fetch, is_batch

imdb_service = IMDBService(name="imdb")
......
@@ -15,6 +15,7 @@
from paddle_serving_server.web_service import WebService
import sys
from paddle_serving_app.reader import LACReader
import numpy as np

class LACService(WebService):
@@ -23,13 +24,21 @@ class LACService(WebService):
    def preprocess(self, feed={}, fetch=[]):
        feed_batch = []
        fetch = ["crf_decode"]
        lod_info = [0]
        is_batch = True
        for ins in feed:
            if "words" not in ins:
                raise ValueError("feed data error!")
            feed_data = self.reader.process(ins["words"])
            feed_batch.append(np.array(feed_data).reshape(len(feed_data), 1))
            lod_info.append(lod_info[-1] + len(feed_data))
        feed_dict = {
            "words": np.concatenate(
                feed_batch, axis=0),
            "words.lod": lod_info
        }
        return feed_dict, fetch, is_batch

    def postprocess(self, feed={}, fetch=[], fetch_map={}):
        batch_ret = []
......
@@ -53,7 +53,9 @@ class OCRService(WebService):
        self.ori_h, self.ori_w, _ = im.shape
        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
        return {
            "image": det_img[np.newaxis, :].copy()
        }, ["concat_1.tmp_0"], True

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        det_out = fetch_map["concat_1.tmp_0"]
......
@@ -54,7 +54,7 @@ class OCRService(WebService):
        det_img = self.det_preprocess(im)
        _, self.new_h, self.new_w = det_img.shape
        print(det_img)
        return {"image": det_img}, ["concat_1.tmp_0"], False

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        det_out = fetch_map["concat_1.tmp_0"]
......
@@ -42,10 +42,9 @@ class OCRService(WebService):
        self.det_client = LocalPredictor()
        if sys.argv[1] == 'gpu':
            self.det_client.load_model_config(
                det_model_config, use_gpu=True, gpu_id=1)
        elif sys.argv[1] == 'cpu':
            self.det_client.load_model_config(det_model_config)
        self.ocr_reader = OCRReader()

    def preprocess(self, feed=[], fetch=[]):
@@ -58,7 +57,7 @@ class OCRService(WebService):
        det_img = det_img[np.newaxis, :]
        det_img = det_img.copy()
        det_out = self.det_client.predict(
            feed={"image": det_img}, fetch=["concat_1.tmp_0"], batch=True)
        filter_func = FilterBoxes(10, 10)
        post_func = DBPostProcess({
            "thresh": 0.3,
@@ -91,7 +90,7 @@ class OCRService(WebService):
            imgs[id] = norm_img
        feed = {"image": imgs.copy()}
        fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
        return feed, fetch, True

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
@@ -107,7 +106,8 @@ ocr_service.load_model_config("ocr_rec_model")
ocr_service.prepare_server(workdir="workdir", port=9292)
ocr_service.init_det_debugger(det_model_config="ocr_det_model")
if sys.argv[1] == 'gpu':
    ocr_service.set_gpus("2")
    ocr_service.run_debugger_service()
elif sys.argv[1] == 'cpu':
    ocr_service.run_debugger_service()
ocr_service.run_web_service()
@@ -36,4 +36,5 @@ for img_file in os.listdir(test_img_dir):
    image = cv2_to_base64(image_data1)
    data = {"feed": [{"image": image}], "fetch": ["res"]}
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    print(r)
    print(r.json())
@@ -50,7 +50,7 @@ class OCRService(WebService):
        ori_h, ori_w, _ = im.shape
        det_img = self.det_preprocess(im)
        det_out = self.det_client.predict(
            feed={"image": det_img}, fetch=["concat_1.tmp_0"], batch=False)
        _, new_h, new_w = det_img.shape
        filter_func = FilterBoxes(10, 10)
        post_func = DBPostProcess({
@@ -77,10 +77,10 @@ class OCRService(WebService):
            max_wh_ratio = max(max_wh_ratio, wh_ratio)
        for img in img_list:
            norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
            feed_list.append(norm_img[np.newaxis, :])
        feed_batch = {"image": np.concatenate(feed_list, axis=0)}
        fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
        return feed_batch, fetch, True

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
......
@@ -52,7 +52,7 @@ class OCRService(WebService):
            imgs[i] = norm_img
        feed = {"image": imgs.copy()}
        fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
        return feed, fetch, True

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
......
@@ -51,10 +51,17 @@ class OCRService(WebService):
            max_wh_ratio = max(max_wh_ratio, wh_ratio)
        for img in img_list:
            norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
            #feed = {"image": norm_img}
            feed_list.append(norm_img)
        if len(feed_list) == 1:
            feed_batch = {
                "image": np.concatenate(
                    feed_list, axis=0)[np.newaxis, :]
            }
        else:
            feed_batch = {"image": np.concatenate(feed_list, axis=0)}
        fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
        return feed_batch, fetch, True

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
......
# Imagenet Pipeline WebService
This document takes the Imagenet service as an example to introduce how to use Pipeline WebService.
## Get model
```
sh get_model.sh
```
## Start server
```
python resnet50_web_service.py &>log.txt &
```
## RPC test
```
python pipeline_rpc_client.py
```
# Imagenet Pipeline WebService
This section uses the Uci service as an example to introduce the use of Pipeline WebService.
## Get model
```
sh get_data.sh
```
## Start server
```
python web_service.py &>log.txt &
```
## Test
```
curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
```
#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpcServer and DAG.
##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the grpc thread pool of the main thread.
worker_num: 1

#http port; rpc_port and http_port may not both be empty. When rpc_port is usable and http_port is empty, http_port is not generated automatically.
http_port: 18082
rpc_port: 9999

dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    imagenet:
        #when the op config has no server_endpoints, the local service config is read from local_service_conf
        local_service_conf:
            #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
            concurrency: 2

            #model path
            model_config: ResNet50_vd_model

            #device IDs: "" or unset means CPU inference; "0" or "0,1,2" means GPU inference on the listed cards
            devices: "0" # "0,1"

            #client type: brpc, grpc or local_predictor. local_predictor does inference in-process without starting a Serving service.
            client_type: local_predictor

            #fetch list; names follow the alias_name of fetch_var in client_config
            fetch_list: ["score"]
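#With this config the service listens for HTTP on port 18082 and the op name is
#"imagenet", so (a sketch; the base64 image payload is elided) a request could be:
#  curl -X POST http://localhost:18082/imagenet/prediction \
#       -d '{"key": ["image"], "value": ["<base64-encoded image>"]}'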
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz
tar -xzvf ResNet50_vd.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/image_data.tar.gz
tar -xzvf image_data.tar.gz
This diff is collapsed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server_gpu.pipeline import PipelineClient
import numpy as np
import requests
import json
import cv2
import base64
import os
client = PipelineClient()
client.connect(['127.0.0.1:9999'])
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
with open("daisy.jpg", 'rb') as file:
image_data = file.read()
image = cv2_to_base64(image_data)
for i in range(1):
ret = client.predict(feed_dict={"image": image}, fetch=["label", "prob"])
print(ret)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize, Base64ToImage
try:
from paddle_serving_server_gpu.web_service import WebService, Op
except ImportError:
from paddle_serving_server.web_service import WebService, Op
import logging
import numpy as np
import base64, cv2
class ImagenetOp(Op):
def init_op(self):
self.seq = Sequential([
Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
True)
])
self.label_dict = {}
label_idx = 0
with open("imagenet.label") as fin:
for line in fin:
self.label_dict[label_idx] = line.strip()
label_idx += 1
def preprocess(self, input_dicts, data_id, log_id):
(_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
# Note: class variables(self.var) can only be used in process op mode
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img = self.seq(im)
return {"image": img[np.newaxis, :].copy()}, False, None, ""
def postprocess(self, input_dicts, fetch_dict, log_id):
print(fetch_dict)
score_list = fetch_dict["score"]
result = {"label": [], "prob": []}
for score in score_list:
score = score.tolist()
max_score = max(score)
result["label"].append(self.label_dict[score.index(max_score)]
.strip().replace(",", ""))
result["prob"].append(max_score)
result["label"] = str(result["label"])
result["prob"] = str(result["prob"])
return result, None, ""
class ImageService(WebService):
def get_pipeline_response(self, read_op):
image_op = ImagenetOp(name="imagenet", input_ops=[read_op])
return image_op
uci_service = ImageService(name="imagenet")
uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service()
# IMDB model ensemble examples
## Get models
```
sh get_data.sh
```
## Start servers
```
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.log &
python test_pipeline_server.py &>pipeline.log &
```
## Start clients
```
python test_pipeline_client.py
```
@@ -8,8 +8,8 @@ sh get_data.sh
## Start servers
```
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.log &
python test_pipeline_server.py &>pipeline.log &
```
@@ -17,8 +17,3 @@ python test_pipeline_server.py &>pipeline.log &
```
python test_pipeline_client.py
```
#rpc port; rpc_port and http_port may not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1.
rpc_port: 18070

#http port; rpc_port and http_port may not both be empty. When rpc_port is usable and http_port is empty, http_port is not generated automatically.
http_port: 18071

#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpcServer and DAG.
#When build_dag_each_worker=False, the framework sets max_workers=worker_num for the grpc thread pool of the main thread.
worker_num: 4

#build_dag_each_worker: False builds one DAG inside the process; True builds an independent DAG in every process.
build_dag_each_worker: False

dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: True

    #number of retries
    retry: 1

    #profiling: True generates Timeline performance data (with some overhead); False disables it
    use_profile: False

    #maximum channel length, 0 by default
    channel_size: 0

    #tracer: tracks framework throughput and the activity of every OP and channel; without it no data is generated
    tracer:
        #interval between traces, in seconds
        interval_s: 10
op:
    bow:
        #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
        concurrency: 1

        #client connection type: brpc
        client_type: brpc

        #number of retries for Serving interaction; no retry by default
        retry: 1

        #timeout for Serving interaction, in ms
        timeout: 3000

        #Serving IPs
        server_endpoints: ["127.0.0.1:9393"]

        #client-side config of the bow model
        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"

        #fetch list; names follow the alias_name of fetch_var in client_config
        fetch_list: ["prediction"]

        #number of requests batched per Serving query, 1 by default. With batch_size>1, auto_batching_timeout must be set, otherwise the op blocks until batch_size is reached.
        batch_size: 1

        #batching timeout, used together with batch_size
        auto_batching_timeout: 2000
    cnn:
        #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
        concurrency: 1

        #client connection type: brpc
        client_type: brpc

        #number of retries for Serving interaction; no retry by default
        retry: 1

        #timeout, in ms
        timeout: 3000

        #Serving IPs
        server_endpoints: ["127.0.0.1:9292"]

        #client-side config of the cnn model
        client_config: "imdb_cnn_client_conf/serving_client_conf.prototxt"

        #fetch list; names follow the alias_name of fetch_var in client_config
        fetch_list: ["prediction"]

        #number of requests batched per Serving query, 1 by default. With batch_size>1, auto_batching_timeout must be set, otherwise the op blocks until batch_size is reached.
        batch_size: 1

        #batching timeout, used together with batch_size
        auto_batching_timeout: 2000
    combine:
        #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
        concurrency: 1

        #number of retries for Serving interaction; no retry by default
        retry: 1

        #timeout, in ms
        timeout: 3000

        #number of requests batched per Serving query, 1 by default. With batch_size>1, auto_batching_timeout must be set, otherwise the op blocks until batch_size is reached.
        batch_size: 1

        #batching timeout, used together with batch_size
        auto_batching_timeout: 2000
@@ -15,21 +15,22 @@ from paddle_serving_server.pipeline import PipelineClient
import numpy as np

client = PipelineClient()
client.connect(['127.0.0.1:18070'])

words = 'i am very sad | 0'

futures = []
for i in range(100):
    futures.append(
        client.predict(
            feed_dict={"words": words,
                       "logid": 10000 + i},
            fetch=["prediction"],
            asyn=True,
            profile=False))

for f in futures:
    res = f.result()
    if res.err_no != 0:
        print("predict failed: {}".format(res))
    print(res)
@@ -15,10 +15,14 @@
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server.pipeline import PipelineServer
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataErrcode
import numpy as np
from paddle_serving_app.reader.imdb_reader import IMDBDataset
import logging
try:
    from paddle_serving_server.web_service import WebService
except ImportError:
    from paddle_serving_server_gpu.web_service import WebService

_LOGGER = logging.getLogger()
user_handler = logging.StreamHandler()
@@ -41,74 +45,68 @@ class ImdbRequestOp(RequestOp):
                continue
            words = request.value[idx]
            word_ids, _ = self.imdb_dataset.get_words_and_label(words)
            word_len = len(word_ids)
            dictdata[key] = np.array(word_ids).reshape(word_len, 1)
            dictdata["{}.lod".format(key)] = np.array([0, word_len])

        log_id = None
        if request.logid is not None:
            log_id = request.logid
        return dictdata, log_id, None, ""


class CombineOp(Op):
    def preprocess(self, input_data, data_id, log_id):
        #_LOGGER.info("Enter CombineOp::preprocess")
        combined_prediction = 0
        for op_name, data in input_data.items():
            _LOGGER.info("{}: {}".format(op_name, data["prediction"]))
            combined_prediction += data["prediction"]
        data = {"prediction": combined_prediction / 2}
        return data, False, None, ""


class ImdbResponseOp(ResponseOp):
    # Here ImdbResponseOp is consistent with the default ResponseOp implementation
    def pack_response_package(self, channeldata):
        resp = pipeline_service_pb2.Response()
        resp.err_no = channeldata.error_code
        if resp.err_no == ChannelDataErrcode.OK.value:
            feed = channeldata.parse()
            # ndarray to string
            for name, var in feed.items():
                resp.value.append(var.__repr__())
                resp.key.append(name)
        else:
            resp.err_msg = channeldata.error_info
        return resp


read_op = ImdbRequestOp()


class BowOp(Op):
    def init_op(self):
        pass


class CnnOp(Op):
    def init_op(self):
        pass


bow_op = BowOp("bow", input_ops=[read_op])
cnn_op = CnnOp("cnn", input_ops=[read_op])
combine_op = CombineOp("combine", input_ops=[bow_op, cnn_op])

# fetch output of bow_op
#response_op = ImdbResponseOp(input_ops=[bow_op])

# fetch output of combine_op
response_op = ImdbResponseOp(input_ops=[combine_op])

# use default ResponseOp implementation
#response_op = ResponseOp(input_ops=[combine_op])

server = PipelineServer()
server.set_response_op(response_op)
......
@@ -28,31 +28,9 @@ python web_service.py &>log.txt &
python pipeline_http_client.py
```

<!--
## More (PipelineServing)
## Client Prediction
### RPC
......
@@ -31,26 +31,6 @@ python pipeline_http_client.py
<!--
## More (PipelineServing)
## Start clients
### RPC
......
#rpc port; rpc_port and http_port may not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1.
rpc_port: 18090

#http port; rpc_port and http_port may not both be empty. When rpc_port is usable and http_port is empty, http_port is not generated automatically.
http_port: 9999

#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpcServer and DAG.
##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the grpc thread pool of the main thread.
worker_num: 1

#build_dag_each_worker: False builds one DAG inside the process; True builds an independent DAG in every process.
build_dag_each_worker: false

dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False

    #number of retries
    retry: 1

    #profiling: True generates Timeline performance data (with some overhead); False disables it
    use_profile: false
op:
    det:
        #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
        concurrency: 2

        #when the op config has no server_endpoints, the local service config is read from local_service_conf
        local_service_conf:
            #client type: brpc, grpc or local_predictor. local_predictor does inference in-process without starting a Serving service.
            client_type: local_predictor

            #det model path
            model_config: ocr_det_model

            #fetch list; names follow the alias_name of fetch_var in client_config
            fetch_list: ["concat_1.tmp_0"]

            #device IDs: "" or unset means CPU inference; "0" or "0,1,2" means GPU inference on the listed cards
            devices: "0"
    rec:
        #concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
        concurrency: 2

        #timeout, in ms
        timeout: -1

        #number of retries for Serving interaction; no retry by default
        retry: 1

        #when the op config has no server_endpoints, the local service config is read from local_service_conf
        local_service_conf:
            #client type: brpc, grpc or local_predictor. local_predictor does inference in-process without starting a Serving service.
            client_type: local_predictor

            #rec model path
            model_config: ocr_rec_model

            #fetch list; names follow the alias_name of fetch_var in client_config
            fetch_list: ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]

            #device IDs: "" or unset means CPU inference; "0" or "0,1,2" means GPU inference on the listed cards
            devices: "0"
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server_gpu.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server_gpu.pipeline import PipelineServer
from paddle_serving_server_gpu.pipeline.proto import pipeline_service_pb2
from paddle_serving_server_gpu.pipeline.channel import ChannelDataEcode
from paddle_serving_server_gpu.pipeline import LocalRpcServiceHandler
import numpy as np
import cv2
import time
import base64
import json
from paddle_serving_app.reader import OCRReader
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
import time
import re
import base64
import logging
_LOGGER = logging.getLogger()
class DetOp(Op):
def init_op(self):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
# Note: class variables(self.var) can only be used in process op mode
self.im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = self.im.shape
det_img = self.det_preprocess(self.im)
_, self.new_h, self.new_w = det_img.shape
return {"image": det_img}
def postprocess(self, input_dicts, fetch_dict):
det_out = fetch_dict["concat_1.tmp_0"]
ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.im}
return out_dict
class RecOp(Op):
def init_op(self):
self.ocr_reader = OCRReader()
self.get_rotate_crop_image = GetRotateCropImage()
self.sorted_boxes = SortedBoxes()
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
im = input_dict["image"]
dt_boxes = input_dict["dt_boxes"]
dt_boxes = self.sorted_boxes(dt_boxes)
feed_list = []
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = self.get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
return feed_list
def postprocess(self, input_dicts, fetch_dict):
rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": str(res_lst)}
return res
read_op = RequestOp()
det_op = DetOp(
name="det",
input_ops=[read_op],
local_rpc_service_handler=LocalRpcServiceHandler(
model_config="ocr_det_model",
workdir="det_workdir", # defalut: "workdir"
thread_num=2, # defalut: 2
devices="0", # gpu0. defalut: "" (cpu)
mem_optim=True, # defalut: True
ir_optim=False, # defalut: False
available_port_generator=None), # defalut: None
concurrency=1)
rec_op = RecOp(
name="rec",
input_ops=[det_op],
server_endpoints=["127.0.0.1:12001"],
fetch_list=["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"],
client_config="ocr_rec_client/serving_client_conf.prototxt",
concurrency=1)
response_op = ResponseOp(input_ops=[rec_op])
server = PipelineServer("ocr")
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server_gpu.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server_gpu.pipeline import PipelineServer
from paddle_serving_server_gpu.pipeline.proto import pipeline_service_pb2
from paddle_serving_server_gpu.pipeline.channel import ChannelDataEcode
from paddle_serving_server_gpu.pipeline import LocalRpcServiceHandler
import numpy as np
import cv2
import time
import base64
import json
from paddle_serving_app.reader import OCRReader
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
import time
import re
import base64
import logging
_LOGGER = logging.getLogger()
class DetOp(Op):
def init_op(self):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
# Note: class variables(self.var) can only be used in process op mode
self.im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = self.im.shape
det_img = self.det_preprocess(self.im)
_, self.new_h, self.new_w = det_img.shape
return {"image": det_img}
def postprocess(self, input_dicts, fetch_dict):
det_out = fetch_dict["concat_1.tmp_0"]
ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.im}
return out_dict
class RecOp(Op):
def init_op(self):
self.ocr_reader = OCRReader()
self.get_rotate_crop_image = GetRotateCropImage()
self.sorted_boxes = SortedBoxes()
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
im = input_dict["image"]
dt_boxes = input_dict["dt_boxes"]
dt_boxes = self.sorted_boxes(dt_boxes)
feed_list = []
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = self.get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
return feed_list
def postprocess(self, input_dicts, fetch_dict):
rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": str(res_lst)}
return res
read_op = RequestOp()
det_op = DetOp(
name="det",
input_ops=[read_op],
local_rpc_service_handler=LocalRpcServiceHandler(
model_config="ocr_det_model",
workdir="det_workdir", # defalut: "workdir"
thread_num=2, # defalut: 2
devices="0", # gpu0. defalut: "" (cpu)
mem_optim=True, # defalut: True
ir_optim=False, # defalut: False
available_port_generator=None), # defalut: None
concurrency=1)
rec_op = RecOp(
name="rec",
input_ops=[det_op],
local_rpc_service_handler=LocalRpcServiceHandler(
model_config="ocr_rec_model"),
concurrency=1)
response_op = ResponseOp(input_ops=[rec_op])
server = PipelineServer("ocr")
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from paddle_serving_server_gpu.pipeline import PipelineClient from paddle_serving_server.pipeline import PipelineClient
import numpy as np import numpy as np
import requests import requests
import json import json
......
...@@ -20,7 +20,7 @@ import base64 ...@@ -20,7 +20,7 @@ import base64
import os import os
client = PipelineClient() client = PipelineClient()
client.connect(['127.0.0.1:18080']) client.connect(['127.0.0.1:18090'])
def cv2_to_base64(image): def cv2_to_base64(image):
...@@ -33,6 +33,6 @@ for img_file in os.listdir(test_img_dir): ...@@ -33,6 +33,6 @@ for img_file in os.listdir(test_img_dir):
image_data = file.read() image_data = file.read()
image = cv2_to_base64(image_data) image = cv2_to_base64(image_data)
for i in range(4): for i in range(1):
ret = client.predict(feed_dict={"image": image}, fetch=["res"]) ret = client.predict(feed_dict={"image": image}, fetch=["res"])
print(ret) print(ret)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server_gpu.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server_gpu.pipeline import PipelineServer
from paddle_serving_server_gpu.pipeline.proto import pipeline_service_pb2
from paddle_serving_server_gpu.pipeline.channel import ChannelDataEcode
import numpy as np
import cv2
import time
import base64
import json
from paddle_serving_app.reader import OCRReader
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
import time
import re
import base64
import logging
_LOGGER = logging.getLogger()
class DetOp(Op):
def init_op(self):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
# Note: class variables(self.var) can only be used in process op mode
self.im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = self.im.shape
det_img = self.det_preprocess(self.im)
_, self.new_h, self.new_w = det_img.shape
return {"image": det_img}
def postprocess(self, input_dicts, fetch_dict):
det_out = fetch_dict["concat_1.tmp_0"]
ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.im}
return out_dict
class RecOp(Op):
def init_op(self):
self.ocr_reader = OCRReader()
self.get_rotate_crop_image = GetRotateCropImage()
self.sorted_boxes = SortedBoxes()
def preprocess(self, input_dicts):
(_, input_dict), = input_dicts.items()
im = input_dict["image"]
dt_boxes = input_dict["dt_boxes"]
dt_boxes = self.sorted_boxes(dt_boxes)
feed_list = []
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = self.get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
return feed_list
def postprocess(self, input_dicts, fetch_dict):
rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": str(res_lst)}
return res
read_op = RequestOp()
det_op = DetOp(
name="det",
input_ops=[read_op],
server_endpoints=["127.0.0.1:12000"],
fetch_list=["concat_1.tmp_0"],
client_config="ocr_det_client/serving_client_conf.prototxt",
concurrency=1)
rec_op = RecOp(
name="rec",
input_ops=[det_op],
server_endpoints=["127.0.0.1:12001"],
fetch_list=["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"],
client_config="ocr_rec_client/serving_client_conf.prototxt",
concurrency=1)
response_op = ResponseOp(input_ops=[rec_op])
server = PipelineServer("ocr")
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
try: try:
from paddle_serving_server.web_service import WebService, Op from paddle_serving_server.web_service import WebService, Op
except ImportError: except ImportError:
from paddle_serving_server.web_service import WebService, Op from paddle_serving_server_gpu.web_service import WebService, Op
import logging import logging
import numpy as np import numpy as np
import cv2 import cv2
...@@ -43,7 +43,7 @@ class DetOp(Op): ...@@ -43,7 +43,7 @@ class DetOp(Op):
"min_size": 3 "min_size": 3
}) })
def preprocess(self, input_dicts): def preprocess(self, input_dicts, data_id, log_id):
(_, input_dict), = input_dicts.items() (_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8')) data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8) data = np.fromstring(data, np.uint8)
...@@ -52,9 +52,9 @@ class DetOp(Op): ...@@ -52,9 +52,9 @@ class DetOp(Op):
self.ori_h, self.ori_w, _ = self.im.shape self.ori_h, self.ori_w, _ = self.im.shape
det_img = self.det_preprocess(self.im) det_img = self.det_preprocess(self.im)
_, self.new_h, self.new_w = det_img.shape _, self.new_h, self.new_w = det_img.shape
return {"image": det_img[np.newaxis, :]} return {"image": det_img[np.newaxis, :].copy()}, False, None, ""
def postprocess(self, input_dicts, fetch_dict): def postprocess(self, input_dicts, fetch_dict, log_id):
det_out = fetch_dict["concat_1.tmp_0"] det_out = fetch_dict["concat_1.tmp_0"]
ratio_list = [ ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
...@@ -63,7 +63,7 @@ class DetOp(Op): ...@@ -63,7 +63,7 @@ class DetOp(Op):
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w]) dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
out_dict = {"dt_boxes": dt_boxes, "image": self.im} out_dict = {"dt_boxes": dt_boxes, "image": self.im}
print("out dict", out_dict) print("out dict", out_dict)
return out_dict return out_dict, None, ""
class RecOp(Op): class RecOp(Op):
...@@ -72,7 +72,7 @@ class RecOp(Op): ...@@ -72,7 +72,7 @@ class RecOp(Op):
self.get_rotate_crop_image = GetRotateCropImage() self.get_rotate_crop_image = GetRotateCropImage()
self.sorted_boxes = SortedBoxes() self.sorted_boxes = SortedBoxes()
def preprocess(self, input_dicts): def preprocess(self, input_dicts, data_id, log_id):
(_, input_dict), = input_dicts.items() (_, input_dict), = input_dicts.items()
im = input_dict["image"] im = input_dict["image"]
dt_boxes = input_dict["dt_boxes"] dt_boxes = input_dict["dt_boxes"]
...@@ -93,15 +93,15 @@ class RecOp(Op): ...@@ -93,15 +93,15 @@ class RecOp(Op):
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio) norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
imgs[id] = norm_img imgs[id] = norm_img
feed = {"image": imgs.copy()} feed = {"image": imgs.copy()}
return feed return feed, False, None, ""
def postprocess(self, input_dicts, fetch_dict): def postprocess(self, input_dicts, fetch_dict, log_id):
rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True) rec_res = self.ocr_reader.postprocess(fetch_dict, with_score=True)
res_lst = [] res_lst = []
for res in rec_res: for res in rec_res:
res_lst.append(res[0]) res_lst.append(res[0])
res = {"res": str(res_lst)} res = {"res": str(res_lst)}
return res return res, None, ""
class OcrService(WebService): class OcrService(WebService):
...@@ -112,5 +112,5 @@ class OcrService(WebService): ...@@ -112,5 +112,5 @@ class OcrService(WebService):
uci_service = OcrService(name="ocr") uci_service = OcrService(name="ocr")
uci_service.prepare_pipeline_config("brpc_config.yml") uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service() uci_service.run_service()
...@@ -15,5 +15,5 @@ python web_service.py &>log.txt & ...@@ -15,5 +15,5 @@ python web_service.py &>log.txt &
## Http test ## Http test
``` ```
curl -X POST -k http://localhost:18080/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}' curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
``` ```
...@@ -15,5 +15,5 @@ python web_service.py &>log.txt & ...@@ -15,5 +15,5 @@ python web_service.py &>log.txt &
## Test ## Test
``` ```
curl -X POST -k http://localhost:18080/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}' curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
``` ```
worker_num: 4 #worker_num, the maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG
http_port: 18080 ##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread
worker_num: 1
#HTTP port; rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
http_port: 18082
dag: dag:
is_thread_op: false #Op resource type: True for the thread model, False for the process model
is_thread_op: False
op: op:
uci: uci:
#When the op config has no server_endpoints, the local service config is read from local_service_conf
local_service_conf: local_service_conf:
#Concurrency; when is_thread_op=True this is thread-level concurrency, otherwise process-level
concurrency: 2
#Path of the uci model
model_config: uci_housing_model model_config: uci_housing_model
devices: "" # "0,1"
#Compute device IDs; when devices is "" or unset, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on the listed GPU cards
devices: "0" # "0,1"
#Client type: brpc, grpc or local_predictor. local_predictor does not start a Serving service; prediction runs in-process
client_type: local_predictor
#Fetch result list, based on the alias_name of fetch_var in client_config
fetch_list: ["price"]
...@@ -17,6 +17,7 @@ except ImportError: ...@@ -17,6 +17,7 @@ except ImportError:
from paddle_serving_server.web_service import WebService, Op from paddle_serving_server.web_service import WebService, Op
import logging import logging
import numpy as np import numpy as np
import sys
_LOGGER = logging.getLogger() _LOGGER = logging.getLogger()
...@@ -25,20 +26,32 @@ class UciOp(Op): ...@@ -25,20 +26,32 @@ class UciOp(Op):
def init_op(self): def init_op(self):
self.separator = "," self.separator = ","
def preprocess(self, input_dicts): def preprocess(self, input_dicts, data_id, log_id):
(_, input_dict), = input_dicts.items() (_, input_dict), = input_dicts.items()
_LOGGER.info(input_dict) _LOGGER.error("UciOp::preprocess >>> log_id:{}, input:{}".format(
log_id, input_dict))
x_value = input_dict["x"] x_value = input_dict["x"]
if isinstance(x_value, (str, unicode)): proc_dict = {}
input_dict["x"] = np.array( if sys.version_info.major == 2:
[float(x.strip()) if isinstance(x_value, (str, unicode)):
for x in x_value.split(self.separator)]).reshape(1, 13) input_dict["x"] = np.array(
return input_dict [float(x.strip())
for x in x_value.split(self.separator)]).reshape(1, 13)
def postprocess(self, input_dicts, fetch_dict): _LOGGER.error("input_dict:{}".format(input_dict))
# _LOGGER.info(fetch_dict) else:
if isinstance(x_value, str):
input_dict["x"] = np.array(
[float(x.strip())
for x in x_value.split(self.separator)]).reshape(1, 13)
_LOGGER.error("input_dict:{}".format(input_dict))
return input_dict, False, None, ""
def postprocess(self, input_dicts, fetch_dict, log_id):
_LOGGER.info("UciOp::postprocess >>> log_id:{}, fetch_dict:{}".format(
log_id, fetch_dict))
fetch_dict["price"] = str(fetch_dict["price"][0][0]) fetch_dict["price"] = str(fetch_dict["price"][0][0])
return fetch_dict return fetch_dict, None, ""
class UciService(WebService): class UciService(WebService):
......
...@@ -37,6 +37,7 @@ class SentaService(WebService): ...@@ -37,6 +37,7 @@ class SentaService(WebService):
#Preprocessing for the senta prediction service; call order: lac reader -> lac model prediction -> postprocessing of results -> senta reader #Preprocessing for the senta prediction service; call order: lac reader -> lac model prediction -> postprocessing of results -> senta reader
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
feed_batch = [] feed_batch = []
is_batch = True
words_lod = [0] words_lod = [0]
for ins in feed: for ins in feed:
if "words" not in ins: if "words" not in ins:
...@@ -64,14 +65,13 @@ class SentaService(WebService): ...@@ -64,14 +65,13 @@ class SentaService(WebService):
return { return {
"words": np.concatenate(feed_batch), "words": np.concatenate(feed_batch),
"words.lod": words_lod "words.lod": words_lod
}, fetch }, fetch, is_batch
senta_service = SentaService(name="senta") senta_service = SentaService(name="senta")
senta_service.load_model_config("senta_bilstm_model") senta_service.load_model_config("senta_bilstm_model")
senta_service.prepare_server(workdir="workdir") senta_service.prepare_server(workdir="workdir")
senta_service.init_lac_client( senta_service.init_lac_client(
lac_port=9300, lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt")
lac_client_config="lac/lac_model/serving_server_conf.prototxt")
senta_service.run_rpc_service() senta_service.run_rpc_service()
senta_service.run_web_service() senta_service.run_web_service()
# UNET_BENCHMARK Usage
## Features
* benchmark testing
## Notes
* Place the sample images (more than one is allowed) in the img_data directory; jpg and jpeg are supported
* The number of images should be greater than or equal to the concurrency
## TODO
* http benchmark
#!/bin/bash
python unet_benchmark.py --thread 1 --batch_size 1 --model ../unet_client/serving_client_conf.prototxt
# thread/batch can be modified as you wish
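# optional (assumption based on the os.getenv calls in unet_benchmark.py below):
# export FLAGS_serving_latency=1 to collect per-request latency,
# export FLAGS_profile_client=1 to emit PROFILE trace lines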
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
unet benchmark script
20201130 first edition by cg82616424
"""
from __future__ import unicode_literals, absolute_import
import os
import sys
import time
import json
import requests
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import Sequential, File2Image, Resize, Transpose, BGR2RGB, SegPostprocess
args = benchmark_args()
def get_img_names(path):
"""
    Brief:
        get image files (jpg/jpeg) under this path;
        returns None if the path does not exist or is not a directory
    Args:
        path (string): image folder path
    Returns:
        list: image file paths under this folder
"""
if not os.path.exists(path):
return None
if not os.path.isdir(path):
return None
list_name = []
for f_handler in os.listdir(path):
file_path = os.path.join(path, f_handler)
if os.path.isdir(file_path):
continue
else:
if not file_path.endswith(".jpeg") and not file_path.endswith(
".jpg"):
continue
list_name.append(file_path)
return list_name
def preprocess_img(img_list):
"""
Brief:
prepare img data for benchmark
Args:
img_list(list): list for img file path
Returns:
image content binary list after preprocess
"""
preprocess = Sequential([File2Image(), Resize((512, 512))])
result_list = []
for img in img_list:
img_tmp = preprocess(img)
result_list.append(img_tmp)
return result_list
def benchmark_worker(idx, resource):
    """
    Brief:
        benchmark a single worker for unet
    Args:
        idx(int): worker idx, used to select the backend unet service
        resource(dict): unet serving endpoint dict
    Returns:
        latency
    TODO:
        http benchmarks
"""
profile_flags = False
latency_flags = False
postprocess = SegPostprocess(2)
if os.getenv("FLAGS_profile_client"):
profile_flags = True
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
client_handler = Client()
client_handler.load_client_config(args.model)
client_handler.connect(
[resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time()
turns = resource["turns"]
img_list = resource["img_list"]
for i in range(turns):
if args.batch_size >= 1:
l_start = time.time()
feed_batch = []
b_start = time.time()
for bi in range(args.batch_size):
feed_batch.append({"image": img_list[bi]})
b_end = time.time()
if profile_flags:
sys.stderr.write(
"PROFILE\tpid:{}\tunt_pre_0:{} unet_pre_1:{}\n".format(
os.getpid(),
int(round(b_start * 1000000)),
int(round(b_end * 1000000))))
            # send the whole batch assembled above instead of a single image
            result = client_handler.predict(
                feed=feed_batch, fetch=["output"])
            #result["filename"] = "./img_data/N0060.jpg" % (os.getpid(), idx, time.time())
            #postprocess(result) # if you want to measure post process time, you have to uncomment this line
l_end = time.time()
if latency_flags:
latency_list.append(l_end * 1000 - l_start * 1000)
else:
print("unsupport batch size {}".format(args.batch_size))
end = time.time()
if latency_flags:
return [[end - start], latency_list]
else:
return [[end - start]]
if __name__ == '__main__':
"""
usage:
"""
img_file_list = get_img_names("./img_data")
img_content_list = preprocess_img(img_file_list)
multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:9494"]
turns = 1
start = time.time()
    result = multi_thread_runner.run(benchmark_worker, args.thread, {
"endpoint": endpoint_list,
"turns": turns,
"img_list": img_content_list
})
end = time.time()
total_cost = end - start
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("total cost: {}s".format(total_cost))
print("each thread cost: {}s. ".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[1])
...@@ -32,6 +32,12 @@ logger.setLevel(logging.INFO) ...@@ -32,6 +32,12 @@ logger.setLevel(logging.INFO)
class LocalPredictor(object): class LocalPredictor(object):
"""
    Prediction in the current process of the local environment: an in-process
    call. Compared with RPC/HTTP, LocalPredictor has better performance
    because there is no network or serialization overhead.
"""
def __init__(self): def __init__(self):
self.feed_names_ = [] self.feed_names_ = []
self.fetch_names_ = [] self.fetch_names_ = []
...@@ -42,13 +48,41 @@ class LocalPredictor(object): ...@@ -42,13 +48,41 @@ class LocalPredictor(object):
self.fetch_names_to_idx_ = {} self.fetch_names_to_idx_ = {}
self.fetch_names_to_type_ = {} self.fetch_names_to_type_ = {}
def load_model_config(self, model_path, gpu=False, profile=True, cpu_num=1): def load_model_config(self,
model_path,
use_gpu=False,
gpu_id=0,
use_profile=False,
thread_num=1,
mem_optim=True,
ir_optim=False,
use_trt=False,
use_feed_fetch_ops=False):
"""
Load model config and set the engine config for the paddle predictor
Args:
            model_path: model config path.
            use_gpu: run prediction on GPU, False by default.
            gpu_id: GPU id, 0 by default.
            use_profile: enable predictor profiling, False by default.
            thread_num: number of computation threads, 1 by default.
            mem_optim: enable memory optimization, True by default.
            ir_optim: enable computation graph (IR) optimization, False by default.
            use_trt: enable NVIDIA TensorRT optimization, False by default.
            use_feed_fetch_ops: use feed/fetch ops, False by default.
"""
client_config = "{}/serving_server_conf.prototxt".format(model_path) client_config = "{}/serving_server_conf.prototxt".format(model_path)
model_conf = m_config.GeneralModelConfig() model_conf = m_config.GeneralModelConfig()
f = open(client_config, 'r') f = open(client_config, 'r')
model_conf = google.protobuf.text_format.Merge( model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf) str(f.read()), model_conf)
config = AnalysisConfig(model_path) config = AnalysisConfig(model_path)
logger.info("load_model_config params: model_path:{}, use_gpu:{},\
gpu_id:{}, use_profile:{}, thread_num:{}, mem_optim:{}, ir_optim:{},\
use_trt:{}, use_feed_fetch_ops:{}".format(
model_path, use_gpu, gpu_id, use_profile, thread_num, mem_optim,
ir_optim, use_trt, use_feed_fetch_ops))
self.feed_names_ = [var.alias_name for var in model_conf.feed_var] self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var] self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
...@@ -64,19 +98,43 @@ class LocalPredictor(object): ...@@ -64,19 +98,43 @@ class LocalPredictor(object):
self.fetch_names_to_idx_[var.alias_name] = i self.fetch_names_to_idx_[var.alias_name] = i
self.fetch_names_to_type_[var.alias_name] = var.fetch_type self.fetch_names_to_type_[var.alias_name] = var.fetch_type
if not gpu: if use_profile:
config.disable_gpu()
else:
config.enable_use_gpu(100, 0)
if profile:
config.enable_profile() config.enable_profile()
if mem_optim:
config.enable_memory_optim()
config.switch_ir_optim(ir_optim)
config.set_cpu_math_library_num_threads(thread_num)
config.switch_use_feed_fetch_ops(use_feed_fetch_ops)
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass") config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.set_cpu_math_library_num_threads(cpu_num)
config.switch_ir_optim(False) if not use_gpu:
config.switch_use_feed_fetch_ops(False) config.disable_gpu()
else:
config.enable_use_gpu(100, gpu_id)
if use_trt:
config.enable_tensorrt_engine(
workspace_size=1 << 20,
max_batch_size=32,
min_subgraph_size=3,
use_static=False,
use_calib_mode=False)
self.predictor = create_paddle_predictor(config) self.predictor = create_paddle_predictor(config)
def predict(self, feed=None, fetch=None, batch=False, log_id=0): def predict(self, feed=None, fetch=None, batch=False, log_id=0):
"""
Predict locally
Args:
feed: feed var
fetch: fetch var
batch: batch data or not, False default.If batch is False, a new
dimension is added to header of the shape[np.newaxis].
log_id: for logging
Returns:
fetch_map: dict
"""
if feed is None or fetch is None: if feed is None or fetch is None:
raise ValueError("You should specify feed and fetch for prediction") raise ValueError("You should specify feed and fetch for prediction")
fetch_list = [] fetch_list = []
......
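Taken together, the new `load_model_config` and `predict` signatures above can be exercised in a few lines. A minimal sketch follows; the `uci_housing_model` directory and the `x`/`price` variable names are assumptions borrowed from the UCI example elsewhere in this diff:

```python
from paddle_serving_app.local_predict import LocalPredictor
import numpy as np

predictor = LocalPredictor()
# engine knobs map one-to-one onto the config calls shown above
predictor.load_model_config(
    "uci_housing_model", use_gpu=False, thread_num=1, mem_optim=True)
# batch=False: a batch dimension is prepended to each feed var via np.newaxis
fetch_map = predictor.predict(
    feed={"x": np.random.rand(13).astype("float32")},
    fetch=["price"],
    batch=False)
print(fetch_map["price"])
```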
...@@ -18,5 +18,5 @@ from .image_reader import RCNNPostprocess, SegPostprocess, PadStride, BlazeFaceP ...@@ -18,5 +18,5 @@ from .image_reader import RCNNPostprocess, SegPostprocess, PadStride, BlazeFaceP
from .image_reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes from .image_reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from .lac_reader import LACReader from .lac_reader import LACReader
from .senta_reader import SentaReader from .senta_reader import SentaReader
from .imdb_reader import IMDBDataset #from .imdb_reader import IMDBDataset
from .ocr_reader import OCRReader from .ocr_reader import OCRReader
...@@ -22,18 +22,17 @@ import yaml ...@@ -22,18 +22,17 @@ import yaml
import copy import copy
import argparse import argparse
import logging import logging
import paddle.fluid as fluid
import json import json
FORMAT = '%(asctime)s-%(levelname)s: %(message)s' FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT) logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
precision_map = { #precision_map = {
'trt_int8': fluid.core.AnalysisConfig.Precision.Int8, # 'trt_int8': fluid.core.AnalysisConfig.Precision.Int8,
'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32, # 'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32,
'trt_fp16': fluid.core.AnalysisConfig.Precision.Half # 'trt_fp16': fluid.core.AnalysisConfig.Precision.Half
} #}
class Resize(object): class Resize(object):
......
...@@ -92,9 +92,12 @@ def save_model(server_model_folder, ...@@ -92,9 +92,12 @@ def save_model(server_model_folder,
fetch_var.shape.extend(tmp_shape) fetch_var.shape.extend(tmp_shape)
config.fetch_var.extend([fetch_var]) config.fetch_var.extend([fetch_var])
cmd = "mkdir -p {}".format(client_config_folder) try:
save_dirname = os.path.normpath(client_config_folder)
os.system(cmd) os.makedirs(save_dirname)
except OSError as e:
if e.errno != errno.EEXIST:
raise
with open("{}/serving_client_conf.prototxt".format(client_config_folder), with open("{}/serving_client_conf.prototxt".format(client_config_folder),
"w") as fout: "w") as fout:
fout.write(str(config)) fout.write(str(config))
......
...@@ -23,13 +23,13 @@ import paddle_serving_server as paddle_serving_server ...@@ -23,13 +23,13 @@ import paddle_serving_server as paddle_serving_server
from .version import serving_server_version from .version import serving_server_version
from contextlib import closing from contextlib import closing
import collections import collections
import fcntl
import shutil import shutil
import numpy as np import numpy as np
import grpc import grpc
from .proto import multi_lang_general_model_service_pb2 from .proto import multi_lang_general_model_service_pb2
import sys import sys
if sys.platform.startswith('win') is False:
import fcntl
sys.path.append( sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto')) os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc from .proto import multi_lang_general_model_service_pb2_grpc
......
...@@ -52,6 +52,20 @@ class WebService(object): ...@@ -52,6 +52,20 @@ class WebService(object):
def load_model_config(self, model_config): def load_model_config(self, model_config):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
self.model_config = model_config self.model_config = model_config
import os
from .proto import general_model_config_pb2 as m_config
import google.protobuf.text_format
if os.path.isdir(model_config):
client_config = "{}/serving_server_conf.prototxt".format(
model_config)
        elif os.path.isfile(model_config):
client_config = model_config
model_conf = m_config.GeneralModelConfig()
f = open(client_config, 'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names = [var.alias_name for var in model_conf.feed_var]
self.fetch_names = [var.alias_name for var in model_conf.fetch_var]
def _launch_rpc_service(self): def _launch_rpc_service(self):
op_maker = OpMaker() op_maker = OpMaker()
...@@ -112,13 +126,14 @@ class WebService(object): ...@@ -112,13 +126,14 @@ class WebService(object):
if "fetch" not in request.json: if "fetch" not in request.json:
abort(400) abort(400)
try: try:
feed, fetch = self.preprocess(request.json["feed"], feed, fetch, is_batch = self.preprocess(request.json["feed"],
request.json["fetch"]) request.json["fetch"])
if isinstance(feed, dict) and "fetch" in feed: if isinstance(feed, dict) and "fetch" in feed:
del feed["fetch"] del feed["fetch"]
if len(feed) == 0: if len(feed) == 0:
raise ValueError("empty input") raise ValueError("empty input")
fetch_map = self.client.predict(feed=feed, fetch=fetch, batch=True) fetch_map = self.client.predict(
feed=feed, fetch=fetch, batch=is_batch)
result = self.postprocess( result = self.postprocess(
feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": result} result = {"result": result}
...@@ -174,21 +189,19 @@ class WebService(object): ...@@ -174,21 +189,19 @@ class WebService(object):
from paddle_serving_app.local_predict import LocalPredictor from paddle_serving_app.local_predict import LocalPredictor
self.client = LocalPredictor() self.client = LocalPredictor()
self.client.load_model_config( self.client.load_model_config(
"{}".format(self.model_config), gpu=False, profile=False) "{}".format(self.model_config), use_gpu=False)
def run_web_service(self): def run_web_service(self):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
self.app_instance.run(host="0.0.0.0", self.app_instance.run(host="0.0.0.0", port=self.port, threaded=True)
port=self.port,
threaded=False,
processes=1)
def get_app_instance(self): def get_app_instance(self):
return self.app_instance return self.app_instance
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
return feed, fetch is_batch = True
return feed, fetch, is_batch
def postprocess(self, feed=[], fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
......
...@@ -58,6 +58,20 @@ class WebService(object): ...@@ -58,6 +58,20 @@ class WebService(object):
def load_model_config(self, model_config): def load_model_config(self, model_config):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
self.model_config = model_config self.model_config = model_config
import os
from .proto import general_model_config_pb2 as m_config
import google.protobuf.text_format
if os.path.isdir(model_config):
client_config = "{}/serving_server_conf.prototxt".format(
model_config)
        elif os.path.isfile(model_config):
client_config = model_config
model_conf = m_config.GeneralModelConfig()
f = open(client_config, 'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names = [var.alias_name for var in model_conf.feed_var]
self.fetch_names = [var.alias_name for var in model_conf.fetch_var]
def set_gpus(self, gpus): def set_gpus(self, gpus):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
...@@ -167,13 +181,14 @@ class WebService(object): ...@@ -167,13 +181,14 @@ class WebService(object):
if "fetch" not in request.json: if "fetch" not in request.json:
abort(400) abort(400)
try: try:
feed, fetch = self.preprocess(request.json["feed"], feed, fetch, is_batch = self.preprocess(request.json["feed"],
request.json["fetch"]) request.json["fetch"])
if isinstance(feed, dict) and "fetch" in feed: if isinstance(feed, dict) and "fetch" in feed:
del feed["fetch"] del feed["fetch"]
if len(feed) == 0: if len(feed) == 0:
raise ValueError("empty input") raise ValueError("empty input")
fetch_map = self.client.predict(feed=feed, fetch=fetch) fetch_map = self.client.predict(
feed=feed, fetch=fetch, batch=is_batch)
result = self.postprocess( result = self.postprocess(
feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": result} result = {"result": result}
...@@ -235,21 +250,19 @@ class WebService(object): ...@@ -235,21 +250,19 @@ class WebService(object):
from paddle_serving_app.local_predict import LocalPredictor from paddle_serving_app.local_predict import LocalPredictor
self.client = LocalPredictor() self.client = LocalPredictor()
self.client.load_model_config( self.client.load_model_config(
"{}".format(self.model_config), gpu=gpu, profile=False) "{}".format(self.model_config), use_gpu=True, gpu_id=self.gpus[0])
def run_web_service(self): def run_web_service(self):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
self.app_instance.run(host="0.0.0.0", self.app_instance.run(host="0.0.0.0", port=self.port, threaded=True)
port=self.port,
threaded=False,
processes=4)
def get_app_instance(self): def get_app_instance(self):
return self.app_instance return self.app_instance
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
return feed, fetch is_batch = True
return feed, fetch, is_batch
def postprocess(self, feed=[], fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
print("This API will be deprecated later. Please do not use it") print("This API will be deprecated later. Please do not use it")
......
...@@ -312,7 +312,7 @@ class OpAnalyst(object): ...@@ -312,7 +312,7 @@ class OpAnalyst(object):
# reduce op times # reduce op times
op_times = { op_times = {
op_name: sum(step_times.values()) op_name: sum(list(step_times.values()))
for op_name, step_times in op_times.items() for op_name, step_times in op_times.items()
} }
......
...@@ -32,7 +32,10 @@ import copy ...@@ -32,7 +32,10 @@ import copy
_LOGGER = logging.getLogger(__name__) _LOGGER = logging.getLogger(__name__)
class ChannelDataEcode(enum.Enum): class ChannelDataErrcode(enum.Enum):
"""
ChannelData error code
"""
OK = 0 OK = 0
TIMEOUT = 1 TIMEOUT = 1
NOT_IMPLEMENTED = 2 NOT_IMPLEMENTED = 2
...@@ -42,9 +45,21 @@ class ChannelDataEcode(enum.Enum): ...@@ -42,9 +45,21 @@ class ChannelDataEcode(enum.Enum):
CLOSED_ERROR = 6 CLOSED_ERROR = 6
NO_SERVICE = 7 NO_SERVICE = 7
UNKNOW = 8 UNKNOW = 8
PRODUCT_ERROR = 9
class ProductErrCode(enum.Enum):
"""
    ProductErrCode is a base class for recording business error codes.
    Product developers inherit this class and extend it with more error codes.
"""
pass
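# Illustrative note (not part of the framework): because ProductErrCode is
# declared without members, Python's enum rules allow it to be subclassed,
# e.g. a product-side extension such as
#
#     class OcrProductErrCode(ProductErrCode):
#         IMAGE_DECODE_FAILED = 1001
#         TEXT_TOO_LONG = 1002
#
# which keeps business codes apart from the framework's ChannelDataErrcode (0-9).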
class ChannelDataType(enum.Enum): class ChannelDataType(enum.Enum):
"""
Channel data type
"""
DICT = 0 DICT = 0
CHANNEL_NPDATA = 1 CHANNEL_NPDATA = 1
ERROR = 2 ERROR = 2
...@@ -56,20 +71,23 @@ class ChannelData(object): ...@@ -56,20 +71,23 @@ class ChannelData(object):
npdata=None, npdata=None,
dictdata=None, dictdata=None,
data_id=None, data_id=None,
ecode=None, log_id=None,
error_code=None,
error_info=None, error_info=None,
prod_error_code=None,
prod_error_info=None,
client_need_profile=False): client_need_profile=False):
''' '''
There are several ways to use it: There are several ways to use it:
1. ChannelData(ChannelDataType.CHANNEL_NPDATA.value, npdata, data_id) 1. ChannelData(ChannelDataType.CHANNEL_NPDATA.value, npdata, data_id, log_id)
2. ChannelData(ChannelDataType.DICT.value, dictdata, data_id) 2. ChannelData(ChannelDataType.DICT.value, dictdata, data_id, log_id)
3. ChannelData(ecode, error_info, data_id) 3. ChannelData(error_code, error_info, prod_error_code, prod_error_info, data_id, log_id)
Protobufs are not pickle-able: Protobufs are not pickle-able:
https://stackoverflow.com/questions/55344376/how-to-import-protobuf-module https://stackoverflow.com/questions/55344376/how-to-import-protobuf-module
''' '''
if ecode is not None: if error_code is not None or prod_error_code is not None:
if data_id is None or error_info is None: if data_id is None or error_info is None:
_LOGGER.critical("Failed to generate ChannelData: data_id" _LOGGER.critical("Failed to generate ChannelData: data_id"
" and error_info cannot be None") " and error_info cannot be None")
...@@ -77,25 +95,30 @@ class ChannelData(object): ...@@ -77,25 +95,30 @@ class ChannelData(object):
datatype = ChannelDataType.ERROR.value datatype = ChannelDataType.ERROR.value
else: else:
if datatype == ChannelDataType.CHANNEL_NPDATA.value: if datatype == ChannelDataType.CHANNEL_NPDATA.value:
ecode, error_info = ChannelData.check_npdata(npdata) error_code, error_info = ChannelData.check_npdata(npdata)
if ecode != ChannelDataEcode.OK.value: if error_code != ChannelDataErrcode.OK.value:
datatype = ChannelDataType.ERROR.value datatype = ChannelDataType.ERROR.value
_LOGGER.error("(logid={}) {}".format(data_id, error_info)) _LOGGER.error("(data_id={} log_id={}) {}".format(
data_id, log_id, error_info))
elif datatype == ChannelDataType.DICT.value: elif datatype == ChannelDataType.DICT.value:
ecode, error_info = ChannelData.check_dictdata(dictdata) error_code, error_info = ChannelData.check_dictdata(dictdata)
if ecode != ChannelDataEcode.OK.value: if error_code != ChannelDataErrcode.OK.value:
datatype = ChannelDataType.ERROR.value datatype = ChannelDataType.ERROR.value
_LOGGER.error("(logid={}) {}".format(data_id, error_info)) _LOGGER.error("(data_id={} log_id={}) {}".format(
data_id, log_id, error_info))
else: else:
_LOGGER.critical("(logid={}) datatype not match".format( _LOGGER.critical("(data_id={} log_id={}) datatype not match".
data_id)) format(data_id, log_id))
os._exit(-1) os._exit(-1)
self.datatype = datatype self.datatype = datatype
self.npdata = npdata self.npdata = npdata
self.dictdata = dictdata self.dictdata = dictdata
self.id = data_id self.id = data_id
self.ecode = ecode self.log_id = log_id
self.error_code = error_code
self.error_info = error_info self.error_info = error_info
self.prod_error_code = prod_error_code
self.prod_error_info = prod_error_info
self.client_need_profile = client_need_profile self.client_need_profile = client_need_profile
self.profile_data_set = set() self.profile_data_set = set()
...@@ -106,67 +129,67 @@ class ChannelData(object): ...@@ -106,67 +129,67 @@ class ChannelData(object):
@staticmethod @staticmethod
def check_dictdata(dictdata): def check_dictdata(dictdata):
ecode = ChannelDataEcode.OK.value error_code = ChannelDataErrcode.OK.value
error_info = None error_info = None
if isinstance(dictdata, list): if isinstance(dictdata, list):
# batch data # batch data
for sample in dictdata: for sample in dictdata:
if not isinstance(sample, dict): if not isinstance(sample, dict):
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the type of " \ error_info = "Failed to check data: the type of " \
"data must be dict, but get {}.".format(type(sample)) "data must be dict, but get {}.".format(type(sample))
break break
elif not isinstance(dictdata, dict): elif not isinstance(dictdata, dict):
# batch size = 1 # batch size = 1
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the type of data must " \ error_info = "Failed to check data: the type of data must " \
"be dict, but get {}.".format(type(dictdata)) "be dict, but get {}.".format(type(dictdata))
return ecode, error_info return error_code, error_info
@staticmethod @staticmethod
def check_batch_npdata(batch): def check_batch_npdata(batch):
ecode = ChannelDataEcode.OK.value error_code = ChannelDataErrcode.OK.value
error_info = None error_info = None
for npdata in batch: for npdata in batch:
ecode, error_info = ChannelData.check_npdata(npdata) error_code, error_info = ChannelData.check_npdata(npdata)
if ecode != ChannelDataEcode.OK.value: if error_code != ChannelDataErrcode.OK.value:
break break
return ecode, error_info return error_code, error_info
@staticmethod @staticmethod
def check_npdata(npdata): def check_npdata(npdata):
ecode = ChannelDataEcode.OK.value error_code = ChannelDataErrcode.OK.value
error_info = None error_info = None
if isinstance(npdata, list): if isinstance(npdata, list):
# batch data # batch data
for sample in npdata: for sample in npdata:
if not isinstance(sample, dict): if not isinstance(sample, dict):
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the " \ error_info = "Failed to check data: the " \
"value of data must be dict, but get {}.".format( "value of data must be dict, but get {}.".format(
type(sample)) type(sample))
break break
for _, value in sample.items(): for _, value in sample.items():
if not isinstance(value, np.ndarray): if not isinstance(value, np.ndarray):
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the" \ error_info = "Failed to check data: the" \
" value of data must be np.ndarray, but get {}.".format( " value of data must be np.ndarray, but get {}.".format(
type(value)) type(value))
return ecode, error_info return error_code, error_info
elif isinstance(npdata, dict): elif isinstance(npdata, dict):
# batch_size = 1 # batch_size = 1
for _, value in npdata.items(): for _, value in npdata.items():
if not isinstance(value, np.ndarray): if not isinstance(value, np.ndarray):
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the value " \ error_info = "Failed to check data: the value " \
"of data must be np.ndarray, but get {}.".format( "of data must be np.ndarray, but get {}.".format(
type(value)) type(value))
break break
else: else:
ecode = ChannelDataEcode.TYPE_ERROR.value error_code = ChannelDataErrcode.TYPE_ERROR.value
error_info = "Failed to check data: the value of data " \ error_info = "Failed to check data: the value of data " \
"must be dict, but get {}.".format(type(npdata)) "must be dict, but get {}.".format(type(npdata))
return ecode, error_info return error_code, error_info
def parse(self): def parse(self):
feed = None feed = None
...@@ -191,8 +214,9 @@ class ChannelData(object): ...@@ -191,8 +214,9 @@ class ChannelData(object):
return 1 return 1
def __str__(self): def __str__(self):
return "type[{}], ecode[{}], id[{}]".format( return "type[{}], error_code[{}], data_id[{}], log_id[{}], dict_data[{}]".format(
ChannelDataType(self.datatype).name, self.ecode, self.id) ChannelDataType(self.datatype).name, self.error_code, self.id,
self.log_id, str(self.dictdata))
class ProcessChannel(object): class ProcessChannel(object):
...@@ -289,14 +313,14 @@ class ProcessChannel(object): ...@@ -289,14 +313,14 @@ class ProcessChannel(object):
def push(self, channeldata, op_name=None): def push(self, channeldata, op_name=None):
_LOGGER.debug( _LOGGER.debug(
self._log("(logid={}) Op({}) Pushing data".format(channeldata.id, self._log("(data_id={} log_id={}) Op({}) Enter channel::push".
op_name))) format(channeldata.id, channeldata.log_id, op_name)))
if len(self._producers) == 0: if len(self._producers) == 0:
_LOGGER.critical( _LOGGER.critical(
self._log( self._log(
"(logid={}) Op({}) Failed to push data: expected number" "(data_id={} log_id={}) Op({}) Failed to push data: expected number"
" of producers to be greater than 0, but the it is 0.". " of producers to be greater than 0, but the it is 0.".
format(channeldata.id, op_name))) format(channeldata.id, channeldata.log_id, op_name)))
os._exit(-1) os._exit(-1)
elif len(self._producers) == 1: elif len(self._producers) == 1:
with self._cv: with self._cv:
...@@ -310,19 +334,21 @@ class ProcessChannel(object): ...@@ -310,19 +334,21 @@ class ProcessChannel(object):
raise ChannelStopError() raise ChannelStopError()
self._cv.notify_all() self._cv.notify_all()
_LOGGER.debug( _LOGGER.debug(
self._log("(logid={}) Op({}) Pushed data into internal queue.". self._log(
format(channeldata.id, op_name))) "(data_id={} log_id={}) Op({}) Pushed data into internal queue.".
format(channeldata.id, channeldata.log_id, op_name)))
return True return True
elif op_name is None: elif op_name is None:
_LOGGER.critical( _LOGGER.critical(
self._log( self._log(
"(logid={}) Op({}) Failed to push data: there are multiple " "(data_id={} log_id={}) Op({}) Failed to push data: there are multiple "
"producers, so op_name cannot be None.".format( "producers, so op_name cannot be None.".format(
channeldata.id, op_name))) channeldata.id, channeldata.log_id, op_name)))
os._exit(-1) os._exit(-1)
producer_num = len(self._producers) producer_num = len(self._producers)
data_id = channeldata.id data_id = channeldata.id
log_id = channeldata.log_id
put_data = None put_data = None
with self._cv: with self._cv:
if data_id not in self._input_buf: if data_id not in self._input_buf:
...@@ -347,8 +373,8 @@ class ProcessChannel(object): ...@@ -347,8 +373,8 @@ class ProcessChannel(object):
if put_data is None: if put_data is None:
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pushed data into input_buffer.". "(data_id={} log_id={}) Op({}) Pushed data into input_buffer.".
format(data_id, op_name))) format(data_id, log_id, op_name)))
else: else:
while self._stop.value == 0: while self._stop.value == 0:
try: try:
...@@ -361,8 +387,8 @@ class ProcessChannel(object): ...@@ -361,8 +387,8 @@ class ProcessChannel(object):
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pushed data into internal_queue.". "(data_id={} log_id={}) Op({}) Pushed data into internal_queue.".
format(data_id, op_name))) format(data_id, log_id, op_name)))
self._cv.notify_all() self._cv.notify_all()
return True return True
...@@ -403,9 +429,12 @@ class ProcessChannel(object): ...@@ -403,9 +429,12 @@ class ProcessChannel(object):
self._cv.wait() self._cv.wait()
if self._stop.value == 1: if self._stop.value == 1:
raise ChannelStopError() raise ChannelStopError()
_LOGGER.debug(
self._log("(logid={}) Op({}) Got data".format(resp.values()[0] if resp is not None:
.id, op_name))) list_values = list(resp.values())
_LOGGER.debug(
self._log("(data_id={} log_id={}) Op({}) Got data".format(
list_values[0].id, list_values[0].log_id, op_name)))
return resp return resp
elif op_name is None: elif op_name is None:
_LOGGER.critical( _LOGGER.critical(
...@@ -432,10 +461,12 @@ class ProcessChannel(object): ...@@ -432,10 +461,12 @@ class ProcessChannel(object):
try: try:
channeldata = self._que.get(timeout=0) channeldata = self._que.get(timeout=0)
self._output_buf.append(channeldata) self._output_buf.append(channeldata)
list_values = list(channeldata.values())
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pop ready item into output_buffer". "(data_id={} log_id={}) Op({}) Pop ready item into output_buffer".
format(channeldata.values()[0].id, op_name))) format(list_values[0].id, list_values[0].log_id,
op_name)))
break break
except Queue.Empty: except Queue.Empty:
if timeout is not None: if timeout is not None:
...@@ -486,9 +517,12 @@ class ProcessChannel(object): ...@@ -486,9 +517,12 @@ class ProcessChannel(object):
self._cv.notify_all() self._cv.notify_all()
_LOGGER.debug( if resp is not None:
self._log("(logid={}) Op({}) Got data from output_buffer".format( list_values = list(resp.values())
resp.values()[0].id, op_name))) _LOGGER.debug(
self._log(
"(data_id={} log_id={}) Op({}) Got data from output_buffer".
format(list_values[0].id, list_values[0].log_id, op_name)))
return resp return resp
def stop(self): def stop(self):
...@@ -586,14 +620,14 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -586,14 +620,14 @@ class ThreadChannel(Queue.PriorityQueue):
def push(self, channeldata, op_name=None): def push(self, channeldata, op_name=None):
_LOGGER.debug( _LOGGER.debug(
self._log("(logid={}) Op({}) Pushing data".format(channeldata.id, self._log("(data_id={} log_id={}) Op({}) Pushing data".format(
op_name))) channeldata.id, channeldata.log_id, op_name)))
if len(self._producers) == 0: if len(self._producers) == 0:
_LOGGER.critical( _LOGGER.critical(
self._log( self._log(
"(logid={}) Op({}) Failed to push data: expected number of " "(data_id={} log_id={}) Op({}) Failed to push data: expected number of "
"producers to be greater than 0, but the it is 0.".format( "producers to be greater than 0, but the it is 0.".format(
channeldata.id, op_name))) channeldata.id, channeldata.log_id, op_name)))
os._exit(-1) os._exit(-1)
elif len(self._producers) == 1: elif len(self._producers) == 1:
with self._cv: with self._cv:
...@@ -607,19 +641,21 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -607,19 +641,21 @@ class ThreadChannel(Queue.PriorityQueue):
raise ChannelStopError() raise ChannelStopError()
self._cv.notify_all() self._cv.notify_all()
_LOGGER.debug( _LOGGER.debug(
self._log("(logid={}) Op({}) Pushed data into internal_queue.". self._log(
format(channeldata.id, op_name))) "(data_id={} log_id={}) Op({}) Pushed data into internal_queue.".
format(channeldata.id, channeldata.log_id, op_name)))
return True return True
elif op_name is None: elif op_name is None:
_LOGGER.critical( _LOGGER.critical(
self._log( self._log(
"(logid={}) Op({}) Failed to push data: there are multiple" "(data_id={} log_id={}) Op({}) Failed to push data: there are multiple"
" producers, so op_name cannot be None.".format( " producers, so op_name cannot be None.".format(
channeldata.id, op_name))) channeldata.id, channeldata.log_id, op_name)))
os._exit(-1) os._exit(-1)
producer_num = len(self._producers) producer_num = len(self._producers)
data_id = channeldata.id data_id = channeldata.id
log_id = channeldata.log_id
put_data = None put_data = None
with self._cv: with self._cv:
if data_id not in self._input_buf: if data_id not in self._input_buf:
...@@ -639,8 +675,8 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -639,8 +675,8 @@ class ThreadChannel(Queue.PriorityQueue):
if put_data is None: if put_data is None:
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pushed data into input_buffer.". "(data_id={} log_id={}) Op({}) Pushed data into input_buffer.".
format(data_id, op_name))) format(data_id, log_id, op_name)))
else: else:
while self._stop is False: while self._stop is False:
try: try:
...@@ -653,8 +689,8 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -653,8 +689,8 @@ class ThreadChannel(Queue.PriorityQueue):
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pushed data into internal_queue.". "(data_id={} log_id={}) Op({}) Pushed data into internal_queue.".
format(data_id, op_name))) format(data_id, log_id, op_name)))
self._cv.notify_all() self._cv.notify_all()
return True return True
...@@ -696,9 +732,11 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -696,9 +732,11 @@ class ThreadChannel(Queue.PriorityQueue):
self._cv.wait() self._cv.wait()
if self._stop: if self._stop:
raise ChannelStopError() raise ChannelStopError()
_LOGGER.debug( if resp is not None:
self._log("(logid={}) Op({}) Got data".format(resp.values()[0] list_values = list(resp.values())
.id, op_name))) _LOGGER.debug(
self._log("(data_id={} log_id={}) Op({}) Got data".format(
list_values[0].id, list_values[0].log_id, op_name)))
return resp return resp
elif op_name is None: elif op_name is None:
_LOGGER.critical( _LOGGER.critical(
...@@ -725,10 +763,12 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -725,10 +763,12 @@ class ThreadChannel(Queue.PriorityQueue):
try: try:
channeldata = self.get(timeout=0) channeldata = self.get(timeout=0)
self._output_buf.append(channeldata) self._output_buf.append(channeldata)
list_values = list(channeldata.values())
_LOGGER.debug( _LOGGER.debug(
self._log( self._log(
"(logid={}) Op({}) Pop ready item into output_buffer". "(data_id={} log_id={}) Op({}) Pop ready item into output_buffer".
format(channeldata.values()[0].id, op_name))) format(list_values[0].id, list_values[0].log_id,
op_name)))
break break
except Queue.Empty: except Queue.Empty:
if timeout is not None: if timeout is not None:
...@@ -779,9 +819,12 @@ class ThreadChannel(Queue.PriorityQueue): ...@@ -779,9 +819,12 @@ class ThreadChannel(Queue.PriorityQueue):
self._cv.notify_all() self._cv.notify_all()
_LOGGER.debug( if resp is not None:
self._log("(logid={}) Op({}) Got data from output_buffer".format( list_values = list(resp.values())
resp.values()[0].id, op_name))) _LOGGER.debug(
self._log(
"(data_id={} log_id={}) Op({}) Got data from output_buffer".
format(list_values[0].id, list_values[0].log_id, op_name)))
return resp return resp
def stop(self): def stop(self):
......
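Two changes repeat through every hunk above: channel log messages now carry the caller-supplied `log_id` alongside the framework-generated `data_id`, and `resp.values()[0]` becomes `list(resp.values())[0]`. The second is a Python 3 compatibility fix, since `dict.values()` returns a non-indexable view object there; a minimal illustration:

```python
# Why the diff rewrites resp.values()[0] as list(resp.values())[0]:
d = {"op_a": "channeldata"}

# Python 2: dict.values() returns a list, so d.values()[0] works.
# Python 3: it returns a dict_values view, which cannot be indexed.
try:
    d.values()[0]
except TypeError:
    pass  # raised on Python 3

# Materializing the view works on both interpreters:
assert list(d.values())[0] == "channeldata"
```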
@@ -19,22 +19,25 @@ option go_package = ".;pipeline_serving";
 import "google/api/annotations.proto";

 message Response {
-  repeated string key = 1;
-  repeated string value = 2;
-  int32 ecode = 3;
-  string error_info = 4;
+  int32 err_no = 1;
+  string err_msg = 2;
+  repeated string key = 3;
+  repeated string value = 4;
 };

 message Request {
   repeated string key = 1;
   repeated string value = 2;
   string name = 3;
-}
+  string method = 4;
+  int64 logid = 5;
+  string clientip = 6;
+};

 service PipelineService {
   rpc inference(Request) returns (Response) {
     option (google.api.http) = {
-      post : "/{name=*}/prediction"
+      post : "/{name=*}/{method=*}"
       body : "*"
     };
   }
...
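With the route changed from `/{name=*}/prediction` to `/{name=*}/{method=*}`, the gateway derives both the pipeline name and the method from the URL, and the new `logid`/`clientip` fields travel with the request. A sketch of an HTTP call against the new schema, assuming a hypothetical pipeline named `ocr` exposing `prediction` behind a gateway on port 18080:

```python
# Hypothetical HTTP call through the grpc-gateway after this change; the
# service name "ocr", method "prediction", and port 18080 are assumptions.
import json
import requests

url = "http://127.0.0.1:18080/ocr/prediction"  # matches /{name=*}/{method=*}
payload = {
    "key": ["words"],
    "value": ["hello"],
    "name": "ocr",
    "method": "prediction",
    "logid": 10000,           # new int64 field for end-to-end tracing
    "clientip": "127.0.0.1",  # new field recording the caller's address
}
resp = requests.post(url, data=json.dumps(payload)).json()
# err_no/err_msg replace the old ecode/error_info pair
print(resp["err_no"], resp["err_msg"], resp.get("value"))
```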
@@ -38,7 +38,8 @@ func run_proxy_server(grpc_port int, http_port int) error {
 	ctx, cancel := context.WithCancel(ctx)
 	defer cancel()

-	mux := runtime.NewServeMux()
+	// EmitDefaults=true: do not filter default-valued fields out of the JSON output
+	mux := runtime.NewServeMux(runtime.WithMarshalerOption(runtime.MIMEWildcard, &runtime.JSONPb{OrigName: true, EmitDefaults: true}))

 	opts := []grpc.DialOption{grpc.WithInsecure()}
 	err := gw.RegisterPipelineServiceHandlerFromEndpoint(ctx, mux, *pipelineEndpoint, opts)
 	if err != nil {
...
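The marshaler option matters for the new `err_no` field: grpc-gateway's default JSON marshaling drops proto3 zero values, so a successful response (`err_no == 0`) would otherwise omit its error fields entirely. A small illustration of the difference, with assumed response contents:

```python
# Assumed gateway responses for the same successful call (err_no == 0).
# The default JSONPb marshaler drops zero-valued fields:
without_emit_defaults = {"key": ["res"], "value": ["ok"]}
# With EmitDefaults=true, zero values are serialized, so clients can read
# err_no/err_msg unconditionally:
with_emit_defaults = {"err_no": 0, "err_msg": "",
                      "key": ["res"], "value": ["ok"]}

assert "err_no" not in without_emit_defaults
assert with_emit_defaults["err_no"] == 0
```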
@@ -15,101 +15,220 @@
 import os
 import logging
 import multiprocessing
-try:
-    from paddle_serving_server_gpu import OpMaker, OpSeqMaker, Server
-    PACKAGE_VERSION = "GPU"
-except ImportError:
-    from paddle_serving_server import OpMaker, OpSeqMaker, Server
-    PACKAGE_VERSION = "CPU"
+#from paddle_serving_server_gpu import OpMaker, OpSeqMaker
+#from paddle_serving_server_gpu import Server as GpuServer
+#from paddle_serving_server import Server as CpuServer
 from . import util
+#from paddle_serving_app.local_predict import LocalPredictor

 _LOGGER = logging.getLogger(__name__)
 _workdir_name_gen = util.NameGenerator("workdir_")


 class LocalServiceHandler(object):
+    """
+    LocalServiceHandler is the processor of the local service. It supports
+    three client types: brpc, grpc and local_predictor. With brpc or grpc,
+    server startup ability is provided; with local_predictor, local predict
+    ability is provided by paddle_serving_app.
+    """
+
     def __init__(self,
                  model_config,
+                 client_type='local_predictor',
                  workdir="",
                  thread_num=2,
                  devices="",
+                 fetch_names=None,
                  mem_optim=True,
                  ir_optim=False,
-                 available_port_generator=None):
+                 available_port_generator=None,
+                 use_trt=False,
+                 use_profile=False):
+        """
+        Initialization of LocalServiceHandler
+
+        Args:
+           model_config: model config path
+           client_type: brpc, grpc or local_predictor[default]
+           workdir: work directory
+           thread_num: number of threads, concurrent quantity
+           devices: gpu id list[gpu], "" default[cpu]
+           fetch_names: get fetch names out of LocalServiceHandler in
+               local_predictor mode. fetch_names_ is compatible for Client().
+           mem_optim: use memory/graphics memory optimization, True default
+           ir_optim: use computation graph (IR) optimization, False default
+           available_port_generator: generate available ports
+           use_trt: use NVIDIA TensorRT engine, False default
+           use_profile: use profiling, False default
+
+        Returns:
+           None
+        """
         if available_port_generator is None:
             available_port_generator = util.GetAvailablePortGenerator()

         self._model_config = model_config
         self._port_list = []
+        self._device_type = "cpu"
         if devices == "":
             # cpu
             devices = [-1]
+            self._device_type = "cpu"
             self._port_list.append(available_port_generator.next())
-            _LOGGER.info("Model({}) will be launch in cpu device. Port({})"
+            _LOGGER.info("Model({}) will be launched in cpu device. Port({})"
                          .format(model_config, self._port_list))
         else:
             # gpu
-            if PACKAGE_VERSION == "CPU":
-                raise ValueError(
-                    "You are using the CPU version package("
-                    "paddle-serving-server), unable to set devices")
+            self._device_type = "gpu"
             devices = [int(x) for x in devices.split(",")]
             for _ in devices:
                 self._port_list.append(available_port_generator.next())
-            _LOGGER.info("Model({}) will be launch in gpu device: {}. Port({})"
+            _LOGGER.info("Model({}) will be launched in gpu device: {}. Port({})"
                          .format(model_config, devices, self._port_list))
+        self._client_type = client_type
         self._workdir = workdir
         self._devices = devices
         self._thread_num = thread_num
         self._mem_optim = mem_optim
         self._ir_optim = ir_optim
+        self._local_predictor_client = None
         self._rpc_service_list = []
         self._server_pros = []
-        self._fetch_vars = None
+        self._use_trt = use_trt
+        self._use_profile = use_profile
+        self.fetch_names_ = fetch_names

     def get_fetch_list(self):
-        return self._fetch_vars
+        return self.fetch_names_

     def get_port_list(self):
         return self._port_list

+    def get_client(self, concurrency_idx):
+        """
+        Only used in the local_predictor case: create one LocalPredictor
+        object and initialize the paddle predictor via load_model_config.
+        The concurrency_idx is used to select the running device.
+
+        Args:
+            concurrency_idx: process/thread index
+
+        Returns:
+            _local_predictor_client
+        """
+        # check the legality of concurrency_idx
+        device_num = len(self._devices)
+        if device_num <= 0:
+            _LOGGER.error("device_num must be greater than 0. devices({})".
+                          format(self._devices))
+            raise ValueError("The number of self._devices is invalid")
+
+        if concurrency_idx < 0:
+            _LOGGER.error("concurrency_idx({}) must be a non-negative number".
+                          format(concurrency_idx))
+            concurrency_idx = 0
+        elif concurrency_idx >= device_num:
+            concurrency_idx = concurrency_idx % device_num
+
+        _LOGGER.info("GET_CLIENT : concurrency_idx={}, device_num={}".format(
+            concurrency_idx, device_num))
+        from paddle_serving_app.local_predict import LocalPredictor
+        if self._local_predictor_client is None:
+            self._local_predictor_client = LocalPredictor()
+            use_gpu = False
+            if self._device_type == "gpu":
+                use_gpu = True
+            self._local_predictor_client.load_model_config(
+                model_path=self._model_config,
+                use_gpu=use_gpu,
+                gpu_id=self._devices[concurrency_idx],
+                use_profile=self._use_profile,
+                thread_num=self._thread_num,
+                mem_optim=self._mem_optim,
+                ir_optim=self._ir_optim,
+                use_trt=self._use_trt)
+        return self._local_predictor_client
+
     def get_client_config(self):
         return os.path.join(self._model_config, "serving_server_conf.prototxt")

     def _prepare_one_server(self, workdir, port, gpuid, thread_num, mem_optim,
                             ir_optim):
-        device = "gpu"
-        if gpuid == -1:
-            device = "cpu"
-        op_maker = OpMaker()
-        read_op = op_maker.create('general_reader')
-        general_infer_op = op_maker.create('general_infer')
-        general_response_op = op_maker.create('general_response')
-
-        op_seq_maker = OpSeqMaker()
-        op_seq_maker.add_op(read_op)
-        op_seq_maker.add_op(general_infer_op)
-        op_seq_maker.add_op(general_response_op)
-
-        server = Server()
+        """
+        According to self._device_type, generate one CpuServer or GpuServer
+        and set the model config and startup params.
+
+        Args:
+            workdir: work directory
+            port: network port
+            gpuid: gpu id
+            thread_num: thread num
+            mem_optim: use memory/graphics memory optimization
+            ir_optim: use computation graph (IR) optimization
+
+        Returns:
+            server: CpuServer/GpuServer
+        """
+        if self._device_type == "cpu":
+            from paddle_serving_server import OpMaker, OpSeqMaker, Server
+            op_maker = OpMaker()
+            read_op = op_maker.create('general_reader')
+            general_infer_op = op_maker.create('general_infer')
+            general_response_op = op_maker.create('general_response')
+            op_seq_maker = OpSeqMaker()
+            op_seq_maker.add_op(read_op)
+            op_seq_maker.add_op(general_infer_op)
+            op_seq_maker.add_op(general_response_op)
+            server = Server()
+        else:
+            # gpu
+            from paddle_serving_server_gpu import OpMaker, OpSeqMaker, Server
+            op_maker = OpMaker()
+            read_op = op_maker.create('general_reader')
+            general_infer_op = op_maker.create('general_infer')
+            general_response_op = op_maker.create('general_response')
+            op_seq_maker = OpSeqMaker()
+            op_seq_maker.add_op(read_op)
+            op_seq_maker.add_op(general_infer_op)
+            op_seq_maker.add_op(general_response_op)
+            server = Server()
+            if gpuid >= 0:
+                server.set_gpuid(gpuid)
+
         server.set_op_sequence(op_seq_maker.get_op_sequence())
         server.set_num_threads(thread_num)
         server.set_memory_optimize(mem_optim)
         server.set_ir_optimize(ir_optim)

         server.load_model_config(self._model_config)
-        if gpuid >= 0:
-            server.set_gpuid(gpuid)
-        server.prepare_server(workdir=workdir, port=port, device=device)
-        if self._fetch_vars is None:
-            self._fetch_vars = server.get_fetch_list()
+        server.prepare_server(
+            workdir=workdir, port=port, device=self._device_type)
+        if self.fetch_names_ is None:
+            self.fetch_names_ = server.get_fetch_list()
         return server

     def _start_one_server(self, service_idx):
+        """
+        Start one server
+
+        Args:
+            service_idx: server index
+
+        Returns:
+            None
+        """
         self._rpc_service_list[service_idx].run_server()

     def prepare_server(self):
+        """
+        Prepare all servers to be started, and append them into list.
+        """
         for i, device_id in enumerate(self._devices):
             if self._workdir != "":
                 workdir = "{}_{}".format(self._workdir, i)
@@ -125,6 +244,9 @@ class LocalServiceHandler(object):
                     ir_optim=self._ir_optim))

     def start_server(self):
+        """
+        Start multiple processes, and start one server in each process.
+        """
         for i, service in enumerate(self._rpc_service_list):
             p = multiprocessing.Process(
                 target=self._start_one_server, args=(i, ))
...
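The new `get_client` path lets a pipeline Op run inference in-process through `LocalPredictor` instead of over brpc/grpc. A rough usage sketch, assuming a model directory `uci_housing_model`, feed name `x`, fetch name `price`, and the import path shown (all illustrative; `LocalPredictor.predict`'s exact signature may also differ by release):

```python
import numpy as np
# Import path is an assumption based on the pipeline package layout.
from paddle_serving_server.pipeline.local_service_handler import \
    LocalServiceHandler

# "uci_housing_model", "x" and "price" are illustrative names only.
handler = LocalServiceHandler(
    model_config="uci_housing_model",
    client_type="local_predictor",
    devices="",      # "" selects CPU; "0,1" would map to GPU cards 0 and 1
    thread_num=2)

predictor = handler.get_client(concurrency_idx=0)  # in-process predictor
feed = {"x": np.random.rand(1, 13).astype("float32")}
fetch_map = predictor.predict(feed=feed, fetch=["price"])
print(fetch_map)
```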
@@ -18,14 +18,20 @@ import numpy as np
 from numpy import *
 import logging
 import functools
-from .channel import ChannelDataEcode
+import json
+import socket
+from .channel import ChannelDataErrcode
 from .proto import pipeline_service_pb2
 from .proto import pipeline_service_pb2_grpc
+import six

 _LOGGER = logging.getLogger(__name__)


 class PipelineClient(object):
+    """
+    PipelineClient provides the basic capabilities of the pipeline SDK
+    """
+
     def __init__(self):
         self._channel = None
         self._profile_key = "pipeline.profile"
@@ -42,13 +48,38 @@ class PipelineClient(object):
     def _pack_request_package(self, feed_dict, profile):
         req = pipeline_service_pb2.Request()
+        logid = feed_dict.get("logid")
+        if logid is None:
+            req.logid = 0
+        else:
+            if sys.version_info.major == 2:
+                req.logid = long(logid)
+            elif sys.version_info.major == 3:
+                req.logid = int(logid)
+            feed_dict.pop("logid")
+
+        clientip = feed_dict.get("clientip")
+        if clientip is None:
+            hostname = socket.gethostname()
+            ip = socket.gethostbyname(hostname)
+            req.clientip = ip
+        else:
+            req.clientip = clientip
+            feed_dict.pop("clientip")
+
         np.set_printoptions(threshold=sys.maxsize)
         for key, value in feed_dict.items():
             req.key.append(key)
+            if (sys.version_info.major == 2 and isinstance(value,
+                (str, unicode)) or
+                    ((sys.version_info.major == 3) and isinstance(value, str))):
+                req.value.append(value)
+                continue
             if isinstance(value, np.ndarray):
                 req.value.append(value.__repr__())
-            elif isinstance(value, (str, unicode)):
-                req.value.append(value)
             elif isinstance(value, list):
                 req.value.append(np.array(value).__repr__())
             else:
@@ -60,29 +91,7 @@ class PipelineClient(object):
         return req

     def _unpack_response_package(self, resp, fetch):
-        if resp.ecode != 0:
-            return {
-                "ecode": resp.ecode,
-                "ecode_desc": ChannelDataEcode(resp.ecode),
-                "error_info": resp.error_info,
-            }
-        fetch_map = {"ecode": resp.ecode}
-        for idx, key in enumerate(resp.key):
-            if key == self._profile_key:
-                if resp.value[idx] != "":
-                    sys.stderr.write(resp.value[idx])
-                continue
-            if fetch is not None and key not in fetch:
-                continue
-            data = resp.value[idx]
-            try:
-                evaled_data = eval(data)
-                if isinstance(evaled_data, np.ndarray):
-                    data = evaled_data
-            except Exception as e:
-                pass
-            fetch_map[key] = data
-        return fetch_map
+        return resp

     def predict(self, feed_dict, fetch=None, asyn=False, profile=False):
         if not isinstance(feed_dict, dict):
...
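On the client side, `logid` and `clientip` are passed through `feed_dict` and split out into the Request proto, and `_unpack_response_package` now hands back the raw proto Response instead of a parsed fetch map. A sketch, assuming a pipeline server reachable at `127.0.0.1:9998` with feed key `words` and fetch name `prediction` (all illustrative):

```python
# Illustrative endpoint and feed/fetch names; not taken from this repo.
from paddle_serving_server.pipeline import PipelineClient

client = PipelineClient()
client.connect(['127.0.0.1:9998'])

feed = {
    "words": "i am a sentence",
    "logid": 10001,           # routed into Request.logid, not the feed
    "clientip": "127.0.0.1",  # routed into Request.clientip
}
resp = client.predict(feed_dict=feed, fetch=["prediction"])
# resp is now the raw proto Response; read error fields directly.
print(resp.err_no, resp.err_msg)
print(list(resp.key), list(resp.value))
```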
@@ -19,13 +19,16 @@ message Request {
   repeated string key = 1;
   repeated string value = 2;
   optional string name = 3;
+  optional string method = 4;
+  optional int64 logid = 5;
+  optional string clientip = 6;
 };

 message Response {
-  repeated string key = 1;
-  repeated string value = 2;
-  required int32 ecode = 3;
-  optional string error_info = 4;
+  optional int32 err_no = 1;
+  optional string err_msg = 2;
+  repeated string key = 3;
+  repeated string value = 4;
 };

 service PipelineService {
...
@@ -32,8 +32,8 @@ if '${PACK}' == 'ON':

 REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'sentencepiece', 'opencv-python<=4.2.0.32', 'pillow',
-    'shapely<=1.6.1', 'pyclipper'
+    'six >= 1.10.0', 'sentencepiece<=0.1.92', 'opencv-python<=4.2.0.32', 'pillow',
+    'pyclipper'
 ]

 packages=['paddle_serving_app',
...
@@ -43,8 +43,8 @@ if '${PACK}' == 'ON':
     copy_lib()

 REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.11.0', 'numpy >= 1.12', 'grpcio >= 1.28.1',
-    'grpcio-tools >= 1.28.1'
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'numpy >= 1.12', 'grpcio <= 1.33.2',
+    'grpcio-tools <= 1.33.2'
 ]
...
@@ -28,7 +28,7 @@ max_version, mid_version, min_version = util.python_version()
 util.gen_pipeline_code("paddle_serving_server")

 REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
     'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
 ]
...
@@ -30,7 +30,7 @@ max_version, mid_version, min_version = util.python_version()
 util.gen_pipeline_code("paddle_serving_server_gpu")

 REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
     'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
 ]
...
 sphinx==2.1.0
 mistune
 sphinx_rtd_theme
-paddlepaddle>=1.6
+paddlepaddle>=1.8.4
+shapely