Merge branch 'develop' into PaddlePM-patch-1

589372ec · Jiawei Wang · GitHub · 87961979 · e18a0045 · 589372ec
28 changed files
--- a/README.md
+++ b/README.md
@@ -45,21 +45,24 @@ nvidia-docker exec -it test bash
 ```
 ```shell
-pip install paddle-serving-client==0.3.2 
+pip install paddle-serving-client==0.4.0 
-pip install paddle-serving-server==0.3.2 # CPU
+pip install paddle-serving-server==0.4.0 # CPU
-pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
-pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.trt # GPU with CUDA10.1+TensorRT
 ```
 You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
 If you need to install modules compiled from the develop branch, please download packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
-Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7 and Ubuntu 16/18.
+Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18 and Windows 10.
-Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.6/3.7.
+Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6/3.7.
-Recommended to install paddle >= 1.8.2.
+Recommended to install paddle >= 1.8.4.
+For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md).
 <h2 align="center"> Pre-built services with Paddle Serving</h2>
@@ -111,11 +114,11 @@ tar -xzf uci_housing.tar.gz
 Paddle Serving provides HTTP and RPC based service for users to access
-### HTTP service
+### RPC service
-Paddle Serving provides a built-in python module called `paddle_serving_server.serve` that can start a RPC service or a http service with one-line command. If we specify the argument `--name uci`, it means that we will have a HTTP service with a url of `$IP:$PORT/uci/prediction`
+A user can also start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although the user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here.
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
 ```
 <center>
@@ -123,39 +126,24 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 |--------------|------|-----------|--------------------------------|
 | `thread` | int | `4` | Concurrency of current service |
 | `port` | int | `9292` | Exposed port of current service to users|
-| `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
 | `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
 | `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |
-Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/).
 </center>
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-```
-### RPC service
-A user can also start a RPC service with `paddle_serving_server.serve`. RPC service is usually faster than HTTP service, although a user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here. 
-``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
-```
 ``` python
 # A user can visit rpc service through paddle_serving_client API
 from paddle_serving_client import Client
+import numpy as np
 client = Client()
 client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
 client.connect(["127.0.0.1:9292"])
 data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-fetch_map = client.predict(feed={"x": data}, fetch=["price"])
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)
 ```
 Here, `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias name and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the name of `"x"` and `"price"` are assigned when the servable model is saved during training.
@@ -167,6 +155,40 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
 - **Highly concurrent and efficient communication** between clients and servers supported.
 - **Multiple programming languages** supported on client side, such as Golang, C++ and python.
+### WEB service
+Users can also put the data format processing logic on the server side so that the service can be accessed directly with curl. Refer to the following example, located in `python/examples/fit_a_line`:
+```python
+from paddle_serving_server.web_service import WebService
+import numpy as np
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9292)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
+```
+On the client side, run
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
+The response is
+```
+{"result":{"price":[[18.901151657104492]]}}
+```
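The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_predict_request` and `send` are hypothetical, and the URL and payload simply mirror the curl command shown in the diff.

```python
import json
import urllib.request


def build_predict_request(x, host="127.0.0.1", port=9292, name="uci"):
    """Build the same POST request that the curl command sends.

    `name` is assumed to match the name passed to the web service
    (`UciService(name="uci")` in the example above).
    """
    payload = {"feed": [{"x": list(x)}], "fetch": ["price"]}
    url = "http://{}:{}/{}/prediction".format(host, port, name)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request and decode the JSON response.

    Only works while the web service is actually running.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (with the service started on port 9292):
#   data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
#           -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
#   print(send(build_predict_request(data)))
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.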
 <h2 align="center">Document</h2>
 ### New to Paddle Serving

--- a/README_CN.md
+++ b/README_CN.md
@@ -47,21 +47,24 @@ nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/se
 nvidia-docker exec -it test bash
 ```
 ```shell
-pip install paddle-serving-client==0.3.2
+pip install paddle-serving-client==0.4.0
-pip install paddle-serving-server==0.3.2 # CPU
+pip install paddle-serving-server==0.4.0 # CPU
-pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
-pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.trt # GPU with CUDA10.1+TensorRT
 ```
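 You may need to use a domestic mirror source (for example the Tsinghua mirror, by adding `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
 If you need packages built from the develop branch, get the download address from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
-The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7 and Ubuntu 16/18.
+The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7, Ubuntu 16/18 and Windows 10.
 The paddle-serving-client and paddle-serving-app packages support Linux and Windows; paddle-serving-client only supports python2.7/3.5/3.6.
-Installing paddle 1.8.2 or later is recommended
+Installing paddle 1.8.4 or later is recommended
+For **Windows 10 users**, please refer to the document [Guide to Using Paddle Serving on Windows](./doc/WINDOWS_TUTORIAL_CN.md).
 <h2 align="center"> Pre-built services with Paddle Serving </h2>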
@@ -105,13 +108,12 @@ tar -xzf uci_housing.tar.gz
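 Paddle Serving provides HTTP and RPC based services for users
+<h3 align="center">RPC service</h3>
-<h3 align="center">HTTP service</h3>
+Users can also start an RPC service with `paddle_serving_server.serve`. Although users need to do some development based on Paddle Serving's python client API, an RPC service is usually faster than an HTTP service. Note that we do not specify `--name` here.
-Paddle Serving provides a built-in python module named `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, we will have an HTTP service whose URL is `$IP:$PORT/uci/prediction`.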
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
 ```
 <center>
@@ -126,21 +128,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
 | `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |
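-We use the `curl` command to send an HTTP POST request to the service we just started. Users can also call a python library to send HTTP POST requests; see the English documentation of [requests](https://requests.readthedocs.io/en/master/).
+We use the `curl` command to send an HTTP POST request to the service we just started.
+Users can also call a python library to send HTTP POST requests; see the English documentation of [requests](https://requests.readthedocs.io/en/master/).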
 </center>
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-```
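-<h3 align="center">RPC service</h3>
-Users can also start an RPC service with `paddle_serving_server.serve`. Although users need to do some development based on Paddle Serving's python client API, an RPC service is usually faster than an HTTP service. Note that we do not specify `--name` here.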
-``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
-```
 ``` python
 # A user can visit rpc service through paddle_serving_client API
 from paddle_serving_client import Client
@@ -150,12 +141,45 @@ client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
 client.connect(["127.0.0.1:9292"])
 data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-fetch_map = client.predict(feed={"x": data}, fetch=["price"])
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)
 ```
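 Here, the `client.predict` function has two arguments. `feed` is a `python dict` with the model input variable alias names and values. `fetch` specifies the prediction variables to be returned from the server. In this example, the names `"x"` and `"price"` are assigned when the servable model is saved during training.
+<h3 align="center">HTTP service</h3>
+Users can also put the data format processing logic on the server side so that the service can be accessed directly with curl. Refer to the following example, located in the directory ``python/examples/fit_a_line``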
+```python
+from paddle_serving_server.web_service import WebService
+import numpy as np
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9292)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
+```
+On the client side, run
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
+The response is
+```
+{"result":{"price":[[18.901151657104492]]}}
+```
 <h2 align="center">Core features of Paddle Serving</h2>
 - Tightly integrated with Paddle training; the vast majority of Paddle models can be **deployed with one line of command**.

--- a/doc/COMPILE.md
+++ b/doc/COMPILE.md
@@ -100,14 +100,21 @@ make -j10
 you can execute `make install` to put targets under directory `./output`, you need to add`-DCMAKE_INSTALL_PREFIX=./output`to specify output path to cmake command shown above.
 ### Integrated GPU version paddle inference library
+### CUDA_PATH is the cuda install path. Use the command `whereis cuda` to check; it should be /usr/local/cuda.
+### CUDNN_LIBRARY and CUDA_CUDART_LIBRARY are the lib paths; they should be /usr/local/cuda/lib64/.
 ``` shell
+export CUDA_PATH='/usr/local'
+export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
+export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
 mkdir server-build-gpu && cd server-build-gpu
 cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \  
+    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
+    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DSERVER=ON \
    -DWITH_GPU=ON ..
 make -j10
@@ -116,6 +123,10 @@ make -j10
 ### Integrated TRT version paddle inference library
 ```
+export CUDA_PATH='/usr/local'
+export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
+export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
 mkdir server-build-trt && cd server-build-trt
 cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
@@ -123,6 +134,7 @@ cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
+    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DSERVER=ON \
    -DWITH_GPU=ON \
    -DWITH_TRT=ON ..
@@ -166,12 +178,14 @@ make
 ## Install wheel package
 Regardless of the client, server or App part, after compiling, install the whl package in `python/dist/` in the temporary directory(`server-build-cpu`, `server-build-gpu`, `client-build`,`app-build`) of the compilation process.
+For example: `cd server-build-cpu/python/dist && pip install -U xxxxx.whl`
 ## Note
 When running the python server, it will check the `SERVING_BIN` environment variable. If you want to use your own compiled binary file, set the environment variable to the path of the corresponding binary file, usually`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
+`BUILD_DIR` is the absolute path of `server-build-cpu` or `server-build-gpu`.
+For example: `cd server-build-cpu && export SERVING_BIN=${PWD}/core/general-server/serving`
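The same `SERVING_BIN` setup can be done from Python before the server is launched. This is a small sketch under stated assumptions: the helper names `serving_bin_path` and `use_local_serving_bin` are hypothetical, and only the relative path `core/general-server/serving` comes from the note above.

```python
import os


def serving_bin_path(build_dir):
    """Return the absolute path of the compiled serving binary.

    `build_dir` is the build directory, e.g. server-build-cpu or
    server-build-gpu; the relative layout follows the note above.
    """
    return os.path.join(
        os.path.abspath(build_dir), "core", "general-server", "serving")


def use_local_serving_bin(build_dir, environ=os.environ):
    """Point SERVING_BIN at the locally compiled binary.

    Assumption: this runs before the python server is started, since
    the variable is checked when the server runs.
    """
    path = serving_bin_path(build_dir)
    environ["SERVING_BIN"] = path
    return path
```

Passing `environ` explicitly makes it possible to try the helper on a plain dict without touching the real environment.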

--- a/doc/COMPILE_CN.md
+++ b/doc/COMPILE_CN.md
@@ -97,14 +97,20 @@ make -j10
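 You can run `make install` to put the build targets under the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to specify the output path.
 ### Integrated GPU version Paddle Inference Library
+### CUDA_PATH is the cuda install path. You can use the `whereis cuda` command to confirm your cuda install path; it should usually be /usr/local/cuda
+### CUDNN_LIBRARY and CUDA_CUDART_LIBRARY are the paths of the cuda library files; they should usually be /usr/local/cuda/lib64/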
 ``` shell
+export CUDA_PATH='/usr/local'
+export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
+export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
 mkdir server-build-gpu && cd server-build-gpu
 cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
    -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
+    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DSERVER=ON \
    -DWITH_GPU=ON ..
 make -j10
@@ -113,6 +119,10 @@ make -j10
 ### Integrated TensorRT version Paddle Inference Library
 ```
+export CUDA_PATH='/usr/local'
+export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
+export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
 mkdir server-build-trt && cd server-build-trt
 cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
@@ -120,6 +130,7 @@ cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
    -DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
+    -DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DSERVER=ON \
    -DWITH_GPU=ON \
    -DWITH_TRT=ON ..
@@ -162,12 +173,16 @@ make
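 ## Install wheel package
 Whether for the Client, the Server or the App part, after compiling, install the whl package in `python/dist/` under the temporary build directory (`server-build-cpu`, `server-build-gpu`, `client-build`, `app-build`).
+For example: cd server-build-cpu/python/dist && pip install -U xxxxx.whl
 ## Note
 When running the python Server, it checks the `SERVING_BIN` environment variable. If you want to use your own compiled binary, set this environment variable to the path of the corresponding binary, usually `export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
+Here BUILD_DIR is the absolute path of server-build-cpu or server-build-gpu.
+You can cd into server-build-cpu and run export SERVING_BIN=${PWD}/core/general-server/serving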

--- a/doc/DOCKER_IMAGES.md
+++ b/doc/DOCKER_IMAGES.md
@@ -28,6 +28,7 @@ You can get images in two ways:
 ## Image description
 Runtime images cannot be used for compilation.
+If you want to customize your Serving based on source code, use the version with the suffix `-devel`.
 |                         Description                          |   OS    |             TAG              |                          Dockerfile                          |
 | :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |

--- a/doc/DOCKER_IMAGES_CN.md
+++ b/doc/DOCKER_IMAGES_CN.md
@@ -28,6 +28,7 @@
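 ## Image description
 Runtime images cannot be used for development or compilation.
+For secondary development and compilation based on the source code, use the version with the -devel suffix.
 | Image description                                  | OS       | TAG                          | Dockerfile                                                   |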
 | -------------------------------------------------- | -------- | ---------------------------- | ------------------------------------------------------------ |

--- a/doc/PIPELINE_SERVING.md
+++ b/doc/PIPELINE_SERVING.md
@@ -7,14 +7,46 @@ Paddle Serving is usually used for the deployment of single model, but the end-t
 Paddle Serving provides a user-friendly programming framework for multi-model composite services, Pipeline Serving, which aims to reduce the threshold of programming, improve resource utilization (especially GPU), and improve the prediction efficiency.
-## Architecture Design
+## ★ Architecture Design
-The Server side is built based on gRPC and graph execution engine. The relationship between them is shown in the following figure.
+The Server side is built based on <b>RPC Service</b> and <b>graph execution engine</b>. The relationship between them is shown in the following figure.
 <center>
 <img src='pipeline_serving-image1.png' height = "250" align="middle"/>
 </center>
-### Graph Execution Engine
+### 1. RPC Service
+In order to meet the needs of different users, the RPC service starts one Web server and one RPC server at the same time, and can process 2 types of requests: RESTful API and gRPC. The gRPC gateway receives RESTful API requests and forwards them to the gRPC server through a reverse proxy; gRPC requests are received directly by the gRPC server. Both types of requests are therefore processed by the gRPC service in a unified manner, which keeps the processing logic consistent.
+#### <b>1.1 Request and Response of proto</b>
+gRPC service and gRPC gateway service are generated with service.proto.
+```proto
+message Request {
+  repeated string key = 1;  
+  repeated string value = 2;
+  optional string name = 3;
+  optional string method = 4;
+  optional int64 logid = 5;
+  optional string clientip = 6;
+};
+message Response {
+  optional int32 err_no = 1;
+  optional string err_msg = 2;
+  repeated string key = 3;
+  repeated string value = 4;
+};
+```
+The `key` and `value` in the Request are paired string arrays. The `name` and `method` correspond to the URL of the RESTful API: `//{ip}:{port}/{name}/{method}`. The `logid` and `clientip` make it easier for users to correlate service-level requests and customize strategies.
+In Response, `err_no` and `err_msg` express the correctness and error information of the processing result, and `key` and `value` are the returned results.
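The paired `key`/`value` arrays of the Request and Response messages can be illustrated with plain Python dicts standing in for the protobuf messages. This is only a sketch: `pack_request` and `unpack_pairs` are hypothetical helpers, not part of Pipeline Serving, and values are converted with `str` because both fields are declared as `repeated string`.

```python
def pack_request(fields, name="", method="prediction"):
    """Turn a dict into the paired key/value layout of Request.

    A plain dict stands in for the protobuf message here.
    """
    keys = list(fields.keys())
    values = [str(fields[k]) for k in keys]
    return {"key": keys, "value": values, "name": name, "method": method}


def unpack_pairs(message):
    """Rebuild a dict from paired key/value arrays (Request or Response)."""
    if len(message["key"]) != len(message["value"]):
        raise ValueError("key and value must have the same length")
    return dict(zip(message["key"], message["value"]))
```

Because the two arrays are matched by position, a length mismatch is treated as an error rather than silently truncated.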
+### 2. Graph Execution Engine
 The graph execution engine consists of OPs and Channels, and the connected OPs share one Channel.
@@ -28,7 +60,7 @@ The graph execution engine consists of OPs and Channels, and the connected OPs s
 </center>
-### OP Design
+#### <b>2.1 OP Design</b>
 - The default function of a single OP is to access a single Paddle Serving Service based on the input Channel data and put the result into the output Channel.
 - OP supports user customization, including preprocess, process, postprocess functions that can be inherited and implemented by the user.
@@ -36,7 +68,7 @@ The graph execution engine consists of OPs and Channels, and the connected OPs s
 - OP can obtain data from multiple different RPC requests for Auto-Batching.
 - OP can be started by a thread or process.
-### Channel Design
+#### <b>2.2 Channel Design</b>
 - Channel is the data structure for sharing data between OPs, responsible for sharing data or sharing data status information.
 - Outputs from multiple OPs can be stored in the same Channel, and data from the same Channel can be used by multiple OPs.
@@ -47,8 +79,17 @@ The graph execution engine consists of OPs and Channels, and the connected OPs s
 </center>
+#### <b>2.3 client type design</b>
+- The prediction type (`client_type`) of an Op is one of 3 types: brpc, grpc and local_predictor.
+  - brpc: Uses the bRPC Client to interact with a remote Serving over the network; performance is better than grpc.
+  - grpc: Uses the gRPC Client to interact with a remote Serving over the network; cross-platform deployment is supported.
+  - local_predictor: Loads the model and predicts in the local service without network interaction. Supports multi-card deployment and TensorRT prediction.
+  - Selection:
+    - Time cost (lower is better): local_predictor < brpc <= grpc
+    - Microservice: Split the brpc or grpc model into independent services to simplify development and deployment and improve resource utilization.
-### Extreme Case Consideration
+#### <b>2.4 Extreme Case Consideration</b>
 - Request timeout
@@ -65,9 +106,7 @@ The graph execution engine consists of OPs and Channels, and the connected OPs s
  - For output buffer, you can use a similar process as input buffer, which adjusts the concurrency of OP3 and OP4 to control the buffer length of output buffer. (The length of the output buffer depends on the speed at which downstream OPs obtain data from the output buffer)
  - The amount of data in the Channel will not exceed `worker_num` of gRPC, that is, it will not exceed the thread pool size.
-## Detailed Design
+## ★ Detailed Design
-### User Interface Design
 #### 1. General OP Definition
@@ -79,11 +118,13 @@ def __init__(name=None,
             server_endpoints=[],
             fetch_list=[],
             client_config=None,
+             client_type=None,
             concurrency=1,
             timeout=-1,
             retry=1,
             batch_size=1,
-             auto_batching_timeout=None)
+             auto_batching_timeout=None,
+             local_service_handler=None)
 ```
 The meaning of each parameter is as follows:
@@ -92,14 +133,16 @@ The meaning of each parameter is as follows:
 | :-------------------: | :----------------------------------------------------------: |
 |         name          | (str) String used to identify the OP type, which must be globally unique. |
 |       input_ops       |     (list) A list of all previous OPs of the current Op.     |
-|   server_endpoints    | (list) List of endpoints for remote Paddle Serving Service. If this parameter is not set, the OP will not access the remote Paddle Serving Service, that is, the process operation will not be performed. |
+|   server_endpoints    | (list) List of endpoints for remote Paddle Serving Service. If this parameter is not set, the OP runs in local_predictor mode and the configuration is read from local_service_conf. |
 |      fetch_list       | (list) List of fetch variable names for remote Paddle Serving Service. |
 |     client_config     | (str) The path of the client configuration file corresponding to the Paddle Serving Service. |
+|     client_type       | (str) brpc, grpc or local_predictor. local_predictor does not start the Serving service; prediction runs in-process. |
 |      concurrency      |             (int) The number of concurrent OPs.              |
 |        timeout        | (int) The timeout time of the process operation, in ms. If the value is less than zero, no timeout is considered. |
 |         retry         | (int) Timeout number of retries. When the value is 1, no retries are made. |
 |      batch_size       | (int) The expected batch_size of Auto-Batching, since building batches may time out, the actual batch_size may be less than the set value. |
-| auto_batching_timeout | (float) Timeout for building batches of Auto-Batching (the unit is ms). |
+| auto_batching_timeout | (float) Timeout for building batches of Auto-Batching (the unit is ms). When batch_size > 1, auto_batching_timeout should be set; otherwise the OP blocks waiting when the number of requests is less than batch_size. |
+| local_service_handler | (object) Local predictor handler, assigned by the Op init() input parameters or created in Op init(). |
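The interaction between `batch_size` and `auto_batching_timeout` described in the table can be illustrated with a small stand-alone sketch. It is not Pipeline Serving's implementation; `collect_batch` and the use of `queue.Queue` are assumptions made purely for illustration.

```python
import queue
import time


def collect_batch(q, batch_size, timeout_ms=None):
    """Collect up to batch_size items from q.

    With timeout_ms=None the loop blocks until batch_size items have
    arrived, which is the blocking behaviour the table warns about.
    With a timeout, a partial batch is returned once the deadline passes.
    """
    batch = []
    deadline = None
    if timeout_ms is not None:
        deadline = time.monotonic() + timeout_ms / 1000.0
    while len(batch) < batch_size:
        if deadline is None:
            batch.append(q.get())  # blocks indefinitely
            continue
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

With only two pending requests and `batch_size=4`, the call returns the two available items after the timeout instead of waiting for more, which is why the actual batch size may be smaller than the configured one.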
 #### 2. General OP Secondary Development Interface
@@ -156,7 +199,7 @@ def init_op(self):
 It should be **noted** that in the threaded version of OP, each OP will only call this function once, so the loaded resources must be thread safe.
-#### 3. RequestOp Definition
+#### 3. RequestOp Definition and Secondary Development Interface
 RequestOp is used to process RPC data received by Pipeline Server, and the processed data will be added to the graph execution engine. Its constructor is as follows:
@@ -164,7 +207,7 @@ RequestOp is used to process RPC data received by Pipeline Server, and the proce
 def __init__(self)
 ```
-#### 4. RequestOp Secondary Development Interface
+When the default RequestOp cannot meet the parameter parsing requirements, you can customize the request parameter parsing method by rewriting the following two interfaces.
 |           Interface or Variable           |                           Explain                            |
 | :---------------------------------------: | :----------------------------------------------------------: |
@@ -188,7 +231,7 @@ def unpack_request_package(self, request):
 The return value is required to be a dictionary type.
-#### 5. ResponseOp Definition
+#### 4. ResponseOp Definition and Secondary Development Interface
 ResponseOp is used to process the prediction results of the graph execution engine. The processed data will be used as the RPC return value of Pipeline Server. Its constructor is as follows:
@@ -198,7 +241,7 @@ def __init__(self, input_ops)
 `input_ops` is the last OP of graph execution engine. Users can construct different DAGs by setting different `input_ops` without modifying the topology of OPs.
-#### 6. ResponseOp Secondary Development Interface
+When the default ResponseOp cannot meet the requirements of the result return format, you can customize the return package packaging method by rewriting the following two interfaces.
 |            Interface or Variable             |                           Explain                            |
 | :------------------------------------------: | :----------------------------------------------------------: |
@@ -237,7 +280,7 @@ def pack_response_package(self, channeldata):
  return resp
 ```
-#### 7. PipelineServer Definition
+#### 5. PipelineServer Definition
 The definition of PipelineServer is relatively simple, as follows:
@@ -251,22 +294,137 @@ server.run_server()
 Where `response_op` is the responseop mentioned above, PipelineServer will initialize Channels according to the topology relationship of each OP and build the calculation graph. `config_yml_path` is the configuration file of PipelineServer. The example file is as follows:
 ```yaml
-rpc_port: 18080  # gRPC port
+# gRPC port
-worker_num: 1  # gRPC thread pool size (the number of processes in the process version servicer). The default is 1
+rpc_port: 18080  
-build_dag_each_worker: false  # Whether to use process server or not. The default is false
-http_port: 0 # HTTP service port. Do not start HTTP service when the value is less or equals 0. The default value is 0.
+# http port, do not start HTTP service when the value is less or equals 0. The default value is 0.
+http_port: 18071 
+# gRPC thread pool size (the number of processes in the process version servicer). The default is 1
+worker_num: 1  
+ # Whether to use process server or not. The default is false
+build_dag_each_worker: false 
 dag:
-    is_thread_op: true  # Whether to use the thread version of OP. The default is true
+    # Whether to use the thread version of OP. The default is true
-    client_type: brpc  # Use brpc or grpc client. The default is brpc
+    is_thread_op: true  
-    retry: 1  # The number of times DAG executor retries after failure. The default value is 1, that is, no retrying
-    use_profile: false  # Whether to print the log on the server side. The default is false
+    # The number of times DAG executor retries after failure. The default value is 1, that is, no retrying
+    retry: 1 
+    # Whether to print the log on the server side. The default is false
+    use_profile: false  
+    # Monitoring time interval of Tracer (in seconds). Do not start monitoring when the value is less than 1. The default value is -1
    tracer:
-        interval_s: 600 # Monitoring time interval of Tracer (in seconds). Do not start monitoring when the value is less than 1. The default value is -1
+        interval_s: 600 
+op:
+    bow:
+        # Concurrency, when is_thread_op=True, it's thread concurrency; otherwise, it's process concurrency
+        concurrency: 1
+        # Client types, brpc, grpc and local_predictor
+        client_type: brpc
+        # Retry times, no retry by default
+        retry: 1
+        # Prediction timeout, ms
+        timeout: 3000
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9393"]
+        # Client config of bow model
+        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"
+        # Fetch list
+        fetch_list: ["prediction"]    
+        # Batch size, default 1
+        batch_size: 1
+        # Batch query timeout
+        auto_batching_timeout: 2000
+```
+### 6. Special usages
+#### <b>6.1 Business custom error type</b>
+Users can define custom error codes for their business by inheriting ProductErrCode and returning the code in the return list of an Op's preprocess or postprocess. Subsequent stages then skip the following OP processing based on the custom error code.
+```python
+class ProductErrCode(enum.Enum):
+    """
+    ProductErrCode is a base class for recording business error code. 
+    product developers inherit this class and extend more error codes. 
+    """
+    pass
+```
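A minimal sketch of extending the base class above with business error codes. `OcrErrCode`, its members, and the `check_input` helper are hypothetical names chosen for illustration; `ProductErrCode` is redefined locally only so the block is self-contained.

```python
import enum


class ProductErrCode(enum.Enum):
    """Local stand-in for the base class shown above."""
    pass


class OcrErrCode(ProductErrCode):
    """Hypothetical business error codes for an OCR service."""
    EMPTY_IMAGE = 1001
    UNSUPPORTED_FORMAT = 1002


def check_input(input_dict):
    """Return (prod_errcode, prod_errinfo) for one request.

    None and "" mean no business error, matching the defaults
    described for preprocess.
    """
    image = input_dict.get("image")
    if not image:
        return OcrErrCode.EMPTY_IMAGE, "image field is missing or empty"
    return None, ""
```

An `Enum` class without members can be subclassed, which is what makes the empty base class usable as an extension point.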
+#### <b>6.2 Skip process stage</b>
+The 2nd element of the result list returned by preprocess is `is_skip_process`. When it is `True`, the process stage of the current OP is skipped and execution goes directly to postprocess.
+```python
+def preprocess(self, input_dicts, data_id, log_id):
+        """
+        In preprocess stage, assembling data for process stage. users can 
+        override this function for model feed features.
+        Args:
+            input_dicts: input data to be preprocessed
+            data_id: inner unique id
+            log_id: global unique id for RTT
+        Return:
+            input_dict: data for process stage
+            is_skip_process: skip process stage or not, False default
+            prod_errcode: None default, otherwise, product errors occurred.
+                          It is handled in the same way as exception. 
+            prod_errinfo: "" default
+        """
+        # multiple previous Op
+        if len(input_dicts) != 1:
+            _LOGGER.critical(
+                self._log(
+                    "Failed to run preprocess: this Op has multiple previous "
+                    "inputs. Please override this func."))
+            os._exit(-1)
+        (_, input_dict), = input_dicts.items()
+        return input_dict, False, None, ""
 ```
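As an illustration, an override might set `is_skip_process` to `True` when there is nothing to send to the model. This is only a sketch: `SkipEmptyOp` is a hypothetical plain class (a real Op would inherit from the pipeline's `Op`), and the `words` key is an assumed input field.

```python
class SkipEmptyOp(object):
    """Hypothetical Op-like class that skips prediction for empty input."""

    def preprocess(self, input_dicts, data_id, log_id):
        # Same single-predecessor unpacking as the default implementation.
        (_, input_dict), = input_dicts.items()
        # Assumed field name: skip the process stage when "words" is empty.
        is_skip_process = not input_dict.get("words")
        return input_dict, is_skip_process, None, ""
```

With this override, a request with an empty `words` value returns `True` as the second element, while a normal request returns `False` and goes through the process stage as usual.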
+#### <b>6.3 Custom proto Request and Response</b>
+When the default proto structure does not meet business requirements, modify the Request and Response message structures in the following two proto files, keeping them consistent with each other.
-## Example
+> pipeline/gateway/proto/gateway.proto 
+> pipeline/proto/pipeline_service.proto
+Then recompile Serving Server.
+#### <b>6.4 Custom URL</b>
+The gRPC gateway handles POST requests. The default `method` is `prediction`, for example `127.0.0.1:8080/ocr/prediction`. Users can customize `name` and `method`, so services with existing URLs can be switched over seamlessly.
+```proto
+service PipelineService {
+  rpc inference(Request) returns (Response) {
+    option (google.api.http) = {
+      post : "/{name=*}/{method=*}"
+      body : "*"
+    };
+  }
+};
+```
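The `/{name}/{method}` mapping can be sketched from the client side as follows. The helper names `build_url` and `build_body` are hypothetical, and the JSON body is an assumption: it simply mirrors the repeated `key` and `value` string fields of the `Request` message.

```python
import json


def build_url(host, port, name, method="prediction"):
    """Compose the gateway URL; name and method fill the two path segments."""
    return "http://{}:{}/{}/{}".format(host, port, name, method)


def build_body(feed):
    """Serialize a feed dict as paired key/value string lists (assumed layout)."""
    keys = list(feed.keys())
    return json.dumps({"key": keys, "value": [feed[k] for k in keys]})
```

For example, `build_url("127.0.0.1", 8080, "ocr")` yields the default URL quoted above, and passing a different `method` addresses a customized route.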
+***
+## ★ Classic examples
 Here, we build a simple imdb model ensemble example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:
@@ -277,7 +435,7 @@ Here, we build a simple imdb model enable example to show how to use Pipeline Se
 </center>
-### Get the model file and start the Paddle Serving Service
+### 1. Get the model file and start the Paddle Serving Service
 ```shell
 cd python/examples/pipeline/imdb_model_ensemble
@@ -288,7 +446,83 @@ python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.
 PipelineServing also supports local automatic startup of PaddleServingService. Please refer to the example `python/examples/pipeline/ocr`.
-### Start PipelineServer
+### 2. Create config.yaml
+Because config.yaml contains a lot of configuration information, only part of the OP configuration is shown here. For the full file, please refer to `python/examples/pipeline/imdb_model_ensemble/config.yaml`.
+```yaml
+op:
+    bow:
+        # Concurrency, when is_thread_op=True, it's thread concurrency; otherwise, it's process concurrency
+        concurrency: 1
+        # Client types, brpc, grpc and local_predictor
+        client_type: brpc
+        # Retry times, no retry by default
+        retry: 1
+        # Prediction timeout, ms
+        timeout: 3000
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9393"]
+        # Client config of bow model
+        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"
+        # Fetch list
+        fetch_list: ["prediction"]    
+        # Batch request size, default 1
+        batch_size: 1
+        # Batch query timeout
+        auto_batching_timeout: 2000
+    cnn:
+        # Concurrency
+        concurrency: 1
+        # Client types, brpc, grpc and local_predictor
+        client_type: brpc
+        # Retry times, no retry by default
+        retry: 1
+        # Prediction timeout, ms
+        timeout: 3000
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9292"]
+        # Client config of cnn model
+        client_config: "imdb_cnn_client_conf/serving_client_conf.prototxt"
+        # Fetch list
+        fetch_list: ["prediction"]
+        # Batch request size, default 1
+        batch_size: 1
+        # Batch query timeout
+        auto_batching_timeout: 2000
+    combine:
+        # Concurrency
+        concurrency: 1
+        # Retry times, no retry by default
+        retry: 1
+        # Prediction timeout, ms
+        timeout: 3000
+        # Batch request size, default 1
+        batch_size: 1
+        # Batch query timeout, ms
+        auto_batching_timeout: 2000
+```
+### 3. Start PipelineServer
 Run the following code
@@ -359,7 +593,7 @@ server.prepare_server('config.yml')
 server.run_server()
 ```
-### Perform prediction through PipelineClient
+### 4. Perform prediction through PipelineClient
 ```python
 from paddle_serving_client.pipeline import PipelineClient
@@ -385,13 +619,16 @@ for f in futures:
        exit(1)
 ```
+***
+## ★ Performance analysis
-## How to optimize with the timeline tool
+### 1. How to optimize with the timeline tool
 In order to better optimize the performance, PipelineServing provides a timeline tool to monitor the time of each stage of the whole service.
-### Output profile information on server side
+### 2. Output profile information on server side
 The server is controlled by the `use_profile` field in yaml:
@@ -418,8 +655,29 @@ if __name__ == "__main__":
 Specific operation: open Chrome browser, input in the address bar `chrome://tracing/` , jump to the tracing page, click the load button, open the saved `trace` file, and then visualize the time information of each stage of the prediction service.
-### Output profile information on client side
+### 3. Output profile information on client side
 The profile function can be enabled by setting `profile=True` in the `predict` interface on the client side.
 After the function is enabled, the client will print the log information corresponding to the prediction to the standard output during the prediction process, and the subsequent analysis and processing are the same as that of the server.
+### 4. Analytical methods
+```
+Cost of a single OP:
+op_cost = process(pre + mid + post) 
+OP concurrency: 
+op_concurrency = op_cost(s) * qps_expected
+Service throughput:
+service_throughput = 1 / slowest_op_cost * op_concurrency
+Service average cost:
+service_avg_cost = ∑op_cost in critical path
+Channel accumulation:
+channel_acc_size = QPS(down - up) * time
+Average cost of batch predictor:
+avg_batch_cost = (N * pre + mid + post) / N 
+```
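A few of these formulas can be turned into a small worked sketch. The function names are illustrative, costs are taken in seconds, and rounding the concurrency up to a whole number is an added assumption rather than part of the formulas.

```python
import math


def op_concurrency(op_cost_s, qps_expected):
    """op_concurrency = op_cost(s) * qps_expected, rounded up (assumption)."""
    return int(math.ceil(op_cost_s * qps_expected))


def service_throughput(slowest_op_cost_s, concurrency):
    """service_throughput = 1 / slowest_op_cost * op_concurrency."""
    return concurrency / float(slowest_op_cost_s)


def avg_batch_cost(n, pre, mid, post):
    """avg_batch_cost = (N * pre + mid + post) / N."""
    return (n * pre + mid + post) / float(n)
```

For instance, an OP costing 0.125 s that must sustain 80 QPS needs a concurrency of 10, and 10 concurrent workers on a 0.125 s OP give a throughput of 80 requests per second.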
--- a/doc/PIPELINE_SERVING_CN.md
+++ b/doc/PIPELINE_SERVING_CN.md
@@ -7,15 +7,47 @@ Paddle Serving 通常用于单模型的一键部署，但端到端的深度学
 Paddle Serving 提供了用户友好的多模型组合服务编程框架，Pipeline Serving，旨在降低编程门槛，提高资源使用率（尤其是GPU设备），提升整体的预估效率。
-## 整体架构设计
+## ★ 整体架构设计
-Server端基于 gRPC 和图执行引擎构建，两者的关系如下图所示。
+Server端基于<b>RPC服务层</b>和<b>图执行引擎</b>构建，两者的关系如下图所示。
 <center>
 <img src='pipeline_serving-image1.png' height = "250" align="middle"/>
 </center>
-### 图执行引擎
+</n>
+### 1. RPC服务层
+为满足用户不同的使用需求，RPC服务层同时启动1个Web服务器和1个RPC服务器，可同时处理RESTful API、gRPC 2种类型请求。gRPC gateway接收RESTful API请求通过反向代理服务器将请求转发给gRPC Service；gRPC请求由gRPC service接收，所以，2种类型的请求统一由gRPC Service处理，确保处理逻辑一致。
+#### <b>1.1 proto的输入输出结构</b>
+gRPC服务和gRPC gateway服务统一用service.proto生成。
+```proto
+message Request {
+  repeated string key = 1;  
+  repeated string value = 2;
+  optional string name = 3;
+  optional string method = 4;
+  optional int64 logid = 5;
+  optional string clientip = 6;
+};
+message Response {
+  optional int32 err_no = 1;
+  optional string err_msg = 2;
+  repeated string key = 3;
+  repeated string value = 4;
+};
+```
+Request中`key`与`value`是配对的string数组。 `name`与`method`对应RESTful API的URL://{ip}:{port}/{name}/{method}。`logid`和`clientip`便于用户串联服务级请求和自定义策略。
+Response中`err_no`和`err_msg`表达处理结果的正确性和错误信息，`key`和`value`为返回结果。
+### 2. 图执行引擎
 图执行引擎由 OP 和 Channel 构成，相连接的 OP 之间会共享一个 Channel。
@@ -29,7 +61,7 @@ Server端基于 gRPC 和图执行引擎构建，两者的关系如下图所示
 </center>
-### OP的设计
+#### <b>2.1 OP的设计</b>
 - 单个 OP 默认的功能是根据输入的 Channel 数据，访问一个 Paddle Serving 的单模型服务，并将结果存在输出的 Channel
 - 单个 OP 可以支持用户自定义，包括 preprocess，process，postprocess 三个函数都可以由用户继承和实现
@@ -37,7 +69,7 @@ Server端基于 gRPC 和图执行引擎构建，两者的关系如下图所示
 - 单个 OP 可以获取多个不同 RPC 请求的数据，以实现 Auto-Batching
 - OP 可以由线程或进程启动
-### Channel的设计
+#### <b>2.2 Channel的设计</b>
 - Channel 是 OP 之间共享数据的数据结构，负责共享数据或者共享数据状态信息
 - Channel 可以支持多个OP的输出存储在同一个 Channel，同一个 Channel 中的数据可以被多个 OP 使用
@@ -47,8 +79,18 @@ Server端基于 gRPC 和图执行引擎构建，两者的关系如下图所示
 <img src='pipeline_serving-image3.png' height = "500" align="middle"/>
 </center>
+#### <b>2.3 预测类型的设计</b>
-### 极端情况的考虑
+- OP的预测类型(client_type)有3种类型，brpc、grpc和local_predictor
+  - brpc: 使用bRPC Client与远端的Serving服务网络交互，性能优于grpc
+  - grpc: 使用gRPC Client与远端的Serving服务网络交互，支持跨平台部署
+  - local_predictor: 本地服务内加载模型并完成预测，不需要与网络交互。支持多卡部署，和TensorRT高性能预测。
+  - 选型: 
+    - 延时(越少越好): local_predictor < brpc <= grpc
+    - 微服务: brpc或grpc模型分拆成独立服务，简化开发和部署复杂度，提升资源利用率
+#### <b>2.4 极端情况的考虑</b>
 - 请求超时的处理
@@ -65,9 +107,11 @@ Server端基于 gRPC 和图执行引擎构建，两者的关系如下图所示
  - 对于 output buffer，可以采用和 input buffer 类似的处理方法，即调整 OP3 和 OP4 的并发数，使得 output buffer 的缓冲长度得到控制（output buffer 的长度取决于下游 OP 从 output buffer 获取数据的速度）
  - 同时 Channel 中数据量不会超过 gRPC 的 `worker_num`，即线程池大小
-### 用户接口设计
+***
+## ★ 详细设计
-#### 1. 普通 OP 定义
+### 1. 普通 OP 定义
 普通 OP 作为图执行引擎中的基本单元，其构造函数如下：
@@ -77,11 +121,13 @@ def __init__(name=None,
             server_endpoints=[],
             fetch_list=[],
             client_config=None,
+             client_type=None,
             concurrency=1,
             timeout=-1,
             retry=1,
             batch_size=1,
-             auto_batching_timeout=None)
+             auto_batching_timeout=None,
+             local_service_handler=None)
 ```
 各参数含义如下
@@ -90,17 +136,21 @@ def __init__(name=None,
 | :-------------------: | :----------------------------------------------------------: |
 |         name          |    （str）用于标识 OP 类型的字符串，该字段必须全局唯一。     |
 |       input_ops       |            （list）当前 OP 的所有前继 OP 的列表。            |
-|   server_endpoints    | （list）远程 Paddle Serving Service 的 endpoints 列表。如果不设置该参数，则不访问远程 Paddle Serving Service，即 不会执行 process 操作。 |
+|   server_endpoints    | （list）远程 Paddle Serving Service 的 endpoints 列表。如果不设置该参数，认为是local_predictor模式，从local_service_conf中读取配置。 |
 |      fetch_list       |     （list）远程 Paddle Serving Service 的 fetch 列表。      |
 |     client_config     | （str）Paddle Serving Service 对应的 Client 端配置文件路径。 |
+|      client_type      | (str) 可选择brpc、grpc或local_predictor。local_predictor不启动Serving服务，进程内预测。 |
 |      concurrency      |                     （int）OP 的并发数。                     |
 |        timeout        | （int）process 操作的超时时间，单位为毫秒。若该值小于零，则视作不超时。 |
 |         retry         |       （int）超时重试次数。当该值为 1 时，不进行重试。       |
-|      batch_size       | （int）进行 Auto-Batching 的期望 batch_size 大小，由于构建 batch 可能超时，实际 batch_size 可能小于设定值。 |
+|      batch_size       | （int）进行 Auto-Batching 的期望 batch_size 大小，由于构建 batch 可能超时，实际 batch_size 可能小于设定值，默认为 1。 |
-| auto_batching_timeout | （float）进行 Auto-Batching 构建 batch 的超时时间，单位为毫秒。 |
+| auto_batching_timeout | （float）进行 Auto-Batching 构建 batch 的超时时间，单位为毫秒。batch_size > 1时，要设置auto_batching_timeout，否则请求数量不足batch_size时会阻塞等待。 |
+| local_service_handler | (object) local predictor handler，Op init()入参赋值 或 在Op init()中创建|
-#### 2. 普通 OP二次开发接口
+### 2. 普通 OP二次开发接口
+OP 二次开发的目的是满足业务开发人员控制OP处理策略。
 |                    变量或接口                    |                             说明                             |
 | :----------------------------------------------: | :----------------------------------------------------------: |
@@ -154,7 +204,7 @@ def init_op(self):
 需要**注意**的是，在线程版 OP 中，每个 OP 只会调用一次该函数，故加载的资源必须要求是线程安全的。
-#### 3. RequestOp 定义
+### 3. RequestOp 定义 与 二次开发接口
 RequestOp 用于处理 Pipeline Server 接收到的 RPC 数据，处理后的数据将会被加入到图执行引擎中。其构造函数如下：
@@ -162,7 +212,7 @@ RequestOp 用于处理 Pipeline Server 接收到的 RPC 数据，处理后的数
 def __init__(self)
 ```
-#### 4. RequestOp 二次开发接口
+当默认的RequestOp无法满足参数解析需求时，可通过重写下面2个接口自定义请求参数解析方法。
 |                变量或接口                 |                    说明                    |
 | :---------------------------------------: | :----------------------------------------: |
@@ -186,7 +236,7 @@ def unpack_request_package(self, request):
 要求返回值是一个字典类型。
-#### 5. ResponseOp 定义
+### 4. ResponseOp 定义 与 二次开发接口
 ResponseOp 用于处理图执行引擎的预测结果，处理后的数据将会作为 Pipeline Server 的RPC 返回值，其构造函数如下：
@@ -196,7 +246,7 @@ def __init__(self, input_ops)
 其中，`input_ops` 是图执行引擎的最后一个 OP，用户可以通过设置不同的 `input_ops` 以在不修改 OP 的拓扑关系下构造不同的 DAG。
-#### 6. ResponseOp 二次开发接口
+当默认的 ResponseOp 无法满足结果返回格式要求时，可通过重写下面2个接口自定义返回包打包方法。
 |                  变量或接口                  |                    说明                     |
 | :------------------------------------------: | :-----------------------------------------: |
@@ -235,7 +285,7 @@ def pack_response_package(self, channeldata):
  return resp
 ```
-#### 7. PipelineServer定义
+### 5. PipelineServer定义
 PipelineServer 的定义比较简单，如下所示：
@@ -249,22 +299,134 @@ server.run_server()
 其中，`response_op` 为上面提到的 ResponseOp，PipelineServer 将会根据各个 OP 的拓扑关系初始化 Channel 并构建计算图。`config_yml_path` 为 PipelineServer 的配置文件，示例文件如下：
 ```yaml
-rpc_port: 18080  # gRPC端口号
+# gRPC端口号
-worker_num: 1  # gRPC线程池大小（进程版 Servicer 中为进程数），默认为 1
+rpc_port: 18080 
-build_dag_each_worker: false  # 是否使用进程版 Servicer，默认为 false
-http_port: 0 # HTTP 服务的端口号，若该值小于或等于 0 则不开启 HTTP 服务，默认为 0
+# http端口号，若该值小于或等于 0 则不开启 HTTP 服务，默认为 0
+http_port: 18071 
+# worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程，每个进程内构建gRPC Server和DAG
+worker_num: 1  
+# 是否使用进程版 Servicer，默认为 false
+build_dag_each_worker: false  
 dag:
-    is_thread_op: true  # 是否使用线程版Op，默认为 true
+    # op资源类型, True, 为线程模型；False，为进程模型，默认为 True
-    client_type: brpc  # 使用 brpc 或 grpc client，默认为 brpc
+    is_thread_op: true  
-    retry: 1  # DAG Executor 在失败后重试次数，默认为 1，即不重试
-    use_profile: false  # 是否在 Server 端打印日志，默认为 false
+    # DAG Executor 在失败后重试次数，默认为 1，即不重试
+    retry: 1  
+    # 是否在 Server 端打印日志，默认为 false
+    use_profile: false  
+    # 跟踪框架吞吐，每个OP和channel的工作情况。无tracer时不生成数据
    tracer:
-        interval_s: 600 # Tracer 监控的时间间隔，单位为秒。当该值小于 1 时不启动监控，默认为 -1
+        interval_s: 600 # 监控的时间间隔，单位为秒。当该值小于 1 时不启动监控，默认为 -1
+op:
+    bow:
+        # 并发数，is_thread_op=True时，为线程并发；否则为进程并发
+        concurrency: 1
+        # client连接类型，brpc
+        client_type: brpc
+        # Serving交互重试次数，默认不重试
+        retry: 1
+        # Serving交互超时时间, 单位ms
+        timeout: 3000
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9393"]
+        # bow模型client端配置
+        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"
+        # Fetch结果列表，以client_config中fetch_var的alias_name为准
+        fetch_list: ["prediction"]
+        # 批量查询Serving的数量, 默认1。batch_size>1要设置 auto_batching_timeout，否则不足batch_size时会阻塞
+        batch_size: 1
+        # 批量查询超时，与batch_size配合使用
+        auto_batching_timeout: 2000
+```
+### 6. 特殊用法
+#### <b>6.1 业务自定义错误类型</b>
+用户可根据业务场景自定义错误码，继承ProductErrCode，在Op的preprocess或postprocess中返回列表中返回，下一阶段处理会根据自定义错误码跳过后置OP处理。
+```python
+class ProductErrCode(enum.Enum):
+    """
+    ProductErrCode is a base class for recording business error code. 
+    product developers inherit this class and extend more error codes. 
+    """
+    pass
+```
+#### <b>6.2 跳过OP process阶段</b>
+preprocess返回结果列表的第二个结果是`is_skip_process=True`表示是否跳过当前OP的process阶段，直接进入postprocess处理
+```python
+def preprocess(self, input_dicts, data_id, log_id):
+        """
+        In preprocess stage, assembling data for process stage. users can 
+        override this function for model feed features.
+        Args:
+            input_dicts: input data to be preprocessed
+            data_id: inner unique id
+            log_id: global unique id for RTT
+        Return:
+            input_dict: data for process stage
+            is_skip_process: skip process stage or not, False default
+            prod_errcode: None default, otherwise, product errors occurred.
+                          It is handled in the same way as exception. 
+            prod_errinfo: "" default
+        """
+        # multiple previous Op
+        if len(input_dicts) != 1:
+            _LOGGER.critical(
+                self._log(
+                    "Failed to run preprocess: this Op has multiple previous "
+                    "inputs. Please override this func."))
+            os._exit(-1)
+        (_, input_dict), = input_dicts.items()
+        return input_dict, False, None, ""
 ```
+#### <b>6.3 自定义proto Request 和 Response结构</b>
+当默认proto结构不满足业务需求时，同时下面2个文件的proto的Request和Response message结构，保持一致。
-## 例子
+> pipeline/gateway/proto/gateway.proto 
+> pipeline/proto/pipeline_service.proto
+再重新编译Serving Server。
+#### <b>6.4 自定义URL</b>
+grpc gateway处理post请求，默认`method`是`prediction`，例如:127.0.0.1:8080/ocr/prediction。用户可自定义name和method，对于已有url的服务可无缝切换
+```proto
+service PipelineService {
+  rpc inference(Request) returns (Response) {
+    option (google.api.http) = {
+      post : "/{name=*}/{method=*}"
+      body : "*"
+    };
+  }
+};
+```
+***
+## ★ 典型示例
 这里通过搭建简单的 imdb model ensemble 例子来展示如何使用 Pipeline Serving，相关代码在 `python/examples/pipeline/imdb_model_ensemble` 文件夹下可以找到，例子中的 Server 端结构如下图所示：
@@ -275,7 +437,7 @@ dag:
 </center>
-### 获取模型文件并启动 Paddle Serving Service
+### 1. 获取模型文件并启动 Paddle Serving Service
 ```shell
 cd python/examples/pipeline/imdb_model_ensemble
@@ -286,9 +448,84 @@ python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.
 PipelineServing 也支持本地自动启动 PaddleServingService，请参考 `python/examples/pipeline/ocr` 下的例子。
-### 启动 PipelineServer
+### 2. 创建config.yaml
+由于config.yaml配置信息量很多，这里仅展示OP部分配置，全量信息参考`python/examples/pipeline/imdb_model_ensemble/config.yaml`
+```yaml
+op:
+    bow:
+        # 并发数，is_thread_op=True时，为线程并发；否则为进程并发
+        concurrency: 1
+        # client连接类型，brpc
+        client_type: brpc
+        # Serving交互重试次数，默认不重试
+        retry: 1
+        # Serving交互超时时间, 单位ms
+        timeout: 3000
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9393"]
+        # bow模型client端配置
+        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"
+        # Fetch结果列表，以client_config中fetch_var的alias_name为准
+        fetch_list: ["prediction"]
+        # 批量查询Serving的数量, 默认1。batch_size>1要设置auto_batching_timeout，否则不足batch_size时会阻塞
+        batch_size: 1
+        # 批量查询超时，与batch_size配合使用
+        auto_batching_timeout: 2000
+    cnn:
+        # 并发数，is_thread_op=True时，为线程并发；否则为进程并发
+        concurrency: 1
+        # client连接类型，brpc
+        client_type: brpc
+        # Serving交互重试次数，默认不重试
+        retry: 1
+        # 预测超时时间, 单位ms
+        timeout: 3000
-运行下面代码
+        # Serving IPs
+        server_endpoints: ["127.0.0.1:9292"]
+        # cnn模型client端配置
+        client_config: "imdb_cnn_client_conf/serving_client_conf.prototxt"
+        # Fetch结果列表，以client_config中fetch_var的alias_name为准
+        fetch_list: ["prediction"]
+        # 批量查询Serving的数量, 默认1。
+        batch_size: 1
+        # 批量查询超时，与batch_size配合使用
+        auto_batching_timeout: 2000
+    combine:
+        # 并发数，is_thread_op=True时，为线程并发；否则为进程并发
+        concurrency: 1
+        # Serving交互重试次数，默认不重试
+        retry: 1
+        # 预测超时时间, 单位ms
+        timeout: 3000
+        # 批量查询Serving的数量, 默认1。
+        batch_size: 1
+        # 批量查询超时，与batch_size配合使用
+        auto_batching_timeout: 2000
+```
+### 3. 启动 PipelineServer
+代码示例中，重点留意3个自定义Op的preprocess、postprocess处理，以及Combine Op初始化列表input_ops=[bow_op, cnn_op]，设置Combine Op的前置OP列表。
 ```python
 from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
@@ -356,7 +593,7 @@ server.prepare_server('config.yml')
 server.run_server()
 ```
-### 通过 PipelineClient 执行预测
+### 4. 通过 PipelineClient 执行预测
 ```python
 from paddle_serving_client.pipeline import PipelineClient
@@ -382,13 +619,16 @@ for f in futures:
        exit(1)
 ```
+***
+## ★ 性能分析
-## 如何通过 Timeline 工具进行优化
+### 1. 如何通过 Timeline 工具进行优化
 为了更好地对性能进行优化，PipelineServing 提供了 Timeline 工具，对整个服务的各个阶段时间进行打点。
-### 在 Server 端输出 Profile 信息
+### 2. 在 Server 端输出 Profile 信息
 Server 端用 yaml 中的 `use_profile` 字段进行控制：
@@ -415,8 +655,29 @@ if __name__ == "__main__":
 具体操作：打开 chrome 浏览器，在地址栏输入 `chrome://tracing/` ，跳转至 tracing 页面，点击 load 按钮，打开保存的 `trace` 文件，即可将预测服务的各阶段时间信息可视化。
-### 在 Client 端输出 Profile 信息
+### 3. 在 Client 端输出 Profile 信息
 Client 端在 `predict` 接口设置 `profile=True`，即可开启 Profile 功能。
 开启该功能后，Client 端在预测的过程中会将该次预测对应的日志信息打印到标准输出，后续分析处理同 Server。
+### 4. 分析方法
+```
+单OP耗时：
+op_cost = process(pre + mid + post) 
+OP期望并发数：
+op_concurrency  = 单OP耗时(s) * 期望QPS
+服务吞吐量：
+service_throughput = 1 / 最慢OP的耗时 * 并发数
+服务平响：
+service_avg_cost = ∑op_cost 【关键路径】
+Channel堆积：
+channel_acc_size = QPS(down - up) * time
+批量预测平均耗时：
+avg_batch_cost = (N * pre + mid + post) / N 
+```
--- a/doc/RUN_IN_DOCKER.md
+++ b/doc/RUN_IN_DOCKER.md
@@ -2,6 +2,8 @@
 ([简体中文](RUN_IN_DOCKER_CN.md)|English)
+One of the biggest benefits of Docker is portability: images can be deployed on multiple operating systems and mainstream cloud computing platforms. The Paddle Serving Docker image can be deployed on Linux, Mac and Windows platforms.
 ## Requirements
 Docker (GPU version requires nvidia-docker to be installed on the GPU machine)
@@ -30,63 +32,9 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of
 ### Install PaddleServing
-In order to make the image smaller, the PaddleServing package is not installed in the image. You can run the following command to install it:
+The image comes with `paddle_serving_server`, `paddle_serving_client`, and `paddle_serving_app` preinstalled at the version matching the image tag. If you do not need to change the version, you can use them directly, which suits environments without external network access.
-```bash
-pip install paddle-serving-server
-```
-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source of the following example) to speed up the download:
-```shell
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-### Test example
-Get the trained Boston house price prediction model by the following command:
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
-```
- Test HTTP service
-  Running on the Server side (inside the container):
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
-  ```
-  Running on the Client side (inside or outside the container):
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
- Test RPC service
-  Running on the Server side (inside the container):
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
-  ```
-  Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
-  ```bash
-  from paddle_serving_client import Client
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
+If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.
 ## GPU
@@ -98,7 +46,7 @@ The GPU version is basically the same as the CPU version, with only some differe
 Refer to [this document](DOCKER_IMAGES.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:
 ```shell
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 ```
 ### Create container
@@ -108,77 +56,21 @@ nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/se
 nvidia-docker exec -it test bash
 ```
-The `-p` option is to map the `9292` port of the container to the `9292` port of the host.
+or
-### Install PaddleServing
-In order to make the image smaller, the PaddleServing package is not installed in the image. You can run the following command to install it:
-```bash
-pip install paddle-serving-server-gpu
-```
-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source of the following example) to speed up the download:
-```shell
-pip install paddle-serving-server-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-### Test example
-When running the GPU Server, you need to set the GPUs used by the prediction service through the `--gpu_ids` option, and the CPU is used by default. An error will be reported when the value of `--gpu_ids` exceeds the environment variable `CUDA_VISIBLE_DEVICES`. The following example specifies to use a GPU with index 0:
-```shell
-export CUDA_VISIBLE_DEVICES=0,1
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0
-```
-Get the trained Boston house price prediction model by the following command:
 ```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
+docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
-tar -xzf uci_housing.tar.gz
+docker exec -it test bash
 ```
- Test HTTP service
+The `-p` option is to map the `9292` port of the container to the `9292` port of the host.
-  Running on the Server side (inside the container):
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --name uci --gpu_ids 0
-  ```
-  Running on the Client side (inside or outside the container):
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
- Test RPC service
-  Running on the Server side (inside the container):
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0
-  ```
-  Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
-  ```bash
-  from paddle_serving_client import Client
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
+### Install PaddleServing
+The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` preinstalled at the version matching the image tag. If you do not need to change the version, you can use them directly, which suits environments without external network access.
+If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.
-## Attention
+## Precautions
 Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md).
--- a/doc/RUN_IN_DOCKER_CN.md
+++ b/doc/RUN_IN_DOCKER_CN.md
@@ -2,6 +2,8 @@
 (简体中文|[English](RUN_IN_DOCKER.md))
+Docker最大的好处之一就是可移植性，可在多种操作系统和主流的云计算平台部署。使用Paddle Serving Docker镜像可在Linux、Mac和Windows平台部署。
 ## 环境要求
 Docker（GPU版本需要在GPU机器上安装nvidia-docker）
@@ -18,7 +20,6 @@ Docker（GPU版本需要在GPU机器上安装nvidia-docker）
 docker pull hub.baidubce.com/paddlepaddle/serving:latest
 ```
 ### 创建容器并进入
 ```bash
@@ -30,74 +31,11 @@ docker exec -it test bash
 ### 安装PaddleServing
-为了减小镜像的体积，镜像中没有安装Serving包，要执行下面命令进行安装。
+镜像里自带对应镜像tag版本的`paddle_serving_server`，`paddle_serving_client`，`paddle_serving_app`，如果用户不需要更改版本，可以直接使用，适用于没有外网服务的环境。
-```bash
-pip install paddle-serving-server
-```
-您可能需要使用国内镜像源（例如清华源）来加速下载。
-```shell
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-### 测试example
-通过下面命令获取训练好的Boston房价预估模型：
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
-```
- 测试HTTP服务
-  在Server端（容器内）运行：
+如果需要更换版本，请参照首页的指导，下载对应版本的pip包。
-  ```bash
+## GPU 版本
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
-  ```
-  在Client端（容器内或容器外）运行：
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
- 测试RPC服务
-  在Server端（容器内）运行：
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
-  ```
-  在Client端（容器内或容器外，需要安装`paddle-serving-client`包）运行下面Python代码：
-  ```python
-  from paddle_serving_client import Client
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
-## GPU版本
-GPU版本与CPU版本基本一致，只有部分接口命名的差别（GPU版本需要在GPU机器上安装nvidia-docker）。
-### 获取镜像
-参考[该文档](DOCKER_IMAGES_CN.md)获取镜像，这里以 `cuda9.0-cudnn7` 的镜像为例：
-```shell
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
-```
 ### 创建容器并进入
@@ -105,74 +43,19 @@ nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker exec -it test bash
 ```
+或者
-`-p`选项是为了将容器的`9292`端口映射到宿主机的`9292`端口。
-### 安装PaddleServing
-为了减小镜像的体积，镜像中没有安装Serving包，要执行下面命令进行安装。
 ```bash
-pip install paddle-serving-server-gpu
+docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
-```
+docker exec -it test bash
-您可能需要使用国内镜像源（例如清华源）来加速下载。
-```shell
-pip install paddle-serving-server-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-### 测试example
-在运行GPU版Server时需要通过`--gpu_ids`选项设置预测服务使用的GPU，缺省状态默认使用CPU。当设置的`--gpu_ids`超出环境变量`CUDA_VISIBLE_DEVICES`时会报错。下面的示例为指定使用索引为0的GPU：
-```shell
-export CUDA_VISIBLE_DEVICES=0,1
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0
-```
-通过下面命令获取训练好的Boston房价预估模型：
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
 ```
- 测试HTTP服务
+`-p`选项是为了将容器的`9292`端口映射到宿主机的`9292`端口。
-  在Server端（容器内）运行：
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --name uci --gpu_ids 0
-  ```
-  在Client端（容器内或容器外）运行：
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
- 测试RPC服务
-  在Server端（容器内）运行：
-  ```bash
+### 安装PaddleServing
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0
-  ```
-  在Client端（容器内或容器外，需要安装`paddle-serving-client`包）运行下面Python代码：
+镜像里自带对应镜像tag版本的`paddle_serving_server_gpu`，`paddle_serving_client`，`paddle_serving_app`，如果用户不需要更改版本，可以直接使用，适用于没有外网服务的环境。
-  ```bash
+如果需要更换版本，请参照首页的指导，下载对应版本的pip包。
-  from paddle_serving_client import Client
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
 ## 注意事项

--- a/doc/WINDOWS_TUTORIAL.md
+++ b/doc/WINDOWS_TUTORIAL.md
+## Paddle Serving for Windows Users
+(English|[简体中文](./WINDOWS_TUTORIAL_CN.md))
+### Summary
+This document guides users through building a Paddle Serving service on the Windows platform. Due to limited third-party library support, the Windows platform currently only supports building local predictor prediction services through web services. To use all the services, run Docker for Windows to simulate a Linux operating environment.
+### Running Paddle Serving on Native Windows System
+**Add Python to PATH**: **Native Windows supports only Python 3.5 or later.** First, add the directory containing the Python executable to PATH. This is usually done under **System Properties/My Computer Properties**-**Advanced**-**Environment Variables**: select Path and add the directory at the beginning, for example `C:\Users\$USER\AppData\Local\Programs\Python\Python36`, then click **OK** in each dialog. If typing python in Powershell opens the Python interactive prompt, the environment variable is configured correctly.
+**Install wget**: The downloads in this tutorial and the built-in model download function in `paddle_serving_app` both rely on wget. Download the binary package from this [link](http://gnuwin32.sourceforge.net/packages/wget.htm), unzip it, and copy it to `C:\Windows\System32`. If a security prompt appears, allow the operation.
+**Install Git**: For details, see [Git official website](https://git-scm.com/downloads)
+**Install the necessary C++ library (optional)**: Some users may find that a DLL fails to load during `import paddle`. In that case, it is recommended to [install Visual Studio Community Edition](https://visualstudio.microsoft.com/) together with its C++ components.
+**Install Paddle and Serving**: In Powershell, execute
+```
+python -m pip install -U paddle_serving_server paddle_serving_client paddle_serving_app paddlepaddle
+```
+For GPU users:
+```
+python -m pip install -U paddle_serving_server_gpu paddle_serving_client paddle_serving_app paddlepaddle-gpu
+```
+**Git clone Serving Project:**
+```
+git clone https://github.com/paddlepaddle/Serving
+pip install -r python/requirements_win.txt
+```
+**Run OCR example**:
+```
+cd Serving/python/examples/ocr
+python -m paddle_serving_app.package --get_model ocr_rec
+tar -xzvf ocr_rec.tar.gz
+python -m paddle_serving_app.package --get_model ocr_det
+tar -xzvf ocr_det.tar.gz
+python ocr_debugger_server.py cpu &
+python ocr_web_client.py
+```
+### Create a new Paddle Serving Web Service on Windows
+Windows currently supports the Local Predictor of the Web Service framework. The server code skeleton is as follows:
+```
+# filename:your_webservice.py
+from paddle_serving_server.web_service import WebService
+# If it is the GPU version, please use from paddle_serving_server_gpu.web_service import WebService
+class YourWebService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        # Implement pre-processing here.
+        # feed_dict maps variable names to numpy array inputs.
+        # fetch_names is a list of fetch variable names.
+        # is_batch indicates whether the numpy arrays in feed_dict already include the batch dimension.
+        return feed_dict, fetch_names, is_batch
+    def postprocess(self, feed={}, fetch=[], fetch_map=None):
+        # fetch_map is the dictionary returned by prediction: keys are the fetch names returned by preprocess, values are the corresponding variable values.
+        # Convert the result back into a dictionary whose values are lists so that it can be serialized to JSON for the web response.
+        return response
+your_service = YourWebService(name="XXX")
+your_service.load_model_config("your_model_path")
+your_service.prepare_server(workdir="workdir", port=9292)
+# If you are a GPU user, you can refer to the python example under python/examples/ocr
+your_service.run_debugger_service()
+# Windows platform cannot use run_rpc_service() interface
+your_service.run_web_service()
+```
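The requirement that `postprocess` return list values can be tried out in isolation. The following is a minimal sketch rather than Paddle Serving code: `to_json_ready` is a hypothetical helper name, and it assumes each value in `fetch_map` either exposes a `tolist()` method (as numpy arrays do) or is already a plain Python value. `FakeArray` only stands in for a numpy array so the sketch runs without numpy.

```python
import json


def to_json_ready(fetch_map):
    """Return a copy of fetch_map whose values can be passed to json.dumps.

    Hypothetical helper: values exposing tolist() (for example numpy arrays)
    are converted to nested lists; everything else is passed through as-is.
    """
    response = {}
    for name, value in fetch_map.items():
        tolist = getattr(value, "tolist", None)
        response[name] = tolist() if callable(tolist) else value
    return response


class FakeArray:
    """Stand-in for a numpy array so the sketch runs without numpy installed."""

    def __init__(self, rows):
        self._rows = rows

    def tolist(self):
        return [list(row) for row in self._rows]


if __name__ == "__main__":
    fetch_map = {"price": FakeArray([(18.9,)])}
    print(json.dumps(to_json_ready(fetch_map)))
```

Inside a real service, the body of `postprocess` could call such a helper on `fetch_map` and return the result as `response`.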
+Client code example
+```
+# filename:your_client.py
+import requests
+import json
+import base64
+import os, sys
+import time
+import cv2 # If you need to upload pictures
+# Encode raw image bytes as a base64 string so they can be embedded in JSON
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:9292/XXX/prediction" # XXX is the name passed to YourWebService on the server
+# Placeholder payload: the feed and fetch names below are examples and must match your model
+data = {"feed": [{"image": "<base64 string>"}], "fetch": ["res"]}
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+print(r.json())
+```
+Follow the skeleton above and implement the relevant logic in the corresponding functions. For more information, see [How to develop a new Web Service?](./NEW_WEB_SERVICE.md)
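The client skeleton posts a JSON body but does not show how that body is assembled. Here is a minimal sketch, assuming the service expects `feed` to be a list of dictionaries, as the `preprocess` loop in `python/examples/fit_a_line/test_server.py` suggests. The helper name `build_request_body`, the feed name `image` and the fetch name `res` are placeholders chosen for illustration; they must match your own model.

```python
import base64
import json


def build_request_body(image_bytes, feed_name="image", fetch_names=("res",)):
    """Build the JSON body for a POST to /<name>/prediction.

    feed_name and fetch_names are placeholders (assumptions); they must
    match the variable names that your own service expects.
    """
    encoded = base64.b64encode(image_bytes).decode("utf8")
    payload = {"feed": [{feed_name: encoded}], "fetch": list(fetch_names)}
    return json.dumps(payload)


if __name__ == "__main__":
    # Any bytes work for the demonstration; a real client would read an image file.
    print(build_request_body(b"\x89PNG fake image bytes"))
```

The returned string can be passed as the `data` argument of `requests.post` together with the `Content-type: application/json` header.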
+After development, run:
+```
+python your_webservice.py &
+python your_client.py
+```
+Because the service listens on a port, Windows may show a security prompt during startup. Allow access, and the service will print the IP address it is bound to. Note that on Windows the local IP address may not be 127.0.0.1, so confirm the actual address before setting the IP that the client connects to.
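When it is unclear which address to use, the machine's outbound interface address can be guessed with the standard library. This is a best-effort sketch, not part of Paddle Serving: `guess_local_ip` is a hypothetical helper, and the address printed by the service at startup remains the authoritative one.

```python
import socket


def guess_local_ip(fallback="127.0.0.1"):
    """Best-effort guess of the address this machine uses for outbound traffic.

    Connecting a UDP socket sends no packets; it only asks the OS to pick a
    route, whose local address is then read back. 192.0.2.1 is a
    documentation-only address (RFC 5737) used purely as a routing target.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route: fall back to the loopback address.
        return fallback
    finally:
        sock.close()


if __name__ == "__main__":
    # XXX stands for the service name, as in the client skeleton.
    print("http://{}:9292/XXX/prediction".format(guess_local_ip()))
```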
+### Docker for Windows User Guide
+The instructions above apply to native Windows. To use the complete feature set, use Docker to simulate a Linux environment.
+To install Docker, see [Docker Desktop](https://www.docker.com/products/docker-desktop).
+After installation, start the Docker Linux engine and download the image. Run the following in the Serving directory:
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
+# No port is published here; add -p to map ports as needed
+docker run --rm -dit --name serving_devel -v $PWD:/Serving hub.baidubce.com/paddlepaddle/serving:latest-devel
+docker exec -it serving_devel bash
+cd /Serving
+```
+The rest of the operations are exactly the same as the Linux version.
--- a/doc/WINDOWS_TUTORIAL_CN.md
+++ b/doc/WINDOWS_TUTORIAL_CN.md
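+## Guide to Using Paddle Serving on Windows
+([English](./WINDOWS_TUTORIAL.md)|简体中文)
+### Summary
+This document walks through setting up a Paddle Serving service on Windows step by step. Because third-party library support is limited, native Windows currently supports only local predictor services built with the web service framework. To use the full feature set, run Paddle Serving under Docker for Windows, which provides a Linux environment.
+### Running Paddle Serving on Native Windows
+**Add Python to PATH**: **Native Windows currently supports only Python 3.5 or later.** First, add the directory containing the Python executable to PATH. This is usually done under **System Properties/My Computer Properties**-**Advanced**-**Environment Variables**: select Path and add the directory at the beginning, for example `C:\Users\$USER\AppData\Local\Programs\Python\Python36`, then click **OK** in each dialog. If typing python in Powershell opens the Python interactive prompt, the environment variable is configured correctly.
+**Install wget**: The downloads in this tutorial and the built-in model download function in `paddle_serving_app` both rely on wget. [Download wget](http://gnuwin32.sourceforge.net/packages/wget.htm), unzip it, and copy it to `C:\Windows\System32`. If a security prompt appears, allow the operation.
+**Install Git**: For details, see the [Git official website](https://git-scm.com/downloads)
+**Install the necessary C++ library (optional)**: Some users may find that a DLL fails to load during `import paddle`. In that case, it is recommended to [install Visual Studio Community Edition](https://visualstudio.microsoft.com/) together with its C++ components.
+**Install Paddle and Serving**: In Powershell, run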
+```
+python -m pip install -U paddle_serving_server paddle_serving_client paddle_serving_app paddlepaddle
+```
+For GPU users:
+```
+python -m pip install -U paddle_serving_server_gpu paddle_serving_client paddle_serving_app paddlepaddle-gpu
+```
+**Clone the Serving repository**:
+```
+git clone https://github.com/paddlepaddle/Serving
+pip install -r python/requirements_win.txt
+```
+**Run the OCR example**:
+```
+cd Serving/python/examples/ocr
+python -m paddle_serving_app.package --get_model ocr_rec
+tar -xzvf ocr_rec.tar.gz
+python -m paddle_serving_app.package --get_model ocr_det
+tar -xzvf ocr_det.tar.gz
+python ocr_debugger_server.py cpu &
+python ocr_web_client.py
+```
+### Create a New Paddle Serving Service on Windows
+Windows currently supports the Local Predictor of the Web Service framework. The server code skeleton is as follows:
+```
+# filename:your_webservice.py
+from paddle_serving_server.web_service import WebService
+# For the GPU version, use: from paddle_serving_server_gpu.web_service import WebService
+class YourWebService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        # Implement pre-processing here.
+        # feed_dict maps variable names to numpy array inputs.
+        # fetch_names is a list of fetch variable names.
+        # is_batch indicates whether the numpy arrays in feed_dict already include the batch dimension.
+        return feed_dict, fetch_names, is_batch
+    def postprocess(self, feed={}, fetch=[], fetch_map=None):
+        # fetch_map is the dictionary returned by prediction: keys are the fetch names returned by preprocess, values are the corresponding variable values.
+        # Convert the result back into a dictionary whose values are lists so that it can be serialized to JSON for the web response.
+        return response
+your_service = YourWebService(name="XXX")
+your_service.load_model_config("your_model_path")
+your_service.prepare_server(workdir="workdir", port=9292)
+# GPU users can refer to the Python example under python/examples/ocr
+your_service.run_debugger_service()
+# run_rpc_service() cannot be used on Windows
+your_service.run_web_service()
+```
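+Client code example
+```
+# filename: your_client.py
+import requests
+import json
+import base64
+import os, sys
+import time
+import cv2 # If you need to upload images
+# Encode raw image bytes as a base64 string so they can be embedded in JSON
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+headers = {"Content-type": "application/json"}
+url = "http://127.0.0.1:9292/XXX/prediction" # XXX is the name passed to YourWebService on the server
+# Placeholder payload: the feed and fetch names below are examples and must match your model
+data = {"feed": [{"image": "<base64 string>"}], "fetch": ["res"]}
+r = requests.post(url=url, headers=headers, data=json.dumps(data))
+print(r.json())
+```
+Follow the skeleton above and implement the relevant logic in the corresponding functions. For more information, see [How to develop a new Web Service?](./NEW_WEB_SERVICE_CN.md)
+After development, run: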
+```
+python your_webservice.py &
+python your_client.py
+```
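+Because the service listens on a port, Windows may show a security prompt during startup. Allow access, and the service will print the IP address it is bound to. Note that on Windows the local IP address may not be 127.0.0.1, so confirm the actual address before setting the IP that the client connects to.
+### Docker for Windows User Guide
+The instructions above apply to native Windows. To use the complete feature set, use Docker to simulate a Linux environment.
+To install Docker, see [Docker Desktop](https://www.docker.com/products/docker-desktop)
+After installation, start the Docker Linux engine and download the image. Run the following in the Serving directory: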
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
+# No port is published here; add -p to map ports as needed
+docker run --rm -dit --name serving_devel -v $PWD:/Serving hub.baidubce.com/paddlepaddle/serving:latest-devel 
+docker exec -it serving_devel bash
+cd /Serving
+```
+The remaining steps are the same as on Linux.
--- a/java/README.md
+++ b/java/README.md
-## Java Demo
+## Tutorial of Java Client for Paddle Serving
+(English|[简体中文](./README_CN.md))
+### Development Environment
+To make Java development easier, we provide a Docker image that contains a pre-compiled Serving project. Pull the image and enter the development environment as follows:
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker exec -it java_serving bash
+cd Serving/java
+```
+The Serving folder in the image is a checkout of the develop branch taken when the image was built. Run git pull to update it to the latest version, or git checkout to switch to the branch you need.
+### Install client dependencies
+Because the client has many dependencies, the project was already compiled once when the image was built. You only need to run:
-### Install package
 ```
 mvn compile
 mvn install
@@ -9,18 +27,49 @@ mvn compile
 mvn install
 ```
-### Start Server
+### Start the server
-take the fit_a_line demo as example
+Taking the fit_a_line model as an example, start the server:
 ```
- python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #CPU
+cd ../../python/examples/fit_a_line
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #GPU
+sh get_data.sh
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang &
 ```
-### Client Predict
+Client prediction
 ```
+cd ../../../java/examples/target
 java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample fit_a_line
 ```
-The Java example also contains the prediction client of Bert, Model_enaemble, asyn_predict, batch_predict, Cube_local, Cube_quant, and Yolov4 models.
+Taking yolov4 as an example, start the server:
+```
+python -m paddle_serving_app.package --get_model yolov4
+tar -xzvf yolov4.tar.gz
+python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang & # Run this in the GPU Docker image; otherwise start the server the CPU way.
+```
+Client prediction
+```
+# in /Serving/java/examples/target
+java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample yolov4 ../../../python/examples/yolov4/000000570688.jpg
+# The case of yolov4 needs to specify a picture as input
+```
+### Customization guidance
+The examples above run in CPU mode. If GPU mode is required, there are two options.
+The first is to run GPU Serving and the Java Client in the same image. After starting the GPU image, copy /Serving/java from the java image into it.
+The second is to deploy GPU Serving and the Java Client separately. If they run on the same host, use ifconfig to find the IP address of the server container, change the endpoint passed to client.connect in `examples/src/main/java/PaddleServingClientExample.java`, and compile again. Alternatively, start docker with `--net=host` so that the container shares the host network, in which case the Java code runs without modification.
+**Note that in the examples every model must be started with `--use_multilang` to enable gRPC multi-language support, and the port is always 9393. To use a different port, change it in the java file.**
+**Serving has also launched the Pipeline mode (see [Pipeline Serving](../doc/PIPELINE_SERVING.md) for details). The Pipeline Serving Client for Java will be released in the next version (0.4.1).**
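When the server and the Java client run in separate containers, it can save a rebuild to confirm that the chosen endpoint is reachable before editing the Java source. The following standard-library Python sketch does only that: `wait_for_port` is a hypothetical helper, the host in the demo is an assumption, and a successful TCP connection shows that something is listening on the port, not that the gRPC service is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection succeeds, or False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 9393 is the port used by the examples above; the host is an assumption.
    print(wait_for_port("127.0.0.1", 9393, timeout=2.0))
```

The same check is useful right after starting the server in the background with `&`, since the client may otherwise connect before the server is listening.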
--- a/java/README_CN.md
+++ b/java/README_CN.md
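-## Java Demo
+## Java Client for Paddle Serving
+([English](./README.md)|简体中文)
+### Development Environment
+To make Java development easier, we provide a Docker image that contains a pre-compiled Serving project. Pull the image and enter the development environment as follows: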
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker exec -it java_serving bash
+cd Serving/java
+```
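+The Serving folder in the image is a checkout of the develop branch taken when the image was built. Run git pull to update it to the latest version, or git checkout to switch to the branch you need.
 ### Install client dependencies
+Because the client has many dependencies, the project was already compiled once when the image was built. You only need to run: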
 ```
 mvn compile
 mvn install
@@ -11,16 +29,47 @@ mvn install
 ### Start the server
-Take the fit_a_line model as an example
+Taking the fit_a_line model as an example, start the server:
 ```
- python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #CPU
+cd ../../python/examples/fit_a_line
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #GPU
+sh get_data.sh
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang &
 ```
-### Client prediction
+Client prediction
 ```
+cd ../../../java/examples/target
 java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample fit_a_line
 ```
-The Java examples also include prediction clients for the bert, model_enaemble, asyn_predict, batch_predict, cube_local, cube_quant, and yolov4 models.
+Taking yolov4 as an example, start the server:
+```
+python -m paddle_serving_app.package --get_model yolov4
+tar -xzvf yolov4.tar.gz
+python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang &  # Run this in the GPU Docker image; otherwise start the server the CPU way.
+```
+Client prediction
+```
+# in /Serving/java/examples/target
+java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample yolov4 ../../../python/examples/yolov4/000000570688.jpg
+# The yolov4 example requires an image as input
+```
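+### Customization guidance
+The examples above run in CPU mode. If GPU mode is required, there are two options.
+The first is to run GPU Serving and the Java Client in the same image. After starting the GPU image, copy /Serving/java from the java image into it.
+The second is to deploy GPU Serving and the Java Client separately. If they run on the same host, use ifconfig to find the IP address of the server container, change the endpoint passed to client.connect in `examples/src/main/java/PaddleServingClientExample.java`, and compile again. Alternatively, start docker with `--net=host` so that the container shares the host network, in which case the Java code runs without modification.
+**Note that in the examples every model must be started with `--use_multilang` to enable gRPC multi-language support, and the port is always 9393. To use a different port, change it in the java file.**
+**Serving has also launched the Pipeline mode (see [Pipeline Serving](../doc/PIPELINE_SERVING_CN.md) for details). The Pipeline Serving Client for Java will be released in the next version (0.4.1), so stay tuned.**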
--- a/python/examples/fit_a_line/README.md
+++ b/python/examples/fit_a_line/README.md
@@ -14,12 +14,6 @@ sh get_data.sh
 ### Start server
-``` shell
-python test_server.py uci_housing_model/
-```
-You can also start the default RPC service with the following line of code:
 ```shell
 python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393
 ```
@@ -40,7 +34,7 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt
 Start a web service with default web service hosting modules:
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --name uci
+python test_server.py
 ```
 ### Client prediction

--- a/python/examples/fit_a_line/README_CN.md
+++ b/python/examples/fit_a_line/README_CN.md
@@ -41,7 +41,7 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt
 Start the default web service with the following line of code:
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --name uci
+python test_server.py
 ```
 ### Client prediction

--- a/python/examples/fit_a_line/test_client.py
+++ b/python/examples/fit_a_line/test_client.py
@@ -15,6 +15,7 @@
 from paddle_serving_client import Client
 import sys
+import numpy as np
 client = Client()
 client.load_client_config(sys.argv[1])
@@ -27,7 +28,6 @@ test_reader = paddle.batch(
    batch_size=1)
 for data in test_reader():
-    import numpy as np
    new_data = np.zeros((1, 1, 13)).astype("float32")
    new_data[0] = data[0][0]
    fetch_map = client.predict(

--- a/python/examples/fit_a_line/test_server.py
+++ b/python/examples/fit_a_line/test_server.py
@@ -13,24 +13,24 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing
-import os
+from paddle_serving_server.web_service import WebService
-import sys
+import numpy as np
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import Server
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_infer_op = op_maker.create('general_infer')
-response_op = op_maker.create('general_response')
-op_seq_maker = OpSeqMaker()
+class UciService(WebService):
-op_seq_maker.add_op(read_op)
+    def preprocess(self, feed=[], fetch=[]):
-op_seq_maker.add_op(general_infer_op)
+        feed_batch = []
-op_seq_maker.add_op(response_op)
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
+uci_service = UciService(name="uci")
-server.load_model_config(sys.argv[1])
+uci_service.load_model_config("uci_housing_model")
-server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
+uci_service.prepare_server(workdir="workdir", port=9292)
-server.run_server()
+uci_service.run_rpc_service()
+uci_service.run_web_service()
--- a/python/examples/ocr_detection/7.jpg
+++ b/python/examples/ocr_detection/7.jpg
--- a/python/examples/ocr_detection/text_det_client.py
+++ b/python/examples/ocr_detection/text_det_client.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import os
-from paddle_serving_client import Client
-from paddle_serving_app.reader import Sequential, File2Image, ResizeByFactor
-from paddle_serving_app.reader import Div, Normalize, Transpose
-from paddle_serving_app.reader import DBPostProcess, FilterBoxes
-client = Client()
-client.load_client_config("ocr_det_client/serving_client_conf.prototxt")
-client.connect(["127.0.0.1:9494"])
-read_image_file = File2Image()
-preprocess = Sequential([
-    ResizeByFactor(32, 960), Div(255),
-    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
-        (2, 0, 1))
-])
-post_func = DBPostProcess({
-    "thresh": 0.3,
-    "box_thresh": 0.5,
-    "max_candidates": 1000,
-    "unclip_ratio": 1.5,
-    "min_size": 3
-})
-filter_func = FilterBoxes(10, 10)
-img = read_image_file(name)
-ori_h, ori_w, _ = img.shape
-img = preprocess(img)
-new_h, new_w, _ = img.shape
-ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
-outputs = client.predict(feed={"image": img}, fetch=["concat_1.tmp_0"])
-dt_boxes_list = post_func(outputs["concat_1.tmp_0"], [ratio_list])
-dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
--- a/python/paddle_serving_server/web_service.py
+++ b/python/paddle_serving_server/web_service.py
@@ -58,7 +58,7 @@ class WebService(object):
        if os.path.isdir(model_config):
            client_config = "{}/serving_server_conf.prototxt".format(
                model_config)
-        elif os.path.isfile(path):
+        elif os.path.isfile(model_config):
            client_config = model_config
        model_conf = m_config.GeneralModelConfig()
        f = open(client_config, 'r')
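The corrected branch accepts either a model directory or a config file path. Before the fix, the `elif` tested a name, `path`, that does not appear in the lines shown here. The sketch below isolates that branch logic so it can be tested on its own. `resolve_server_conf` is a hypothetical name, and raising `ValueError` for anything that is neither a directory nor a file is an addition of this sketch, not behavior shown in the diff.

```python
import os
import tempfile


def resolve_server_conf(model_config):
    """Return the server config path for a model directory or a config file.

    Mirrors the corrected branch logic: a directory is expected to contain
    serving_server_conf.prototxt, while a file path is used as-is. Raising
    on anything else is an addition of this sketch, not library behavior.
    """
    if os.path.isdir(model_config):
        return os.path.join(model_config, "serving_server_conf.prototxt")
    if os.path.isfile(model_config):
        return model_config
    raise ValueError("not a directory or file: {!r}".format(model_config))


if __name__ == "__main__":
    # Demonstrate the directory branch with a throwaway directory.
    with tempfile.TemporaryDirectory() as model_dir:
        print(resolve_server_conf(model_dir))
```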

--- a/python/paddle_serving_server_gpu/__init__.py
+++ b/python/paddle_serving_server_gpu/__init__.py
@@ -25,7 +25,9 @@ from .version import serving_server_version
 from contextlib import closing
 import argparse
 import collections
-import fcntl
+import sys
+if sys.platform.startswith('win') is False:
+    import fcntl
 import shutil
 import numpy as np
 import grpc
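The hunk above guards only the import of `fcntl`, which is unavailable on Windows; how the rest of the module uses `fcntl` is not shown here. As an illustration of the general pattern rather than the project's code, the sketch below uses `try`/`except ImportError` instead of a platform check and degrades gracefully when locking is unavailable. `try_exclusive_lock` is a hypothetical helper.

```python
import sys
import tempfile

try:
    import fcntl  # POSIX only; the import fails on Windows
except ImportError:
    fcntl = None


def try_exclusive_lock(fileobj):
    """Try to take a non-blocking exclusive lock on an open file.

    Returns True if the lock was acquired, and False if another process
    holds it or if fcntl-based locking is unavailable on this platform.
    """
    if fcntl is None:
        return False
    try:
        fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryFile() as handle:
        print(sys.platform, try_exclusive_lock(handle))
```

Testing for the capability directly means every later use of `fcntl` has a single, explicit fallback path. The platform check in the diff could also be written more idiomatically as `if not sys.platform.startswith('win'):`.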

--- a/python/paddle_serving_server_gpu/web_service.py
+++ b/python/paddle_serving_server_gpu/web_service.py
@@ -64,7 +64,7 @@ class WebService(object):
        if os.path.isdir(model_config):
            client_config = "{}/serving_server_conf.prototxt".format(
                model_config)
-        elif os.path.isfile(path):
+        elif os.path.isfile(model_config):
            client_config = model_config
        model_conf = m_config.GeneralModelConfig()
        f = open(client_config, 'r')

--- a/python/requirements_mac.txt
+++ b/python/requirements_mac.txt
+numpy>=1.12, <=1.16.4 ; python_version<"3.5"
+shapely==1.7.0
+wheel>=0.34.0, <0.35.0
+setuptools>=44.1.0
+opencv-python==4.2.0.32
+google>=2.0.3
+protobuf>=3.12.2
+grpcio-tools>=1.33.2
+grpcio>=1.33.2
+func-timeout>=4.3.5
+pyyaml>=1.3.0
+sentencepiece==0.1.83
+flask>=1.1.2
+ujson>=2.0.3
--- a/requirements_win.txt
+++ b/requirements_win.txt
--- a/python/setup.py.server.in
+++ b/python/setup.py.server.in
@@ -29,7 +29,7 @@ util.gen_pipeline_code("paddle_serving_server")
 REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
-    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
+    'flask >= 1.1.1', 'func_timeout', 'pyyaml'
 ]
 packages=['paddle_serving_server',

--- a/python/setup.py.server_gpu.in
+++ b/python/setup.py.server_gpu.in
@@ -31,7 +31,7 @@ util.gen_pipeline_code("paddle_serving_server_gpu")
 REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
-    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
+    'flask >= 1.1.1', 'func_timeout', 'pyyaml'
 ]
 packages=['paddle_serving_server_gpu',

--- a/requirements.txt
+++ b/requirements.txt
-sphinx==2.1.0
-mistune
-sphinx_rtd_theme
-paddlepaddle>=1.8.4
-shapely<=1.6.1