fix ce script

819519ec · MRXLT · ee72ff4b · d3e8fedf
147 changed files
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -85,6 +85,17 @@ include(generic)
 include(flags)
 endif()

+if (APP)
+include(external/zlib)
+include(external/boost)
+include(external/protobuf)
+include(external/gflags)
+include(external/glog)
+include(external/pybind11)
+include(external/python)
+include(generic)
+endif()
+
 if (SERVER)
 include(external/cudnn)
 include(paddlepaddle)

--- a/README.md
+++ b/README.md
+([简体中文](./README_CN.md)|English)
+
 <p align="center">
    <br>
 <img src='doc/serving_logo.png' width = "600" height = "130">
    <br>
 <p>

+
 <p align="center">
    <br>
    <a href="https://travis-ci.com/PaddlePaddle/Serving">
@@ -23,28 +26,20 @@ We consider deploying deep learning inference service online to be a user-facing
    <img src="doc/demo.gif" width="700">
 </p>

-<h2 align="center">Some Key Features</h2>
-
-- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
-- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
-- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
-- **Highly concurrent and efficient communication** between clients and servers supported.
-- **Multiple programming languages** supported on client side, such as Golang, C++ and python.
-- **Extensible framework design** which can support model serving beyond Paddle.

 <h2 align="center">Installation</h2>

 We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
 ```
 # Run CPU Docker
-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it test bash
 ```
 ```
 # Run GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
 nvidia-docker exec -it test bash
 ```

@@ -56,10 +51,44 @@ pip install paddle-serving-server-gpu # GPU

 You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.

+If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
+
 Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client.

+
+<h2 align="center"> Pre-built services with Paddle Serving</h2>
+
+<h3 align="center">Chinese Word Segmentation</h4>
+
+``` shell
+> python -m paddle_serving_app.package -get_model lac
+> tar -xzf lac.tar.gz
+> python lac_web_service.py 9292 &
+> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
+{"result":[{"word_seg":"我|爱|北京|天安门"}]}
+```
+
+<h3 align="center">Image Classification</h4>
+
+<p align="center">
+    <br>
+<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
+    <br>
+<p>
+    
+``` shell
+> python -m paddle_serving_app.package -get_model resnet_v2_50_imagenet
+> tar -xzf resnet_v2_50_imagenet.tar.gz
+> python resnet50_imagenet_classify.py resnet50_serving_model &
+> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
+{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
+```
+
+
 <h2 align="center">Quick Start Example</h2>

+This quick start example is only for users who already have a model to deploy and we prepare a ready-to-deploy model here. If you want to know how to use paddle serving from offline training to online serving, please reference to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
+
 ### Boston House Price Prediction model
 ``` shell
 wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
@@ -82,7 +111,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `port` | int | `9292` | Exposed port of current service to users|
 | `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
-| `mem_optim` | bool | `False` | Enable memory optimization |
+| `mem_optim` | bool | `False` | Enable memory / graphic memory optimization |
+| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph |
+| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL |

 Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/).
 </center>
@@ -113,138 +144,13 @@ print(fetch_map)
 ```
 Here, `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias name and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the name of `"x"` and `"price"` are assigned when the servable model is saved during training.

-<h2 align="center"> Pre-built services with Paddle Serving</h2>
-
-<h3 align="center">Chinese Word Segmentation</h4>
-
- **Description**: 
-``` shell
-Chinese word segmentation HTTP service that can be deployed with one line command.
-```
-
- **Download Servable Package**: 
-``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
-```
- **Host web service**: 
-``` shell
-tar -xzf lac_model_jieba_web.tar.gz
-python lac_web_service.py jieba_server_model/ lac_workdir 9292
-```
- **Request sample**: 
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
-```
- **Request result**: 
-``` shell
-{"word_seg":"我|爱|北京|天安门"}
-```
-
-<h3 align="center">Image Classification</h4>
-
- **Description**: 
-``` shell
-Image classification trained with Imagenet dataset. A label and corresponding probability will be returned.
-Note: This demo needs paddle-serving-server-gpu. 
-```
-
- **Download Servable Package**: 
-``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
-```
- **Host web service**: 
-``` shell
-tar -xzf imagenet_demo.tar.gz
-python image_classification_service_demo.py resnet50_serving_model
-```
- **Request sample**: 
-
-<p align="center">
-    <br>
-<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
-    <br>
-<p>
-
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
-```
- **Request result**: 
-``` shell
-{"label":"daisy","prob":0.9341403245925903}
-```
-
-<h3 align="center">More Demos</h3>
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | Bert-Base-Baike                                              |
-| URL                | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
-| Description        | Get semantic representation from a Chinese Sentence          |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | Resnet50-Imagenet                                            |
-| URL                | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
-| Description        | Get image semantic representation from an image              |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | Resnet101-Imagenet                                           |
-| URL                | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
-| Description        | Get image semantic representation from an image              |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | CNN-IMDB                                                     |
-| URL                | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| Description        | Get category probability from an English Sentence            |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | LSTM-IMDB                                                    |
-| URL                | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| Description        | Get category probability from an English Sentence            |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | BOW-IMDB                                                     |
-| URL                | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| Description        | Get category probability from an English Sentence            |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | Jieba-LAC                                                    |
-| URL                | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz    |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
-| Description        | Get word segmentation from a Chinese Sentence                |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| Model Name         | DNN-CTR                                                      |
-| URL                | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz                            |
-| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
-| Description        | Get click probability from a feature vector of item          |
+<h2 align="center">Some Key Features of Paddle Serving</h2>

+- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
+- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
+- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
+- **Highly concurrent and efficient communication** between clients and servers supported.
+- **Multiple programming languages** supported on client side, such as Golang, C++ and python.

 <h2 align="center">Document</h2>

@@ -259,11 +165,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
 - [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
 - [Golang client](doc/IMDB_GO_CLIENT.md)
 - [Compile from source code](doc/COMPILE.md)
+- [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
+- [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)

 ### About Efficiency
 - [How to profile Paddle Serving latency?](python/examples/util)
-- [How to optimize performance?(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
-- [Deploy multi-services on one GPU(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
+- [How to optimize performance?(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
+- [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
 - [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
 - [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)


--- a/README_CN.md
+++ b/README_CN.md
+(简体中文|[English](./README.md))
+
 <p align="center">
    <br>
 <img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130">
    <br>
 <p>

+
 <p align="center">
    <br>
    <a href="https://travis-ci.com/PaddlePaddle/Serving">
@@ -24,14 +27,7 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务
    <img src="doc/demo.gif" width="700">
 </p>

-<h2 align="center">核心功能</h2>

-- 与Paddle训练紧密连接，绝大部分Paddle模型可以 **一键部署**.
-- 支持 **工业级的服务能力** 例如模型管理，在线加载，在线A/B测试等.
-- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
-- 支持客户端和服务端之间 **高并发和高效通信**.
-- 支持 **多种编程语言** 开发客户端，例如Golang，C++和Python.
-- **可伸缩框架设计** 可支持不限于Paddle的模型服务.

 <h2 align="center">安装</h2>

@@ -39,14 +35,14 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务

 ```
 # 启动 CPU Docker
-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker pull hub.baidubce.com/paddlepaddle/serving:latest
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it test bash
 ```
 ```
 # 启动 GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
 nvidia-docker exec -it test bash
 ```
 ```shell
@@ -57,9 +53,42 @@ pip install paddle-serving-server-gpu # GPU

 您可能需要使用国内镜像源（例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`）来加速下载。

+如果需要使用develop分支编译的安装包，请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载，使用`pip install`命令进行安装。
+
 客户端安装包支持Centos 7和Ubuntu 18，或者您可以使用HTTP服务，这种情况下不需要安装客户端。

-<h2 align="center">快速启动示例</h2>
+<h2 align="center"> Paddle Serving预装的服务 </h2>
+
+<h3 align="center">中文分词</h4>
+
+``` shell
+> python -m paddle_serving_app.package -get_model lac
+> tar -xzf lac.tar.gz
+> python lac_web_service.py 9292 &
+> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
+{"result":[{"word_seg":"我|爱|北京|天安门"}]}
+```
+
+<h3 align="center">图像分类</h4>
+
+<p align="center">
+    <br>
+<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
+    <br>
+<p>
+    
+``` shell
+> python -m paddle_serving_app.package -get_model resnet_v2_50_imagenet
+> tar -xzf resnet_v2_50_imagenet.tar.gz
+> python resnet50_imagenet_classify.py resnet50_serving_model &
+> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
+{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
+```
+
+
+<h2 align="center">快速开始示例</h2>
+
+这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的，而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程，请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)

 <h3 align="center">波士顿房价预测</h3>

@@ -87,6 +116,8 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
 | `mem_optim` | bool | `False` | Enable memory optimization |
+| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph |
+| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL |

 我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求，请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。
 </center>
@@ -118,139 +149,13 @@ print(fetch_map)
 ```
 在这里，`client.predict`函数具有两个参数。 `feed`是带有模型输入变量别名和值的`python dict`。 `fetch`被要从服务器返回的预测变量赋值。 在该示例中，在训练过程中保存可服务模型时，被赋值的tensor名为`"x"`和`"price"`。

-<h2 align="center">Paddle Serving预装的服务</h2>
-
-<h3 align="center">中文分词模型</h4>
-
- **介绍**: 
-``` shell
-本示例为中文分词HTTP服务一键部署
-```
-
- **下载服务包**: 
-``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
-```
- **启动web服务**: 
-``` shell
-tar -xzf lac_model_jieba_web.tar.gz
-python lac_web_service.py jieba_server_model/ lac_workdir 9292
-```
- **客户端请求示例**: 
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
-```
- **返回结果示例**: 
-``` shell
-{"word_seg":"我|爱|北京|天安门"}
-```
-
-<h3 align="center">图像分类模型</h4>
-
- **介绍**: 
-``` shell
-图像分类模型由Imagenet数据集训练而成，该服务会返回一个标签及其概率
-注意：本示例需要安装paddle-serving-server-gpu
-```
-
- **下载服务包**: 
-``` shell
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
-```
- **启动web服务**: 
-``` shell
-tar -xzf imagenet_demo.tar.gz
-python image_classification_service_demo.py resnet50_serving_model
-```
- **客户端请求示例**: 
-
-<p align="center">
-    <br>
-<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
-    <br>
-<p>
-
-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
-```
- **返回结果示例**: 
-``` shell
-{"label":"daisy","prob":0.9341403245925903}
-```
-
-<h3 align="center">更多示例</h3>
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名              | Bert-Base-Baike                                              |
-| 下载链接                | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
-| 客户端/服务端代码     | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
-| 介绍                | 获得一个中文语句的语义表示          |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名         | Resnet50-Imagenet                                            |
-| 下载链接                | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
-| 介绍        | 获得一张图片的图像语义表示              |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名       | Resnet101-Imagenet                                           |
-| 下载链接                | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
-| 介绍      | 获得一张图片的图像语义表示              |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名        | CNN-IMDB                                                     |
-| 下载链接                | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| 介绍       | 从一个中文语句获得类别及其概率           |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名         | LSTM-IMDB                                                    |
-| 下载链接               | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| 介绍        | 从一个英文语句获得类别及其概率            |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名         | BOW-IMDB                                                     |
-| 下载链接                | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
-| 介绍       | 从一个英文语句获得类别及其概率            |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名         | Jieba-LAC                                                    |
-| 下载链接                | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz    |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
-| 介绍       | 获取中文语句的分词                |
-
-
-
-| Key                | Value                                                        |
-| :----------------- | :----------------------------------------------------------- |
-| 模型名         | DNN-CTR                                                      |
-| 下载链接                | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz                    |
-| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
-| 介绍        | 从项目的特征向量中获得点击概率        |
-
+<h2 align="center">Paddle Serving的核心功能</h2>

+- 与Paddle训练紧密连接，绝大部分Paddle模型可以 **一键部署**.
+- 支持 **工业级的服务能力** 例如模型管理，在线加载，在线A/B测试等.
+- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
+- 支持客户端和服务端之间 **高并发和高效通信**.
+- 支持 **多种编程语言** 开发客户端，例如Golang，C++和Python.

 <h2 align="center">文档</h2>

@@ -265,11 +170,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
 - [如何开发一个新的Web Service?](doc/NEW_WEB_SERVICE_CN.md)
 - [如何在Paddle Serving使用Go Client?](doc/IMDB_GO_CLIENT_CN.md)
 - [如何编译PaddleServing?](doc/COMPILE_CN.md)
+- [如何使用uWSGI部署Web Service](doc/UWSGI_DEPLOY_CN.md)
+- [如何实现模型文件热加载](doc/HOT_LOADING_IN_SERVING_CN.md)

 ### 关于Paddle Serving性能
 - [如何测试Paddle Serving性能？](python/examples/util/)
-- [如何优化性能?](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
-- [在一张GPU上启动多个预测服务](doc/PERFORMANCE_OPTIM_CN.md)
+- [如何优化性能?](doc/PERFORMANCE_OPTIM_CN.md)
+- [在一张GPU上启动多个预测服务](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
 - [CPU版Benchmarks](doc/BENCHMARKING.md)
 - [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)


--- a/cmake/paddlepaddle.cmake
+++ b/cmake/paddlepaddle.cmake
@@ -31,7 +31,7 @@ message( "WITH_GPU = ${WITH_GPU}")
 # Paddle Version should be one of:
 # latest: latest develop build
 # version number like 1.5.2
-SET(PADDLE_VERSION "1.7.1")
+SET(PADDLE_VERSION "1.7.2")

 if (WITH_GPU)
    SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda${CUDA_VERSION_MAJOR}-cudnn7-avx-mkl")

--- a/core/CMakeLists.txt
+++ b/core/CMakeLists.txt
@@ -23,6 +23,11 @@ add_subdirectory(pdcodegen)
 add_subdirectory(sdk-cpp)
 endif()

+if (APP)
+add_subdirectory(configure)
+endif()
+
+
 if(CLIENT)
 add_subdirectory(general-client)
 endif()

--- a/core/configure/CMakeLists.txt
+++ b/core/configure/CMakeLists.txt
+if (SERVER OR CLIENT)
 LIST(APPEND protofiles
        ${CMAKE_CURRENT_LIST_DIR}/proto/server_configure.proto
        ${CMAKE_CURRENT_LIST_DIR}/proto/sdk_configure.proto
@@ -28,6 +29,7 @@ FILE(GLOB inc ${CMAKE_CURRENT_BINARY_DIR}/*.pb.h)

 install(FILES ${inc}
        DESTINATION ${PADDLE_SERVING_INSTALL_DIR}/include/configure)
+endif()

 py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.proto)
 add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
@@ -51,6 +53,14 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD

 endif()

+if (APP)
+add_custom_command(TARGET general_model_config_py_proto POST_BUILD
+                COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/proto
+                COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/proto
+                COMMENT "Copy generated general_model_config proto file into directory paddle_serving_app/proto."
+                WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
+endif()
+
 if (SERVER)
 py_proto_compile(server_config_py_proto SRCS proto/server_configure.proto)
 add_custom_target(server_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)

--- a/core/configure/proto/server_configure.proto
+++ b/core/configure/proto/server_configure.proto
@@ -43,6 +43,7 @@ message EngineDesc {
  optional bool enable_memory_optimization = 13;
  optional bool static_optimization = 14;
  optional bool force_update_static_cache = 15;
+  optional bool enable_ir_optimization = 16;
 };

 // model_toolkit conf

--- a/core/cube/cube-agent/src/agent/util.go
+++ b/core/cube/cube-agent/src/agent/util.go
@@ -83,9 +83,6 @@ func JsonReq(method, requrl string, timeout int, kv *map[string]string,
 }

 func GetHdfsMeta(src string) (master, ugi, path string, err error) {
-	//src = "hdfs://root:rootpasst@st1-inf-platform0.st01.baidu.com:54310/user/mis_user/news_dnn_ctr_cube_1/1501836820/news_dnn_ctr_cube_1_part54.tar"
-	//src = "hdfs://st1-inf-platform0.st01.baidu.com:54310/user/mis_user/news_dnn_ctr_cube_1/1501836820/news_dnn_ctr_cube_1_part54.tar"
-
 	ugiBegin := strings.Index(src, "//")
 	ugiPos := strings.LastIndex(src, "@")
 	if ugiPos != -1 && ugiBegin != -1 {

--- a/core/general-client/CMakeLists.txt
+++ b/core/general-client/CMakeLists.txt
 if(CLIENT)
 add_subdirectory(pybind11)
 pybind11_add_module(serving_client src/general_model.cpp src/pybind_general_model.cpp)
-target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz)
+target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -Wl,-rpath,'$ORIGIN'/lib)
 endif()
--- a/core/general-client/include/general_model.h
+++ b/core/general-client/include/general_model.h
@@ -69,15 +69,27 @@ class ModelRes {
  const std::vector<int64_t>& get_int64_by_name(const std::string& name) {
    return _int64_value_map[name];
  }
+  std::vector<int64_t>&& get_int64_by_name_with_rv(const std::string& name) {
+    return std::move(_int64_value_map[name]);
+  }
  const std::vector<float>& get_float_by_name(const std::string& name) {
    return _float_value_map[name];
  }
-  const std::vector<int>& get_shape(const std::string& name) {
+  std::vector<float>&& get_float_by_name_with_rv(const std::string& name) {
+    return std::move(_float_value_map[name]);
+  }
+  const std::vector<int>& get_shape_by_name(const std::string& name) {
    return _shape_map[name];
  }
-  const std::vector<int>& get_lod(const std::string& name) {
+  std::vector<int>&& get_shape_by_name_with_rv(const std::string& name) {
+    return std::move(_shape_map[name]);
+  }
+  const std::vector<int>& get_lod_by_name(const std::string& name) {
    return _lod_map[name];
  }
+  std::vector<int>&& get_lod_by_name_with_rv(const std::string& name) {
+    return std::move(_lod_map[name]);
+  }
  void set_engine_name(const std::string& engine_name) {
    _engine_name = engine_name;
  }
@@ -121,17 +133,33 @@ class PredictorRes {
                                                const std::string& name) {
    return _models[model_idx].get_int64_by_name(name);
  }
+  std::vector<int64_t>&& get_int64_by_name_with_rv(const int model_idx,
+                                                   const std::string& name) {
+    return std::move(_models[model_idx].get_int64_by_name_with_rv(name));
+  }
  const std::vector<float>& get_float_by_name(const int model_idx,
                                              const std::string& name) {
    return _models[model_idx].get_float_by_name(name);
  }
-  const std::vector<int>& get_shape(const int model_idx,
-                                    const std::string& name) {
-    return _models[model_idx].get_shape(name);
+  std::vector<float>&& get_float_by_name_with_rv(const int model_idx,
+                                                 const std::string& name) {
+    return std::move(_models[model_idx].get_float_by_name_with_rv(name));
+  }
+  const std::vector<int>& get_shape_by_name(const int model_idx,
+                                            const std::string& name) {
+    return _models[model_idx].get_shape_by_name(name);
+  }
+  const std::vector<int>&& get_shape_by_name_with_rv(const int model_idx,
+                                                     const std::string& name) {
+    return std::move(_models[model_idx].get_shape_by_name_with_rv(name));
+  }
+  const std::vector<int>& get_lod_by_name(const int model_idx,
+                                          const std::string& name) {
+    return _models[model_idx].get_lod_by_name(name);
  }
-  const std::vector<int>& get_lod(const int model_idx,
-                                  const std::string& name) {
-    return _models[model_idx].get_lod(name);
+  const std::vector<int>&& get_lod_by_name_with_rv(const int model_idx,
+                                                   const std::string& name) {
+    return std::move(_models[model_idx].get_lod_by_name_with_rv(name));
  }
  void add_model_res(ModelRes&& res) {
    _engine_names.push_back(res.engine_name());

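Review note on the `*_with_rv` accessors added in `general_model.h`: the `ModelRes` variants return a non-const `std::vector<T>&&`, so a caller that constructs a vector from the result takes over the stored buffer and leaves the map entry empty. The `PredictorRes` shape and lod variants return `const std::vector<int>&&` instead; a const rvalue cannot bind to the move constructor, so the caller most likely gets a copy there. The sketch below illustrates both behaviours. `ResSketch`, `set_int64`, `set_shape`, `get_shape_by_name_const_rv`, and the two `remaining_*` helpers are hypothetical names introduced for illustration only, and the class models just the parts of `ModelRes` needed to show the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ModelRes; not the real class.
class ResSketch {
 public:
  void set_int64(const std::string& name, std::vector<int64_t> v) {
    _int64_value_map[name] = std::move(v);
  }
  void set_shape(const std::string& name, std::vector<int> v) {
    _shape_map[name] = std::move(v);
  }
  // Borrowing accessors: the caller reads the stored vector in place.
  const std::vector<int64_t>& get_int64_by_name(const std::string& name) {
    return _int64_value_map[name];
  }
  const std::vector<int>& get_shape_by_name(const std::string& name) {
    return _shape_map[name];
  }
  // Non-const rvalue reference: the caller may steal the buffer.
  std::vector<int64_t>&& get_int64_by_name_with_rv(const std::string& name) {
    return std::move(_int64_value_map[name]);
  }
  // Const rvalue reference: move construction is not viable, so
  // constructing a vector from the result falls back to a copy.
  const std::vector<int>&& get_shape_by_name_const_rv(const std::string& name) {
    return std::move(_shape_map[name]);
  }

 private:
  std::map<std::string, std::vector<int64_t>> _int64_value_map;
  std::map<std::string, std::vector<int>> _shape_map;
};

// Size of the stored vector after the caller move-constructs from it.
std::size_t remaining_after_move() {
  ResSketch res;
  res.set_int64("ids", {1, 2, 3});
  std::vector<int64_t> taken = res.get_int64_by_name_with_rv("ids");
  assert(taken.size() == 3);
  return res.get_int64_by_name("ids").size();
}

// Size of the stored vector after the caller constructs from a const&&.
std::size_t remaining_after_const_rv() {
  ResSketch res;
  res.set_shape("ids", {3, 1});
  std::vector<int> copied = res.get_shape_by_name_const_rv("ids");
  assert(copied.size() == 2);
  return res.get_shape_by_name("ids").size();
}
```

In this sketch `remaining_after_move()` yields 0 and `remaining_after_const_rv()` yields 2. If the intent of the `_with_rv` accessors is to avoid copies when handing results to the Python binding, dropping `const` from the two `PredictorRes` return types may be worth considering. Either way, a result should not be read again through the same name once a move-returning accessor has been used on it.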
--- a/core/general-client/src/general_model.cpp
+++ b/core/general-client/src/general_model.cpp
@@ -258,9 +258,10 @@ int PredictorClient::batch_predict(
      ModelRes model;
      model.set_engine_name(output.engine_name());

+      int idx = 0;
+
      for (auto &name : fetch_name) {
        // int idx = _fetch_name_to_idx[name];
-        int idx = 0;
        int shape_size = output.insts(0).tensor_array(idx).shape_size();
        VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
                << shape_size;
@@ -279,9 +280,9 @@ int PredictorClient::batch_predict(
        idx += 1;
      }

+      idx = 0;
      for (auto &name : fetch_name) {
        // int idx = _fetch_name_to_idx[name];
-        int idx = 0;
        if (_fetch_name_to_type[name] == 0) {
          VLOG(2) << "ferch var " << name << "type int";
          model._int64_value_map[name].resize(
@@ -345,7 +346,7 @@ int PredictorClient::numpy_predict(
    PredictorRes &predict_res_batch,
    const int &pid) {
  int batch_size = std::max(float_feed_batch.size(), int_feed_batch.size());
-
+  VLOG(2) << "batch size: " << batch_size;
  predict_res_batch.clear();
  Timer timeline;
  int64_t preprocess_start = timeline.TimeStampUS();
@@ -462,7 +463,7 @@ int PredictorClient::numpy_predict(
            for (ssize_t j = 0; j < int_array.shape(1); j++) {
              for (ssize_t k = 0; k < int_array.shape(2); k++) {
                for (ssize_t l = 0; k < int_array.shape(3); l++) {
-                  tensor->add_float_data(int_array(i, j, k, l));
+                  tensor->add_int64_data(int_array(i, j, k, l));
                }
              }
            }
@@ -474,7 +475,7 @@ int PredictorClient::numpy_predict(
          for (ssize_t i = 0; i < int_array.shape(0); i++) {
            for (ssize_t j = 0; j < int_array.shape(1); j++) {
              for (ssize_t k = 0; k < int_array.shape(2); k++) {
-                tensor->add_float_data(int_array(i, j, k));
+                tensor->add_int64_data(int_array(i, j, k));
              }
            }
          }
@@ -484,7 +485,7 @@ int PredictorClient::numpy_predict(
          auto int_array = int_feed[vec_idx].unchecked<2>();
          for (ssize_t i = 0; i < int_array.shape(0); i++) {
            for (ssize_t j = 0; j < int_array.shape(1); j++) {
-              tensor->add_float_data(int_array(i, j));
+              tensor->add_int64_data(int_array(i, j));
            }
          }
          break;
@@ -492,7 +493,7 @@ int PredictorClient::numpy_predict(
        case 1: {
          auto int_array = int_feed[vec_idx].unchecked<1>();
          for (ssize_t i = 0; i < int_array.shape(0); i++) {
-            tensor->add_float_data(int_array(i));
+            tensor->add_int64_data(int_array(i));
          }
          break;
        }
@@ -536,9 +537,9 @@ int PredictorClient::numpy_predict(
      ModelRes model;
      model.set_engine_name(output.engine_name());

+      int idx = 0;
      for (auto &name : fetch_name) {
        // int idx = _fetch_name_to_idx[name];
-        int idx = 0;
        int shape_size = output.insts(0).tensor_array(idx).shape_size();
        VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
                << shape_size;
@@ -557,9 +558,10 @@ int PredictorClient::numpy_predict(
        idx += 1;
      }

+      idx = 0;
+
      for (auto &name : fetch_name) {
        // int idx = _fetch_name_to_idx[name];
-        int idx = 0;
        if (_fetch_name_to_type[name] == 0) {
          VLOG(2) << "ferch var " << name << "type int";
          model._int64_value_map[name].resize(
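Review note on the `idx` hunks above: with `int idx = 0;` declared inside the loop body, every fetch variable read `tensor_array(0)`, because the counter was re-created on each iteration. The sketch below reproduces the difference in Python. `tensor_array` and `fetch_names` are hypothetical stand-in lists, not the real protobuf objects.

```python
# Hypothetical stand-ins for the response tensors and the requested names.
tensor_array = [[1, 2], [3, 4, 5], [6]]
fetch_names = ["a", "b", "c"]


def read_with_reset(names, tensors):
    """Old behaviour: the index is re-created on every iteration."""
    result = {}
    for name in names:
        idx = 0  # reset each time, so every name sees tensors[0]
        result[name] = tensors[idx]
        idx += 1
    return result


def read_with_running_index(names, tensors):
    """Fixed behaviour: one index that advances together with the loop."""
    result = {}
    idx = 0
    for name in names:
        result[name] = tensors[idx]
        idx += 1
    return result
```

Hoisting the counter keeps the C++ change minimal; in Python the same intent would normally be written with `enumerate`.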

--- a/core/general-client/src/pybind_general_model.cpp
+++ b/core/general-client/src/pybind_general_model.cpp
@@ -32,24 +32,41 @@ PYBIND11_MODULE(serving_client, m) {
      .def(py::init())
      .def("get_int64_by_name",
           [](PredictorRes &self, int model_idx, std::string &name) {
-             return self.get_int64_by_name(model_idx, name);
-           },
-           py::return_value_policy::reference)
+             // see more: https://github.com/pybind/pybind11/issues/1042
+             std::vector<int64_t> *ptr = new std::vector<int64_t>(
+                 std::move(self.get_int64_by_name_with_rv(model_idx, name)));
+             auto capsule = py::capsule(ptr, [](void *p) {
+               delete reinterpret_cast<std::vector<int64_t> *>(p);
+             });
+             return py::array(ptr->size(), ptr->data(), capsule);
+           })
      .def("get_float_by_name",
           [](PredictorRes &self, int model_idx, std::string &name) {
-             return self.get_float_by_name(model_idx, name);
-           },
-           py::return_value_policy::reference)
+             std::vector<float> *ptr = new std::vector<float>(
+                 std::move(self.get_float_by_name_with_rv(model_idx, name)));
+             auto capsule = py::capsule(ptr, [](void *p) {
+               delete reinterpret_cast<std::vector<float> *>(p);
+             });
+             return py::array(ptr->size(), ptr->data(), capsule);
+           })
      .def("get_shape",
           [](PredictorRes &self, int model_idx, std::string &name) {
-             return self.get_shape(model_idx, name);
-           },
-           py::return_value_policy::reference)
+             std::vector<int> *ptr = new std::vector<int>(
+                 std::move(self.get_shape_by_name_with_rv(model_idx, name)));
+             auto capsule = py::capsule(ptr, [](void *p) {
+               delete reinterpret_cast<std::vector<int> *>(p);
+             });
+             return py::array(ptr->size(), ptr->data(), capsule);
+           })
      .def("get_lod",
           [](PredictorRes &self, int model_idx, std::string &name) {
-             return self.get_lod(model_idx, name);
-           },
-           py::return_value_policy::reference)
+             std::vector<int> *ptr = new std::vector<int>(
+                 std::move(self.get_lod_by_name_with_rv(model_idx, name)));
+             auto capsule = py::capsule(ptr, [](void *p) {
+               delete reinterpret_cast<std::vector<int> *>(p);
+             });
+             return py::array(ptr->size(), ptr->data(), capsule);
+           })
      .def("variant_tag", [](PredictorRes &self) { return self.variant_tag(); })
      .def("get_engine_names",
           [](PredictorRes &self) { return self.get_engine_names(); });
@@ -100,7 +117,8 @@ PYBIND11_MODULE(serving_client, m) {
                                       fetch_name,
                                       predict_res_batch,
                                       pid);
-           })
+           },
+           py::call_guard<py::gil_scoped_release>())
      .def("numpy_predict",
           [](PredictorClient &self,
              const std::vector<std::vector<py::array_t<float>>>

--- a/core/general-server/op/general_reader_op.cpp
+++ b/core/general-server/op/general_reader_op.cpp
@@ -131,7 +131,7 @@ int GeneralReaderOp::inference() {
      lod_tensor.dtype = paddle::PaddleDType::FLOAT32;
    }

-    if (req->insts(0).tensor_array(i).shape(0) == -1) {
+    if (model_config->_is_lod_feed[i]) {
      lod_tensor.lod.resize(1);
      lod_tensor.lod[0].push_back(0);
      VLOG(2) << "var[" << i << "] is lod_tensor";
@@ -153,6 +153,7 @@ int GeneralReaderOp::inference() {
  // specify the memory needed for output tensor_vector
  for (int i = 0; i < var_num; ++i) {
    if (out->at(i).lod.size() == 1) {
+      int tensor_size = 0;
      for (int j = 0; j < batch_size; ++j) {
        const Tensor &tensor = req->insts(j).tensor_array(i);
        int data_len = 0;
@@ -162,15 +163,28 @@ int GeneralReaderOp::inference() {
          data_len = tensor.float_data_size();
        }
        VLOG(2) << "tensor size for var[" << i << "]: " << data_len;
+        tensor_size += data_len;

        int cur_len = out->at(i).lod[0].back();
        VLOG(2) << "current len: " << cur_len;

-        out->at(i).lod[0].push_back(cur_len + data_len);
-        VLOG(2) << "new len: " << cur_len + data_len;
+        int sample_len = 0;
+        if (tensor.shape_size() == 1) {
+          sample_len = data_len;
+        } else {
+          sample_len = tensor.shape(0);
+        }
+        out->at(i).lod[0].push_back(cur_len + sample_len);
+        VLOG(2) << "new len: " << cur_len + sample_len;
+      }
+      out->at(i).data.Resize(tensor_size * elem_size[i]);
+      out->at(i).shape = {out->at(i).lod[0].back()};
+      for (int j = 1; j < req->insts(0).tensor_array(i).shape_size(); ++j) {
+        out->at(i).shape.push_back(req->insts(0).tensor_array(i).shape(j));
+      }
+      if (out->at(i).shape.size() == 1) {
+        out->at(i).shape.push_back(1);
      }
-      out->at(i).data.Resize(out->at(i).lod[0].back() * elem_size[i]);
-      out->at(i).shape = {out->at(i).lod[0].back(), 1};
      VLOG(2) << "var[" << i
              << "] is lod_tensor and len=" << out->at(i).lod[0].back();
    } else {
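Review note on the LoD hunk above: the per-sample length is now the element count for 1-D samples and `shape(0)` otherwise, while the buffer size is driven by the total element count. A minimal Python sketch of that bookkeeping follows. The function name and its list-based arguments are illustrative assumptions, not part of the server code.

```python
def build_lod_and_shape(sample_shapes, sample_data_lens):
    """Return (lod, shape, total_elems) for one LoD feed variable.

    sample_shapes: per-sample shape lists, e.g. [[3, 2], [5, 2]]
    sample_data_lens: number of elements each sample carries
    """
    lod = [0]
    total_elems = 0
    for shape, data_len in zip(sample_shapes, sample_data_lens):
        total_elems += data_len
        # 1-D samples: length is the element count; otherwise the first dim.
        sample_len = data_len if len(shape) == 1 else shape[0]
        lod.append(lod[-1] + sample_len)
    # Leading dim is the total sequence length; the rest comes from sample 0.
    out_shape = [lod[-1]] + list(sample_shapes[0][1:])
    if len(out_shape) == 1:
        out_shape.append(1)
    return lod, out_shape, total_elems
```

For two samples shaped `[3, 2]` and `[5, 2]`, the offsets advance by 3 and 5 rows while the buffer must hold 16 elements, which is why the resize no longer uses `lod[0].back()` directly.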

--- a/core/general-server/op/general_response_op.cpp
+++ b/core/general-server/op/general_response_op.cpp
@@ -15,8 +15,10 @@
 #include "core/general-server/op/general_response_op.h"
 #include <algorithm>
 #include <iostream>
+#include <map>
 #include <memory>
 #include <sstream>
+#include <utility>
 #include "core/general-server/op/general_infer_helper.h"
 #include "core/predictor/framework/infer.h"
 #include "core/predictor/framework/memory.h"
@@ -86,17 +88,20 @@ int GeneralResponseOp::inference() {
    // To get the order of model return values
    output->set_engine_name(pre_name);
    FetchInst *fetch_inst = output->add_insts();
+
    for (auto &idx : fetch_index) {
      Tensor *tensor = fetch_inst->add_tensor_array();
      tensor->set_elem_type(1);
      if (model_config->_is_lod_fetch[idx]) {
-        VLOG(2) << "out[" << idx << "] is lod_tensor";
+        VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
+                << " is lod_tensor";
        for (int k = 0; k < in->at(idx).shape.size(); ++k) {
          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
          tensor->add_shape(in->at(idx).shape[k]);
        }
      } else {
-        VLOG(2) << "out[" << idx << "] is tensor";
+        VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
+                << " is tensor";
        for (int k = 0; k < in->at(idx).shape.size(); ++k) {
          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
          tensor->add_shape(in->at(idx).shape[k]);
@@ -111,6 +116,8 @@ int GeneralResponseOp::inference() {
        cap *= in->at(idx).shape[j];
      }
      if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
+        VLOG(2) << "Prepare int64 var [" << model_config->_fetch_name[idx]
+                << "].";
        int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
        if (model_config->_is_lod_fetch[idx]) {
          FetchInst *fetch_p = output->mutable_insts(0);
@@ -127,8 +134,11 @@ int GeneralResponseOp::inference() {
            fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
          }
        }
+        VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
        var_idx++;
      } else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
+        VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
+                << "].";
        float *data_ptr = static_cast<float *>(in->at(idx).data.data());
        if (model_config->_is_lod_fetch[idx]) {
          FetchInst *fetch_p = output->mutable_insts(0);
@@ -145,6 +155,7 @@ int GeneralResponseOp::inference() {
            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
          }
        }
+        VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
        var_idx++;
      }
    }

--- a/core/predictor/framework/infer.h
+++ b/core/predictor/framework/infer.h
@@ -35,6 +35,7 @@ class InferEngineCreationParams {
  InferEngineCreationParams() {
    _path = "";
    _enable_memory_optimization = false;
+    _enable_ir_optimization = false;
    _static_optimization = false;
    _force_update_static_cache = false;
  }
@@ -45,10 +46,16 @@ class InferEngineCreationParams {
    _enable_memory_optimization = enable_memory_optimization;
  }

+  void set_enable_ir_optimization(bool enable_ir_optimization) {
+    _enable_ir_optimization = enable_ir_optimization;
+  }
+
  bool enable_memory_optimization() const {
    return _enable_memory_optimization;
  }

+  bool enable_ir_optimization() const { return _enable_ir_optimization; }
+
  void set_static_optimization(bool static_optimization = false) {
    _static_optimization = static_optimization;
  }
@@ -68,6 +75,7 @@ class InferEngineCreationParams {
              << "model_path = " << _path << ", "
              << "enable_memory_optimization = " << _enable_memory_optimization
              << ", "
+              << "enable_ir_optimization = " << _enable_ir_optimization << ", "
              << "static_optimization = " << _static_optimization << ", "
              << "force_update_static_cache = " << _force_update_static_cache;
  }
@@ -75,6 +83,7 @@ class InferEngineCreationParams {
 private:
  std::string _path;
  bool _enable_memory_optimization;
+  bool _enable_ir_optimization;
  bool _static_optimization;
  bool _force_update_static_cache;
 };
@@ -150,6 +159,11 @@ class ReloadableInferEngine : public InferEngine {
      force_update_static_cache = conf.force_update_static_cache();
    }

+    if (conf.has_enable_ir_optimization()) {
+      _infer_engine_params.set_enable_ir_optimization(
+          conf.enable_ir_optimization());
+    }
+
    _infer_engine_params.set_path(_model_data_path);
    if (enable_memory_optimization) {
      _infer_engine_params.set_enable_memory_optimization(true);

--- a/core/sdk-cpp/include/endpoint_config.h
+++ b/core/sdk-cpp/include/endpoint_config.h
@@ -22,23 +22,23 @@ namespace baidu {
 namespace paddle_serving {
 namespace sdk_cpp {

-#define PARSE_CONF_ITEM(conf, item, name, fail)             \
-  do {                                                      \
-    if (conf.has_##name()) {                                \
-      item.set(conf.name());                                \
-    } else {                                                \
-      LOG(ERROR) << "Not found key in configue: " << #name; \
-    }                                                       \
+#define PARSE_CONF_ITEM(conf, item, name, fail)          \
+  do {                                                   \
+    if (conf.has_##name()) {                             \
+      item.set(conf.name());                             \
+    } else {                                             \
+      VLOG(2) << "Key not found in config: " << #name;   \
+    }                                                    \
  } while (0)

-#define ASSIGN_CONF_ITEM(dest, src, fail)                          \
-  do {                                                             \
-    if (!src.init) {                                               \
-      LOG(ERROR) << "Cannot assign an unintialized item: " << #src \
-                 << " to dest: " << #dest;                         \
-      return fail;                                                 \
-    }                                                              \
-    dest = src.value;                                              \
+#define ASSIGN_CONF_ITEM(dest, src, fail)                        \
+  do {                                                           \
+    if (!src.init) {                                             \
+      VLOG(2) << "Cannot assign an uninitialized item: " << #src \
+              << " to dest: " << #dest;                          \
+      return fail;                                               \
+    }                                                            \
+    dest = src.value;                                            \
  } while (0)

 template <typename T>

--- a/doc/ABTEST_IN_PADDLE_SERVING.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING.md
@@ -21,7 +21,7 @@ The following Python code will process the data `test_data/part-0` and write to

 [//file]:#process.py
 ``` python
-from imdb_reader import IMDBDataset
+from paddle_serving_app.reader import IMDBDataset
 imdb_dataset = IMDBDataset()
 imdb_dataset.load_resource('imdb.vocab')

@@ -39,7 +39,7 @@ Here, we [use docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/R
 First, start the BOW server, which enables the `8000` port:

 ``` shell
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it bow-server bash
 pip install paddle-serving-server
 python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
@@ -49,7 +49,7 @@ exit
 Similarly, start the LSTM server, which enables the `9000` port:

 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it lstm-server bash
 pip install paddle-serving-server
 python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
@@ -78,7 +78,7 @@ with open('processed.data') as f:
        feed = {"words": word_ids}
        fetch = ["acc", "cost", "prediction"]
        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
-        if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
+        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
            cnt[tag]['acc'] += 1
        cnt[tag]['total'] += 1

@@ -88,7 +88,7 @@ with open('processed.data') as f:

 In the code, the function `client.add_variant(tag, clusters, variant_weight)` is to add a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label of `bow` and flow weight of `10`, and an LSTM variant with label of `lstm` and a flow weight of `90` are added. The flow on the client side will be distributed to two variants according to the ratio of `10:90`.

-When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contains the variant tag corresponding to the distribution flow.
+When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow.

 ### Expected Results


--- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md
@@ -20,7 +20,7 @@ sh get_data.sh
 下面Python代码将处理`test_data/part-0`的数据，写入`processed.data`文件中。

 ```python
-from imdb_reader import IMDBDataset
+from paddle_serving_app.reader import IMDBDataset
 imdb_dataset = IMDBDataset()
 imdb_dataset.load_resource('imdb.vocab')

@@ -38,7 +38,7 @@ with open('test_data/part-0') as fin:
 首先启动BOW Server，该服务启用`8000`端口：

 ```bash
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it bow-server bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
@@ -48,7 +48,7 @@ exit
 同理启动LSTM Server，该服务启用`9000`端口：

 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it lstm-server bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
@@ -76,7 +76,7 @@ with open('processed.data') as f:
        feed = {"words": word_ids}
        fetch = ["acc", "cost", "prediction"]
        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
-        if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
+        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
            cnt[tag]['acc'] += 1
        cnt[tag]['total'] += 1
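Review note on the A/B test docs above: the text describes a `10:90` flow split between the `bow` and `lstm` variants, and the client loop counts a prediction as correct when it falls on the same side of 0.5 as the label. The sketch below imitates both ideas with the standard library only. `pick_variant` and `is_correct` are hypothetical helpers, and this is not how the Paddle Serving client routes requests internally.

```python
import random


def pick_variant(weights, rng):
    """Pick a variant tag with probability proportional to its weight."""
    tags = list(weights)
    return rng.choices(tags, weights=[weights[t] for t in tags], k=1)[0]


def is_correct(positive_prob, label):
    """True when the prediction and the label fall on the same side of 0.5."""
    return (positive_prob - 0.5) * (label - 0.5) > 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = {"bow": 10, "lstm": 90}
counts = {"bow": 0, "lstm": 0}
for _ in range(10000):
    counts[pick_variant(weights, rng)] += 1
```

The extra `[0]` added to `fetch_map["prediction"][0][1]` in this PR suggests the fetched value now carries a leading batch dimension, so the positive-class probability sits one level deeper than before.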


--- a/doc/BERT_10_MINS_CN.md
+++ b/doc/BERT_10_MINS_CN.md
@@ -13,10 +13,10 @@ import paddlehub as hub
 model_name = "bert_chinese_L-12_H-768_A-12"
 module = hub.Module(model_name)
 inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
-feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask", "pooled_output", "sequence_output"]
+feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
 fetch_keys = ["pooled_output", "sequence_output"]
 feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
-fetch_dict = dict(zip(fetch_keys, [outputs[x]] for x in fetch_keys))
+fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))

 import paddle_serving_client.io as serving_io
 serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
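Review note on the BERT hunk above: in the removed `fetch_dict` line the list bracket closed too early, which left an unparenthesized generator expression as the second argument to `zip` and therefore a `SyntaxError`. The removed `feed_keys` line also listed the two output names, which presumably are not keys of `inputs`. The sketch below shows the corrected mapping with plain dictionaries. The string values are stand-ins for the variables returned by `module.context(...)`.

```python
# Stand-ins for the `inputs` / `outputs` dicts returned by module.context(...).
inputs = {"input_ids": "var_ids", "position_ids": "var_pos",
          "segment_ids": "var_seg", "input_mask": "var_mask"}
outputs = {"pooled_output": "var_pooled", "sequence_output": "var_seq"}

feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
fetch_keys = ["pooled_output", "sequence_output"]

# Equivalent to dict(zip(keys, [mapping[k] for k in keys])) in the doc.
feed_dict = {key: inputs[key] for key in feed_keys}
fetch_dict = {key: outputs[key] for key in fetch_keys}
```

A dict comprehension expresses the same mapping without the nested brackets that caused the original mistake.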

--- a/doc/COMPILE.md
+++ b/doc/COMPILE.md
@@ -9,14 +9,18 @@
 - Golang: 1.9.2 and later
 - Git：2.17.1 and later
 - CMake：3.2.2 and later
- Python：2.7.2 and later
+- Python：2.7.2 and later / 3.6 and later

 It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you: 

- CPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-devel`，dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu-devel`，dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
+- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`，dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
+- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`，dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)

-This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python 3, just adjust the Python options of cmake.
+This document uses Python 2 as the example for compiling Paddle Serving. To compile with Python 3, adjust the Python options passed to cmake:
+
+- Set `-DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
+- Set `-DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
+- Set `-DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`

 ## Get Code

@@ -32,6 +36,8 @@ cd Serving && git submodule update --init --recursive
 export PYTHONROOT=/usr/
 ```

+In the default CentOS 7 image we provide, the Python path is `/usr/bin/python`. If you use our CentOS 6 image instead, run `export PYTHONROOT=/usr/local/python2.7/`.
+
 ## Compile Server

 ### Integrated CPU version paddle inference library
@@ -54,6 +60,8 @@ make -j10

 execute `make install` to put targets under directory `./output`

+**Attention:** After compilation succeeds, you need to set the `SERVING_BIN` path. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
+
 ## Compile Client

 ``` shell

--- a/doc/COMPILE_CN.md
+++ b/doc/COMPILE_CN.md
@@ -9,14 +9,18 @@
 - Golang: 1.9.2及以上
 - Git：2.17.1及以上
 - CMake：3.2.2及以上
- Python：2.7.2及以上
+- Python：2.7.2及以上 / 3.6及以上

 推荐使用Docker编译，我们已经为您准备好了Paddle Serving编译环境：

- CPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-devel`，dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu-devel`，dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
+- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`，dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
+- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`，dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)

-本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译，只需要调整cmake的Python相关选项即可。
+本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译，只需要调整cmake的Python相关选项即可：
+
+- 将`-DPYTHON_INCLUDE_DIR`设置为`$PYTHONROOT/include/python3.6m/`
+- 将`-DPYTHON_LIBRARIES`设置为`$PYTHONROOT/lib64/libpython3.6.so`
+- 将`-DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3.6`

 ## 获取代码

@@ -32,6 +36,8 @@ cd Serving && git submodule update --init --recursive
 export PYTHONROOT=/usr/
 ```

+我们提供默认Centos7的Python路径为`/usr/bin/python`，如果您要使用我们的Centos6镜像，需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`。
+
 ## 编译Server部分

 ### 集成CPU版本Paddle Inference Library
@@ -54,6 +60,8 @@ make -j10

 执行`make install`可以把目标产出放在`./output`目录下。

+**注意：** 编译成功后，需要设置`SERVING_BIN`路径，详见后面的[注意事项](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE_CN.md#注意事项)。
+
 ## 编译Client部分

 ``` shell

--- a/doc/DESIGN_DOC_CN.md
+++ b/doc/DESIGN_DOC_CN.md
@@ -26,7 +26,7 @@ serving_io.save_model("serving_model", "client_conf",
                      {"words": data}, {"prediction": prediction},
                      fluid.default_main_program())
 ```
-代码示例中，`{"words": data}`和`{"prediction": prediction}`分别指定了模型的输入和输出，`"words"`和`"prediction"`是输出和输出变量的别名，设计别名的目的是为了使开发者能够记忆自己训练模型的输入输出对应的字段。`data`和`prediction`则是Paddle训练过程中的`[Variable](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Variable_cn.html#variable)`，通常代表张量([Tensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Tensor_cn.html#tensor))或变长张量([LodTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor))。调用保存命令后，会按照用户指定的`"serving_model"`和`"client_conf"`生成两个目录，内容如下：
+代码示例中，`{"words": data}`和`{"prediction": prediction}`分别指定了模型的输入和输出，`"words"`和`"prediction"`是输入和输出变量的别名，设计别名的目的是为了使开发者能够记忆自己训练模型的输入输出对应的字段。`data`和`prediction`则是Paddle训练过程中的`[Variable](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Variable_cn.html#variable)`，通常代表张量([Tensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Tensor_cn.html#tensor))或变长张量([LodTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor))。调用保存命令后，会按照用户指定的`"serving_model"`和`"client_conf"`生成两个目录，内容如下：
 ``` shell
 .
 ├── client_conf

--- a/doc/HOT_LOADING_IN_SERVING.md
+++ b/doc/HOT_LOADING_IN_SERVING.md
@@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold

 ### Product model

-Run the following Python code products model in `product_path` folder. Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`.
+Run the following Python code in the `product_path` folder to produce the model (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model, `uci_housing.tar.gz`, is generated and uploaded to the HDFS path `/`. After uploading, the timestamp file `donefile` is updated and uploaded to the HDFS path `/`.

 ```python
 import os
@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
 exe.run(fluid.default_startup_program())

 def push_to_hdfs(local_file_path, remote_path):
-    hadoop_bin = '/hadoop-3.1.2/bin/hadoop'
-    os.system('{} fs -put -f {} {}'.format(
-      hadoop_bin, local_file_path, remote_path))
+    afs = 'afs://***.***.***.***:***' # User needs to change
+    uci = '***,***' # User needs to change
+    hadoop_bin = '/path/to/hadoop/bin' # User needs to change
+    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
+    os.system('{} -rmr {}/{}'.format(
+      prefix, remote_path, local_file_path))
+    os.system('{} -put {} {}'.format(
+      prefix, local_file_path, remote_path))

 name = "uci_housing"
 for pass_id in range(30):
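Review note on `push_to_hdfs` above: the new version first removes the previous remote copy and then uploads, building both commands by string formatting for `os.system`. A possible alternative, sketched here under the assumption that the same flags are wanted, builds argument lists for `subprocess` so that paths with spaces need no quoting. `build_hdfs_commands` and the `commands` parameter of this `push_to_hdfs` are hypothetical, and the argument values in the test are placeholders.

```python
import posixpath
import subprocess


def build_hdfs_commands(hadoop_bin, fs_name, ugi, local_file_path, remote_path):
    """Return two argv lists: remove the old remote copy, then upload."""
    prefix = [
        hadoop_bin, "fs",
        "-Dfs.default.name={}".format(fs_name),
        "-Dhadoop.job.ugi={}".format(ugi),
    ]
    remote_file = posixpath.join(remote_path, local_file_path)
    return [
        prefix + ["-rmr", remote_file],
        prefix + ["-put", local_file_path, remote_path],
    ]


def push_to_hdfs(commands):
    """Run each command in order and return the list of exit codes."""
    return [subprocess.call(cmd) for cmd in commands]
```

Separating command construction from execution also makes the upload step easy to check without a Hadoop installation.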

--- a/doc/HOT_LOADING_IN_SERVING_CN.md
+++ b/doc/HOT_LOADING_IN_SERVING_CN.md
@@ -46,7 +46,7 @@ Paddle Serving提供了一个自动监控脚本，远端地址更新模型后会

 ### 生产模型

-在`product_path`下运行下面的Python代码生产模型，每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下，上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。
+在`product_path`下运行下面的Python代码生产模型（运行前需要修改hadoop相关的参数），每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下，上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。

 ```python
 import os
@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
 exe.run(fluid.default_startup_program())

 def push_to_hdfs(local_file_path, remote_path):
-    hadoop_bin = '/hadoop-3.1.2/bin/hadoop'
-    os.system('{} fs -put -f {} {}'.format(
-      hadoop_bin, local_file_path, remote_path))
+    afs = 'afs://***.***.***.***:***' # User needs to change
+    uci = '***,***' # User needs to change
+    hadoop_bin = '/path/to/hadoop/bin' # User needs to change
+    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
+    os.system('{} -rmr {}/{}'.format(
+      prefix, remote_path, local_file_path))
+    os.system('{} -put {} {}'.format(
+      prefix, local_file_path, remote_path))

 name = "uci_housing"
 for pass_id in range(30):

--- a/doc/LATEST_PACKAGES.md
+++ b/doc/LATEST_PACKAGES.md
+# Latest Wheel Packages
+
+## CPU server
+### Python 3
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py3-none-any.whl
+```
+
+### Python 2
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py2-none-any.whl
+```
+
+## GPU server
+### Python 3
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py3-none-any.whl
+```
+### Python 2
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py2-none-any.whl
+```
+
+## Client
+### Python 3.7
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp37-none-manylinux1_x86_64.whl
+```
+### Python 3.6
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp36-none-manylinux1_x86_64.whl
+```
+### Python 2.7
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp27-none-manylinux1_x86_64.whl
+```
+
+## App
+### Python 3
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py3-none-any.whl
+```
+
+### Python 2
+```
+https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py2-none-any.whl
+```
--- a/doc/PERFORMANCE_OPTIM.md
+++ b/doc/PERFORMANCE_OPTIM.md
+# Performance Optimization
+
+([简体中文](./PERFORMANCE_OPTIM_CN.md)|English)
+
+Because model structures differ, prediction services consume different amounts of computing resources when performing predictions. For online prediction services, a model that needs few computing resources spends a larger share of its time on communication; such a service is called communication-intensive. A model that needs more computing resources spends most of its time on inference computation; such a service is called computation-intensive.
+
+For a prediction service, the easiest way to determine its type is to look at how the time is split between stages. Paddle Serving provides the [Timeline tool](../python/examples/util/README_CN.md), which displays the time spent in each stage of the prediction service.
+
+For communication-intensive prediction services, requests can be aggregated, and within a limit that can tolerate delay, multiple prediction requests can be combined into a batch for prediction.
+
+For computation-intensive prediction services, you can use GPU prediction services instead of CPU prediction services, or increase the number of GPUs used by the GPU prediction service.
+
+Under the same conditions, the communication time of the HTTP prediction service provided by Paddle Serving is longer than that of the RPC prediction service, so for communication-intensive services, please give priority to using RPC communication.
+
+Parameters for performance optimization:
+
+| Parameters | Type | Default | Description                                                  |
+| ---------- | ---- | ------- | ------------------------------------------------------------ |
+| mem_optim  | bool | False   | Enable memory / GPU memory optimization                                        |
+| ir_optim   | bool | False   | Enable analysis and optimization of the computation graph, including OP fusion |
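Review note on the request aggregation advice in the new performance doc: for communication-intensive services, several pending requests can be combined into one batch as long as the added delay is tolerable. A minimal sketch of the grouping step is shown below. `make_batches` and the `pending` feed dictionaries are illustrative assumptions and are not part of the Paddle Serving client API.

```python
def make_batches(requests, max_batch_size):
    """Split pending requests into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [requests[i:i + max_batch_size]
            for i in range(0, len(requests), max_batch_size)]


pending = [{"words": [1, 2]}, {"words": [3]}, {"words": [4, 5, 6]},
           {"words": [7]}, {"words": [8, 9]}]
batches = make_batches(pending, max_batch_size=2)
```

Each batch would then be sent as a single prediction call, trading a little latency for fewer round trips.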
--- a/doc/PERFORMANCE_OPTIM_CN.md
+++ b/doc/PERFORMANCE_OPTIM_CN.md
 # 性能优化

-由于模型结构的不同，在执行预测时不同的预测对计算资源的消耗也不相同，对于在线的预测服务来说，对计算资源要求较少的模型，通信的时间成本占比就会较高，称为通信密集型服务，对计算资源要求较多的模型，推理计算的时间成本较高，称为计算密集型服务。对于这两种服务类型，可以根据实际需求采取不同的方式进行优化
+(简体中文|[English](./PERFORMANCE_OPTIM.md))
+
+由于模型结构的不同，在执行预测时不同的预测服务对计算资源的消耗也不相同。对于在线的预测服务来说，对计算资源要求较少的模型，通信的时间成本占比就会较高，称为通信密集型服务，对计算资源要求较多的模型，推理计算的时间成本较高，称为计算密集型服务。对于这两种服务类型，可以根据实际需求采取不同的方式进行优化

 对于一个预测服务来说，想要判断属于哪种类型，最简单的方法就是看时间占比，Paddle Serving提供了[Timeline工具](../python/examples/util/README_CN.md)，可以直观的展现预测服务中各阶段的耗时。

@@ -10,4 +12,9 @@

 在相同条件下，Paddle Serving提供的HTTP预测服务的通信时间是大于RPC预测服务的，因此对于通信密集型的服务请优先考虑使用RPC的通信方式。

-对于模型较大，预测服务内存或显存占用较多的情况，可以通过将--mem_optim选项设置为True来开启内存/显存优化。
+性能优化相关参数：
+
+| 参数      | 类型 | 默认值 | 含义                      |
+| --------- | ---- | ------ | -------------------------------- |
+| mem_optim | bool | False  | 开启内存/显存优化                |
+| ir_optim  | bool | False  | 开启计算图分析优化，包括OP融合等 |
--- a/doc/RUN_IN_DOCKER.md
+++ b/doc/RUN_IN_DOCKER.md
@@ -17,7 +17,7 @@ You can get images in two ways:
 1. Pull image directly

   ```bash
-   docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
+   docker pull hub.baidubce.com/paddlepaddle/serving:latest
   ```

 2. Building image based on dockerfile
@@ -25,13 +25,13 @@ You can get images in two ways:
   Create a new folder and copy [Dockerfile](../tools/Dockerfile) to this folder, and run the following command:

   ```bash
-   docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0 .
+   docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
   ```

 ### Create container

 ```bash
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it test bash
 ```

@@ -53,12 +53,6 @@ pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple

 ### Test example

-Before running the GPU version of the Server side code, you need to set the `CUDA_VISIBLE_DEVICES` environment variable to specify which GPUs the prediction service uses. The following example specifies two GPUs with indexes 0 and 1:
-
-```bash
-export CUDA_VISIBLE_DEVICES=0,1
-```
-
 Get the trained Boston house price prediction model by the following command:

 ```bash
@@ -71,13 +65,13 @@ tar -xzf uci_housing.tar.gz
  Running on the Server side (inside the container):

  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci &>std.log 2>err.log &
+  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
  ```

  Running on the Client side (inside or outside the container):

  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
  ```

 - Test RPC service
@@ -85,7 +79,7 @@ tar -xzf uci_housing.tar.gz
  Running on the Server side (inside the container):

  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 &>std.log 2>err.log &
+  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
  ```

  Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
@@ -115,7 +109,7 @@ You can also get images in two ways:
 1. Pull image directly

   ```bash
-   nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+   nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
   ```

 2. Building image based on dockerfile
@@ -123,13 +117,13 @@ You can also get images in two ways:
   Create a new folder and copy [Dockerfile.gpu](../tools/Dockerfile.gpu) to this folder, and run the following command:

   ```bash
-   nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu .
+   nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
   ```

 ### Create container

 ```bash
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
 nvidia-docker exec -it test bash
 ```

@@ -176,7 +170,7 @@ tar -xzf uci_housing.tar.gz
  Running on the Client side (inside or outside the container):

  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
  ```

 - Test RPC service

--- a/doc/RUN_IN_DOCKER_CN.md
+++ b/doc/RUN_IN_DOCKER_CN.md
@@ -17,7 +17,7 @@ Docker（GPU版本需要在GPU机器上安装nvidia-docker）
 1. 直接拉取镜像

   ```bash
-   docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
+   docker pull hub.baidubce.com/paddlepaddle/serving:latest
   ```

 2. 基于Dockerfile构建镜像
@@ -25,13 +25,13 @@ Docker（GPU版本需要在GPU机器上安装nvidia-docker）
   建立新目录，复制[Dockerfile](../tools/Dockerfile)内容到该目录下Dockerfile文件。执行

   ```bash
-   docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0 .
+   docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
   ```

 ### 创建容器并进入

 ```bash
-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
+docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
 docker exec -it test bash
 ```

@@ -65,13 +65,13 @@ tar -xzf uci_housing.tar.gz
  在Server端（容器内）运行：

  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci &>std.log 2>err.log &
+  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
  ```

  在Client端（容器内或容器外）运行：

  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
  ```

 - 测试RPC服务
@@ -79,7 +79,7 @@ tar -xzf uci_housing.tar.gz
  在Server端（容器内）运行：

  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 &>std.log 2>err.log &
+  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
  ```

  在Client端（容器内或容器外，需要安装`paddle-serving-client`包）运行下面Python代码：
@@ -107,7 +107,7 @@ GPU版本与CPU版本基本一致，只有部分接口命名的差别（GPU版
 1. 直接拉取镜像

   ```bash
-   nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+   nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
   ```

 2. 基于Dockerfile构建镜像
@@ -115,13 +115,13 @@ GPU版本与CPU版本基本一致，只有部分接口命名的差别（GPU版
   建立新目录，复制[Dockerfile.gpu](../tools/Dockerfile.gpu)内容到该目录下Dockerfile文件。执行

   ```bash
-   nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu .
+   nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
   ```

 ### 创建容器并进入

 ```bash
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
 nvidia-docker exec -it test bash
 ```

@@ -168,7 +168,7 @@ tar -xzf uci_housing.tar.gz
  在Client端（容器内或容器外）运行：

  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
  ```

 - 测试RPC服务

--- a/doc/SAVE.md
+++ b/doc/SAVE.md
-## How to save a servable model of Paddle Serving?
+# How to save a servable model of Paddle Serving?

 ([简体中文](./SAVE_CN.md)|English)

- Currently, paddle serving provides a save_model interface for users to access, the interface is similar with `save_inference_model` of Paddle.
+## Save from training or prediction script
+Currently, Paddle Serving provides a `save_model` interface, which is similar to Paddle's `save_inference_model`.
 ``` python
 import paddle_serving_client.io as serving_io
 serving_io.save_model("imdb_model", "imdb_client_conf",
                      {"words": data}, {"prediction": prediction},
                      fluid.default_main_program())
 ```
-`imdb_model` is the server side model with serving configurations. `imdb_client_conf` is the client rpc configurations. Serving has a 
-dictionary for `Feed` and `Fetch` variables for client to assign. In the example, `{"words": data}` is the feed dict that specify the input of saved inference model. `{"prediction": prediction}` is the fetch dic that specify the output of saved inference model. An alias name can be defined for feed and fetch variables. An example of how to use alias name
+`imdb_model` is the server-side model with serving configurations. `imdb_client_conf` is the client RPC configuration.
+
+Serving uses dictionaries to describe the `Feed` and `Fetch` variables that the client assigns. In the example, `{"words": data}` is the feed dict that specifies the input of the saved inference model, and `{"prediction": prediction}` is the fetch dict that specifies its output. An alias name can be defined for feed and fetch variables. An example of how to use an alias name
 is as follows:
 ``` python
 from paddle_serving_client import Client
@@ -29,3 +31,19 @@ for line in sys.stdin:
    fetch_map = client.predict(feed=feed, fetch=fetch)
    print("{} {}".format(fetch_map["prediction"][1], label[0]))
 ```
+
+## Export from saved model files
+If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used by Paddle Serving.
+```python
+import paddle_serving_client.io as serving_io
+serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
+```
+dirname (str) - Path of saved model files. Program file and parameter files are saved in this directory.
+
+serving_server (str, optional) - The path of model files and configuration files for server. Default: "serving_server".
+
+serving_client (str, optional) - The path of configuration files for client. Default: "serving_client".
+
+model_filename (str, optional) - The name of file to load the inference program. If it is None, the default filename `__model__` will be used. Default: None.
+
+params_filename (str, optional) - The name of the file from which all parameters are loaded. It is only needed when all parameters were saved in a single binary file. If the parameters were saved in separate files, set it to None. Default: None.
--- a/doc/SAVE_CN.md
+++ b/doc/SAVE_CN.md
-## 怎样保存用于Paddle Serving的模型？
+# 怎样保存用于Paddle Serving的模型？

 (简体中文|[English](./SAVE.md))

- 目前，Paddle Serving提供了一个save_model接口供用户访问，该接口与Paddle的`save_inference_model`类似。
+## 从训练或预测脚本中保存
+目前，Paddle Serving提供了一个save_model接口供用户访问，该接口与Paddle的`save_inference_model`类似。

 ``` python
 import paddle_serving_client.io as serving_io
@@ -10,7 +11,9 @@ serving_io.save_model("imdb_model", "imdb_client_conf",
                      {"words": data}, {"prediction": prediction},
                      fluid.default_main_program())
 ```
-imdb_model是具有服务配置的服务器端模型。 imdb_client_conf是客户端rpc配置。 Serving有一个 提供给用户存放Feed和Fetch变量信息的字典。 在示例中，`{words”：data}` 是用于指定已保存推理模型输入的提要字典。`{"prediction"：projection}`是指定保存的推理模型输出的字典。可以为feed和fetch变量定义一个别名。 如何使用别名的例子 示例如下：
+imdb_model是具有服务配置的服务器端模型。 imdb_client_conf是客户端rpc配置。
+
+Serving有一个提供给用户存放Feed和Fetch变量信息的字典。 在示例中，`{"words": data}` 是用于指定已保存推理模型输入的feed字典。`{"prediction": prediction}`是指定保存的推理模型输出的fetch字典。可以为feed和fetch变量定义一个别名。 使用别名的示例如下：

 ``` python
 from paddle_serving_client import Client
@@ -29,3 +32,19 @@ for line in sys.stdin:
    fetch_map = client.predict(feed=feed, fetch=fetch)
    print("{} {}".format(fetch_map["prediction"][1], label[0]))
 ```
+
+## 从已保存的模型文件中导出
+如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型，则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
+```python
+import paddle_serving_client.io as serving_io
+serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client",  model_filename=None, params_filename=None)
+```
+dirname (str) – 需要转换的模型文件存储路径，Program结构文件和参数文件均保存在此目录。
+
+serving_server (str, 可选) - 转换后的模型文件和配置文件的存储路径。默认值为serving_server。
+
+serving_client (str, 可选) - 转换后的客户端配置文件存储路径。默认值为serving_client。
+
+model_filename (str，可选) – 存储需要转换的模型Inference Program结构的文件名称。如果设置为None，则使用 `__model__` 作为默认的文件名。默认值为None。
+
+params_filename (str，可选) – 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保存在一个单独的二进制文件中，它才需要被指定。如果模型参数是存储在各自分离的文件中，设置它的值为None。默认值为None。
--- a/doc/TRAIN_TO_SERVICE.md
+++ b/doc/TRAIN_TO_SERVICE.md
@@ -350,12 +350,12 @@ In the above command, the first parameter is the saved server-side model and con
 After starting the HTTP prediction service, you can make prediction with a single command:

 ```
-curl -H "Content-Type: application/json" -X POST -d '{"words": "i am very sad | 0", "fetch": ["prediction"]}' http://127.0.0.1:9292/imdb/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
 ```
 When the inference process is normal, the prediction probability is returned, as shown below.

 ```
-{"prediction": [0.5592559576034546,0.44074398279190063]}
+{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
 ```

 **Note**: The effect of each model training may be slightly different, and the inferred probability value using the trained model may not be consistent with the example.
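Note on the request and response layout changed in `doc/TRAIN_TO_SERVICE.md`: the sketch below builds the same JSON body that the `curl` command sends and reads the probabilities out of the example response. It is a minimal illustration under stated assumptions; the helper names `build_request` and `parse_prediction` are hypothetical, the response text is copied from the example output in this diff, and no network call is made.

```python
import json


def build_request(feed, fetch):
    """Serialize a prediction request in the {"feed": ..., "fetch": ...} layout.

    Hypothetical helper, not part of Paddle Serving.
    """
    return json.dumps({"feed": feed, "fetch": fetch})


def parse_prediction(response_text, fetch_name):
    """Return the first sample's values for fetch_name from a response body.

    Assumes the {"result": {name: [[...]]}} layout shown in the example output.
    """
    result = json.loads(response_text)["result"]
    return result[fetch_name][0]


# Same feed and fetch values as the curl example.
request_body = build_request([{"words": "i am very sad | 0"}], ["prediction"])

# Response text copied from the example output in this diff.
response_text = '{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}'
probabilities = parse_prediction(response_text, "prediction")

print(request_body)
print(probabilities)
```

The outer list under `"prediction"` has one entry per sample in `"feed"`, which is why the sketch indexes the first element.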
--- a/doc/TRAIN_TO_SERVICE_CN.md
+++ b/doc/TRAIN_TO_SERVICE_CN.md
@@ -353,12 +353,12 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
 启动完HTTP预测服务，即可通过一行命令进行预测：

 ```
-curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
 ```
 预测流程正常时，会返回预测概率，示例如下。

 ```
-{"prediction":[0.5592559576034546,0.44074398279190063]}
+{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
 ```

 **注意**：每次模型训练的效果可能略有不同，使用训练出的模型预测概率数值可能与示例不一致。
--- a/doc/UWSGI_DEPLOY.md
+++ b/doc/UWSGI_DEPLOY.md
-# 使用uwsgi启动HTTP预测服务
+# Deploy HTTP service with uWSGI

-在提供的fit_a_line示例中，启动HTTP预测服务后会看到有以下信息：
+([简体中文](./UWSGI_DEPLOY_CN.md)|English)
+
+In the fit_a_line example, after starting the HTTP prediction service, you will see the following information:

 ```shell
 web service address:
@@ -13,46 +15,31 @@ http://10.127.3.150:9393/uci/prediction
 * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
 ```

-这里会提示启动的HTTP服务是开发模式，并不能用于生产环境的部署。Flask启动的服务环境不够稳定也无法承受大量请求的并发，实际部署过程中配合需要WSGI（Web Server Gateway Interface）使用。
+This output warns that the HTTP service was started in development mode and should not be used for production deployment.
+The service started by Flask is not stable enough and cannot withstand a large number of concurrent requests. In actual deployments, it needs to be used together with a WSGI (Web Server Gateway Interface) server.

-下面我们展示一下如何使用[uWSGI](https://github.com/unbit/uwsgi)模块来部署HTTP预测服务用于生产环境。
+Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments.

-编写HTTP服务脚本

 ```python
 #uwsgi_service.py
 from paddle_serving_server.web_service import WebService
-from flask import Flask, request

-#配置预测服务
+#Define prediction service
 uci_service = WebService(name = "uci")
 uci_service.load_model_config("./uci_housing_model")
 uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
-uci_service.run_server()
-
-#配置flask服务
-app_instance = Flask(__name__)
-@app_instance.before_first_request
-def init():
-    global uci_service
-    uci_service._launch_web_service()
-
-service_name = "/" + uci_service.name + "/prediction"
-@app_instance.route(service_name, methods=["POST"])
-def run():
-    return uci_service.get_prediction(request)
-
-#run方法用于直接调试中直接启动服务
-if __name__ == "__main__":
-    app_instance.run()
+uci_service.run_rpc_service()
+#Get flask application
+app_instance = uci_service.get_app_instance()
 ```

-使用uwsgi启动HTTP服务
+Start service with uWSGI

 ```bash
-uwsgi --http :9000 --wsgi-file uwsgi_service.py --callable app_instance --processes 4
+uwsgi --http :9393 --module uwsgi_service:app_instance
 ```

-使用--processes参数可以指定服务的进程数，请注意目前Serving HTTP 服务暂时不支持多线程的方式使用。
+Use the `--processes` parameter to specify the number of service processes.

-更多uWSGI的信息请参考[uWSGI使用文档](https://uwsgi-docs.readthedocs.io/en/latest/)
+For more information about uWSGI, please refer to [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/)
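Note on `--module uwsgi_service:app_instance` in `doc/UWSGI_DEPLOY.md`: this form asks uWSGI to import the module `uwsgi_service` and use its attribute `app_instance` as the WSGI callable. The sketch below shows what such a callable looks like and how a server invokes it. It is an assumption-laden stand-in: the `app_instance` here is a hypothetical stub, not the Flask application returned by `get_app_instance()`, and `call_app` is a hypothetical helper that only imitates what a WSGI server does.

```python
from wsgiref.util import setup_testing_defaults


def app_instance(environ, start_response):
    """Hypothetical stand-in for the WSGI callable that uWSGI would load."""
    body = b'{"status": "ok"}'
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path, method="POST"):
    """Invoke a WSGI callable once, roughly the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_app(app_instance, "/uci/prediction")
print(status, body)
```

Because the callable is resolved by name at import time, everything the service needs must be set up at module level, not under `if __name__ == "__main__":`.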
--- a/doc/UWSGI_DEPLOY_CN.md
+++ b/doc/UWSGI_DEPLOY_CN.md
+# 使用uwsgi启动HTTP预测服务
+
+(简体中文|[English](./UWSGI_DEPLOY.md))
+
+在提供的fit_a_line示例中，启动HTTP预测服务后会看到有以下信息：
+
+```shell
+web service address:
+http://10.127.3.150:9393/uci/prediction
+ * Serving Flask app "serve" (lazy loading)
+ * Environment: production
+   WARNING: This is a development server. Do not use it in a production deployment.
+   Use a production WSGI server instead.
+ * Debug mode: off
+ * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
+```
+
+这里会提示启动的HTTP服务是开发模式，并不能用于生产环境的部署。Flask启动的服务环境不够稳定也无法承受大量请求的并发，实际部署过程中配合需要WSGI（Web Server Gateway Interface）使用。
+
+下面我们展示一下如何使用[uWSGI](https://github.com/unbit/uwsgi)模块来部署HTTP预测服务用于生产环境。
+
+编写HTTP服务脚本
+
+```python
+#uwsgi_service.py
+from paddle_serving_server.web_service import WebService
+
+#配置预测服务
+uci_service = WebService(name = "uci")
+uci_service.load_model_config("./uci_housing_model")
+uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
+uci_service.run_rpc_service()
+#获取flask服务
+app_instance = uci_service.get_app_instance()
+```
+
+使用uwsgi启动HTTP服务
+
+```bash
+uwsgi --http :9393 --module uwsgi_service:app_instance
+```
+
+使用--processes参数可以指定服务的进程数。
+
+更多uWSGI的信息请参考[uWSGI使用文档](https://uwsgi-docs.readthedocs.io/en/latest/)
--- a/paddle_inference/inferencer-fluid-cpu/include/fluid_cpu_engine.h
+++ b/paddle_inference/inferencer-fluid-cpu/include/fluid_cpu_engine.h
@@ -194,6 +194,12 @@ class FluidCpuAnalysisDirCore : public FluidFamilyCore {
      analysis_config.EnableMemoryOptim();
    }

+    if (params.enable_ir_optimization()) {
+      analysis_config.SwitchIrOptim(true);
+    } else {
+      analysis_config.SwitchIrOptim(false);
+    }
+
    AutoLock lock(GlobalPaddleCreateMutex::instance());
    _core =
        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);

--- a/paddle_inference/inferencer-fluid-gpu/include/fluid_gpu_engine.h
+++ b/paddle_inference/inferencer-fluid-gpu/include/fluid_gpu_engine.h
@@ -198,6 +198,12 @@ class FluidGpuAnalysisDirCore : public FluidFamilyCore {
      analysis_config.EnableMemoryOptim();
    }

+    if (params.enable_ir_optimization()) {
+      analysis_config.SwitchIrOptim(true);
+    } else {
+      analysis_config.SwitchIrOptim(false);
+    }
+
    AutoLock lock(GlobalPaddleCreateMutex::instance());
    _core =
        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);

--- a/python/CMakeLists.txt
+++ b/python/CMakeLists.txt
@@ -19,6 +19,8 @@ endif()
 if (CLIENT)
 configure_file(${CMAKE_CURRENT_SOURCE_DIR}/setup.py.client.in
    ${CMAKE_CURRENT_BINARY_DIR}/setup.py)
+configure_file(${CMAKE_CURRENT_SOURCE_DIR}/../tools/python_tag.py
+    ${CMAKE_CURRENT_BINARY_DIR}/python_tag.py)
 endif()

 if (APP)
@@ -43,7 +45,8 @@ if (APP)
 add_custom_command(
        OUTPUT ${PADDLE_SERVING_BINARY_DIR}/.timestamp
        COMMAND cp -r ${CMAKE_CURRENT_SOURCE_DIR}/paddle_serving_app/ ${PADDLE_SERVING_BINARY_DIR}/python/
-        COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel)
+        COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel
+        DEPENDS ${SERVING_APP_CORE} general_model_config_py_proto ${PY_FILES})
 add_custom_target(paddle_python ALL DEPENDS ${PADDLE_SERVING_BINARY_DIR}/.timestamp)
 endif()

@@ -52,6 +55,7 @@ add_custom_command(
 	OUTPUT ${PADDLE_SERVING_BINARY_DIR}/.timestamp
 	COMMAND cp -r ${CMAKE_CURRENT_SOURCE_DIR}/paddle_serving_client/ ${PADDLE_SERVING_BINARY_DIR}/python/
 	COMMAND ${CMAKE_COMMAND} -E copy ${SERVING_CLIENT_CORE} ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/serving_client.so
+    COMMAND env ${py_env} ${PYTHON_EXECUTABLE} python_tag.py
 	COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel
 	DEPENDS ${SERVING_CLIENT_CORE} sdk_configure_py_proto ${PY_FILES})
 add_custom_target(paddle_python ALL DEPENDS serving_client ${PADDLE_SERVING_BINARY_DIR}/.timestamp)

--- a/python/examples/bert/README.md
+++ b/python/examples/bert/README.md
@@ -71,28 +71,3 @@ set environmental variable to specify which gpus are used, the command above mea
 ```
 curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
 ```
-
-### Benchmark
-
-Model：bert_chinese_L-12_H-768_A-12
-
-GPU：GPU V100 * 1
-
-CUDA/cudnn Version：CUDA 9.2，cudnn 7.1.4
-
-
-In the test, 10 thousand samples in the sample data are copied into 100 thousand samples. Each client thread sends a sample of the number of threads. The batch size is 1, the max_seq_len is 20(not 128 as described above), and the time unit is seconds.
-
-When the number of client threads is 4, the prediction speed can reach 432 samples per second.
-Because a single GPU can only perform serial calculations internally, increasing the number of client threads can only reduce the idle time of the GPU. Therefore, after the number of threads reaches 4, the increase in the number of threads does not improve the prediction speed.
-
-| client  thread num | prepro | client infer | op0   | op1    | op2  | postpro | total  |
-| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
-| 1                  | 3.05   | 290.54       | 0.37  | 239.15 | 6.43 | 0.71    | 365.63 |
-| 4                  | 0.85   | 213.66       | 0.091 | 200.39 | 1.62 | 0.2     | 231.45 |
-| 8                  | 0.42   | 223.12       | 0.043 | 110.99 | 0.8  | 0.098   | 232.05 |
-| 12                 | 0.32   | 225.26       | 0.029 | 73.87  | 0.53 | 0.078   | 231.45 |
-| 16                 | 0.23   | 227.26       | 0.022 | 55.61  | 0.4  | 0.056   | 231.9  |
-
-the following is the client thread num - latency bar chart:
-![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
--- a/python/examples/bert/README_CN.md
+++ b/python/examples/bert/README_CN.md
@@ -67,27 +67,3 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien
 ```
 curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
 ```
-
-### Benchmark
-
-模型：bert_chinese_L-12_H-768_A-12
-
-设备：GPU V100 * 1
-
-环境：CUDA 9.2，cudnn 7.1.4
-
-测试中将样例数据中的1W个样本复制为10W个样本，每个client线程发送线程数分之一个样本，batch size为1，max_seq_len为20（而不是上面的128），时间单位为秒.
-
-在client线程数为4时，预测速度可以达到432样本每秒。
-由于单张GPU内部只能串行计算，client线程增多只能减少GPU的空闲时间，因此在线程数达到4之后，线程数增多对预测速度没有提升。
-
-| client  thread num | prepro | client infer | op0   | op1    | op2  | postpro | total  |
-| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
-| 1                  | 3.05   | 290.54       | 0.37  | 239.15 | 6.43 | 0.71    | 365.63 |
-| 4                  | 0.85   | 213.66       | 0.091 | 200.39 | 1.62 | 0.2     | 231.45 |
-| 8                  | 0.42   | 223.12       | 0.043 | 110.99 | 0.8  | 0.098   | 232.05 |
-| 12                 | 0.32   | 225.26       | 0.029 | 73.87  | 0.53 | 0.078   | 231.45 |
-| 16                 | 0.23   | 227.26       | 0.022 | 55.61  | 0.4  | 0.056   | 231.9  |
-
-总耗时变化规律如下：  
-![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
--- a/python/examples/bert/benchmark.py
+++ b/python/examples/bert/benchmark.py
@@ -22,11 +22,8 @@ import time
 from paddle_serving_client import Client
 from paddle_serving_client.utils import MultiThreadRunner
 from paddle_serving_client.utils import benchmark_args, show_latency
-from batching import pad_batch_data
-import tokenization
-import requests
-import json
-from bert_reader import BertReader
+from paddle_serving_app.reader import ChineseBertReader
+
 args = benchmark_args()


@@ -45,8 +42,7 @@ def single_func(idx, resource):
        latency_list = []

    if args.request == "rpc":
-        reader = BertReader(vocab_file="vocab.txt", max_seq_len=20)
-
+        reader = ChineseBertReader({"max_seq_len": 128})
        fetch = ["pooled_output"]
        client = Client()
        client.load_client_config(args.model)
@@ -78,7 +74,10 @@ def single_func(idx, resource):
    elif args.request == "http":
        raise ("not implemented")
    end = time.time()
-    return [[end - start], latency_list]
+    if latency_flags:
+        return [[end - start], latency_list]
+    else:
+        return [[end - start]]


 if __name__ == '__main__':
@@ -86,7 +85,7 @@ if __name__ == '__main__':
    endpoint_list = [
        "127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295"
    ]
-    turns = 1000
+    turns = 10
    start = time.time()
    result = multi_thread_runner.run(
        single_func, args.thread, {"endpoint": endpoint_list,

--- a/python/examples/bert/benchmark.sh
+++ b/python/examples/bert/benchmark.sh
@@ -3,25 +3,25 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
 export FLAGS_profile_server=1
 export FLAGS_profile_client=1
 export FLAGS_serving_latency=1
-python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog &
+python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim False --ir_optim True 2> elog > stdlog &

 sleep 5

 #warm up
-$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
+python3 benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1

 for thread_num in 4 8 16
 do
 for batch_size in 1 4 16 64 256
 do
-    $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
+    python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
    echo "model name :" $1
    echo "thread num :" $thread_num
    echo "batch size :" $batch_size
    echo "=================Done===================="
    echo "model name :$1" >> profile_log_$1
    echo "batch size :$batch_size" >> profile_log_$1
-    $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log_$1
+    python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
    tail -n 8 profile >> profile_log_$1
    echo "" >> profile_log_$1
 done

--- a/python/examples/bert/bert_client.py
+++ b/python/examples/bert/bert_client.py
@@ -25,7 +25,7 @@ from paddlehub.common.logger import logger
 import socket
 from paddle_serving_client import Client
 from paddle_serving_client.utils import benchmark_args
-from paddle_serving_app import ChineseBertReader
+from paddle_serving_app.reader import ChineseBertReader

 args = benchmark_args()


--- a/python/examples/bert/bert_web_service.py
+++ b/python/examples/bert/bert_web_service.py
@@ -14,19 +14,22 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing
 from paddle_serving_server_gpu.web_service import WebService
-from bert_reader import BertReader
+from paddle_serving_app.reader import ChineseBertReader
 import sys
 import os


 class BertService(WebService):
    def load(self):
-        self.reader = BertReader(vocab_file="vocab.txt", max_seq_len=128)
+        self.reader = ChineseBertReader({
+            "vocab_file": "vocab.txt",
+            "max_seq_len": 128
+        })

-    def preprocess(self, feed={}, fetch=[]):
-        feed_res = [{
-            "words": self.reader.process(ins["words"].encode("utf-8"))
-        } for ins in feed]
+    def preprocess(self, feed=[], fetch=[]):
+        feed_res = [
+            self.reader.process(ins["words"].encode("utf-8")) for ins in feed
+        ]
        return feed_res, fetch


@@ -37,5 +40,5 @@ gpu_ids = os.environ["CUDA_VISIBLE_DEVICES"]
 bert_service.set_gpus(gpu_ids)
 bert_service.prepare_server(
    workdir="workdir", port=int(sys.argv[2]), device="gpu")
-bert_service.run_server()
-bert_service.run_flask()
+bert_service.run_rpc_service()
+bert_service.run_web_service()
--- a/python/examples/imagenet/image_http_client.py
+++ b/python/examples/imagenet/image_http_client.py
@@ -12,37 +12,32 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import requests
-import base64
-import json
-import time
-import os
+from paddle_serving_client import Client
+from paddle_serving_app.reader import ChineseBertReader
 import sys

-py_version = sys.version_info[0]
+client = Client()
+client.load_client_config("./bert_seq32_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])

+reader = ChineseBertReader({"max_seq_len": 32})
+fetch = ["sequence_10", "sequence_12", "pooled_output"]
+expected_shape = {
+    "sequence_10": (4, 32, 768),
+    "sequence_12": (4, 32, 768),
+    "pooled_output": (4, 768)
+}
+batch_size = 4
+feed_batch = []

-def predict(image_path, server):
-    if py_version == 2:
-        image = base64.b64encode(open(image_path).read())
+for line in sys.stdin:
+    feed = reader.process(line)
+    if len(feed_batch) < batch_size:
+        feed_batch.append(feed)
    else:
-        image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
-    req = json.dumps({"feed": [{"image": image}], "fetch": ["score"]})
-    r = requests.post(
-        server, data=req, headers={"Content-Type": "application/json"})
-    try:
-        print(r.json()["result"]["score"])
-    except ValueError:
-        print(r.text)
-    return r
-
-
-if __name__ == "__main__":
-    server = "http://127.0.0.1:9393/image/prediction"
-    image_list = os.listdir("./image_data/n01440764/")
-    start = time.time()
-    for img in image_list:
-        image_file = "./image_data/n01440764/" + img
-        res = predict(image_file, server)
-    end = time.time()
-    print(end - start)
+        fetch_map = client.predict(feed=feed_batch, fetch=fetch)
+        feed_batch = [feed]  # keep the current sample as the start of the next batch
+        for var_name in fetch:
+            if fetch_map[var_name].shape != expected_shape[var_name]:
+                print("fetch var {} shape error.".format(var_name))
+                sys.exit(1)
--- a/python/examples/cascade_rcnn/README.md
+++ b/python/examples/cascade_rcnn/README.md
+# Cascade RCNN model on Paddle Serving
+
+([简体中文](./README_CN.md)|English)
+
+### Get The Cascade RCNN Model
+```
+sh get_data.sh
+```
+For more detection models, see the [Paddle Detection Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md).
+
+### Start the service
+```
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9292 --gpu_ids 0
+```
+
+### Perform prediction
+```
+python test_client.py
+```
+
+The image with bounding boxes and the JSON result are saved in the `output` folder.
--- a/python/examples/cascade_rcnn/README_CN.md
+++ b/python/examples/cascade_rcnn/README_CN.md
+# 使用Paddle Serving部署Cascade RCNN模型
+
+(简体中文|[English](./README.md))
+
+## 获得Cascade RCNN模型
+```
+sh get_data.sh
+```
+如果你想要更多的检测模型，请参考[Paddle检测模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
+
+### 启动服务
+```
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9292 --gpu_ids 0
+```
+
+### 执行预测
+```
+python test_client.py
+```
+
+客户端已经为图片做好了后处理，在`output`文件夹下存放各个框的json格式信息还有后处理结果图片。
--- a/python/examples/lac/get_data.sh
+++ b/python/examples/lac/get_data.sh
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
-tar -zxvf lac_model_jieba_web.tar.gz
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/cascade_rcnn_r50_fpx_1x_serving.tar.gz
+tar xf cascade_rcnn_r50_fpx_1x_serving.tar.gz
--- a/python/examples/criteo_ctr_with_cube/README.md
+++ b/python/examples/criteo_ctr_with_cube/README.md
@@ -2,16 +2,6 @@

 ([简体中文](./README_CN.md)|English)

-### Compile Source Code
-in the root directory of this git project
-```
-mkdir build_server
-cd build_server
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
-make -j10
-make install -j10
-```
-
 ### Get Sample Dataset

 go to directory `python/examples/criteo_ctr_with_cube`
@@ -31,7 +21,9 @@ the model will be in ./ctr_server_model_kv and ./ctr_client_config.

 ### Start Sparse Parameter Indexing Service
 ```
-cp ../../../build_server/output/bin/cube* ./cube/
+wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
+tar xf cube_app.tar.gz
+mv cube_app/cube* ./cube/
 sh cube_prepare.sh &
 ```


--- a/python/examples/criteo_ctr_with_cube/README_CN.md
+++ b/python/examples/criteo_ctr_with_cube/README_CN.md
 ## 带稀疏参数索引服务的CTR预测服务
 (简体中文|[English](./README.md))

-### 编译源代码
-在本项目的根目录下，执行
-```
-mkdir build_server
-cd build_server
-cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
-make -j10
-make install -j10
-```
-
 ### 获取样例数据
 进入目录 `python/examples/criteo_ctr_with_cube`
 ```
@@ -29,7 +19,9 @@ mv models/data ./cube/

 ### 启动稀疏参数索引服务
 ```
-cp ../../../build_server/output/bin/cube* ./cube/
+wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
+tar xf cube_app.tar.gz
+mv cube_app/cube* ./cube/
 sh cube_prepare.sh &
 ```


--- a/python/examples/deeplabv3/N0060.jpg
+++ b/python/examples/deeplabv3/N0060.jpg
--- a/python/examples/deeplabv3/README.md
+++ b/python/examples/deeplabv3/README.md
+# Image Segmentation
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model deeplabv3
+tar -xzvf deeplabv3.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
+```
+
+### Client Prediction
+
+```
+python deeplabv3_client.py
+```
--- a/python/examples/deeplabv3/README_CN.md
+++ b/python/examples/deeplabv3/README_CN.md
+# 图像分割
+
+## 获取模型
+
+```
+python -m paddle_serving_app.package --get_model deeplabv3
+tar -xzvf deeplabv3.tar.gz
+```
+
+## RPC 服务
+
+### 启动服务端
+
+```
+python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
+```
+
+### 客户端预测
+
+```
+python deeplabv3_client.py
--- a/python/examples/imagenet/image_classification_service.py
+++ b/python/examples/imagenet/image_classification_service.py
@@ -12,30 +12,23 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from paddle_serving_server.web_service import WebService
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, Transpose, BGR2RGB, SegPostprocess
 import sys
 import cv2
-import base64
-import numpy as np
-from paddle_serving_app import ImageReader

+client = Client()
+client.load_client_config("deeplabv3_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9494"])

-class ImageService(WebService):
-    def preprocess(self, feed={}, fetch=[]):
-        reader = ImageReader()
-        feed_batch = []
-        for ins in feed:
-            if "image" not in ins:
-                raise ("feed data error!")
-            sample = base64.b64decode(ins["image"])
-            img = reader.process_image(sample)
-            feed_batch.append({"image": img})
-        return feed_batch, fetch
+preprocess = Sequential(
+    [File2Image(), Resize(
+        (512, 512), interpolation=cv2.INTER_LINEAR)])

+postprocess = SegPostprocess(2)

-image_service = ImageService(name="image")
-image_service.load_model_config(sys.argv[1])
-image_service.prepare_server(
-    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
-image_service.run_server()
-image_service.run_flask()
+filename = "N0060.jpg"
+im = preprocess(filename)
+fetch_map = client.predict(feed={"image": im}, fetch=["output"])
+fetch_map["filename"] = filename
+postprocess(fetch_map)
--- a/python/examples/faster_rcnn_model/README.md
+++ b/python/examples/faster_rcnn_model/README.md
@@ -12,8 +12,8 @@ If you want to have more detection models, please refer to [Paddle Detection Mod
 ### Start the service
 ```
 tar xf faster_rcnn_model.tar.gz
-mv faster_rcnn_model/pddet *.
-GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
+mv faster_rcnn_model/pddet* .
+GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```

 ### Perform prediction

--- a/python/examples/faster_rcnn_model/README_CN.md
+++ b/python/examples/faster_rcnn_model/README_CN.md
@@ -13,7 +13,7 @@ wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml
 ```
 tar xf faster_rcnn_model.tar.gz
 mv faster_rcnn_model/pddet* ./
-GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
+GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
 ```

 ### 执行预测

--- a/python/examples/faster_rcnn_model/label_list.txt
+++ b/python/examples/faster_rcnn_model/label_list.txt
+background
+person
+bicycle
+car
+motorcycle
+airplane
+bus
+train
+truck
+boat
+traffic light
+fire hydrant
+stop sign
+parking meter
+bench
+bird
+cat
+dog
+horse
+sheep
+cow
+elephant
+bear
+zebra
+giraffe
+backpack
+umbrella
+handbag
+tie
+suitcase
+frisbee
+skis
+snowboard
+sports ball
+kite
+baseball bat
+baseball glove
+skateboard
+surfboard
+tennis racket
+bottle
+wine glass
+cup
+fork
+knife
+spoon
+bowl
+banana
+apple
+sandwich
+orange
+broccoli
+carrot
+hot dog
+pizza
+donut
+cake
+chair
+couch
+potted plant
+bed
+dining table
+toilet
+tv
+laptop
+mouse
+remote
+keyboard
+cell phone
+microwave
+oven
+toaster
+sink
+refrigerator
+book
+clock
+vase
+scissors
+teddy bear
+hair drier
+toothbrush
--- a/python/examples/faster_rcnn_model/test_client.py
+++ b/python/examples/faster_rcnn_model/test_client.py
@@ -13,21 +13,29 @@
 # limitations under the License.

 from paddle_serving_client import Client
+from paddle_serving_app.reader import *
 import sys
-import os
-import time
-from paddle_serving_app.reader.pddet import Detection
 import numpy as np

-py_version = sys.version_info[0]
+preprocess = Sequential([
+    File2Image(), BGR2RGB(), Div(255.0),
+    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
+    Resize(640, 640), Transpose((2, 0, 1))
+])

-feed_var_names = ['image', 'im_shape', 'im_info']
-fetch_var_names = ['multiclass_nms']
-pddet = Detection(config_path=sys.argv[2], output_dir="./output")
-feed_dict = pddet.preprocess(feed_var_names, sys.argv[3])
+postprocess = RCNNPostprocess("label_list.txt", "output")
 client = Client()
+
 client.load_client_config(sys.argv[1])
 client.connect(['127.0.0.1:9494'])
-fetch_map = client.predict(feed=feed_dict, fetch=fetch_var_names)
-outs = fetch_map.values()
-pddet.postprocess(fetch_map, fetch_var_names)
+
+im = preprocess(sys.argv[3])
+fetch_map = client.predict(
+    feed={
+        "image": im,
+        "im_info": np.array(list(im.shape[1:]) + [1.0]),
+        "im_shape": np.array(list(im.shape[1:]) + [1.0])
+    },
+    fetch=["multiclass_nms"])
+fetch_map["image"] = sys.argv[3]
+postprocess(fetch_map)
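
The `im_info` and `im_shape` feeds above are built from the preprocessed image's height and width plus a trailing scale of 1.0. A small sketch of that construction with plain lists follows. `build_im_info` is a hypothetical helper, and the channel-first (C, H, W) layout is assumed from the `Transpose((2, 0, 1))` step in the pipeline.

```python
def build_im_info(chw_shape, scale=1.0):
    """Return [height, width, scale] for an image shaped (C, H, W)."""
    if len(chw_shape) != 3:
        raise ValueError("expected a (C, H, W) shape")
    _, height, width = chw_shape
    # Floats throughout, mirroring what a float array of these values holds.
    return [float(height), float(width), float(scale)]


if __name__ == "__main__":
    print(build_im_info((3, 640, 640)))
```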
--- a/python/examples/fit_a_line/test_multi_process_client.py
+++ b/python/examples/fit_a_line/test_multi_process_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+from paddle_serving_client.utils import MultiThreadRunner
+import paddle
+
+
+def single_func(idx, resource):
+    client = Client()
+    client.load_client_config(
+        "./uci_housing_client/serving_client_conf.prototxt")
+    client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
+    x = [
+        0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
+        0.6283, 0.4919, 0.1856, 0.0795, -0.0332
+    ]
+    for i in range(1000):
+        fetch_map = client.predict(feed={"x": x}, fetch=["price"])
+        if fetch_map is None:
+            return [[None]]
+    return [[0]]
+
+
+multi_thread_runner = MultiThreadRunner()
+thread_num = 4
+result = multi_thread_runner.run(single_func, thread_num, {})
+if None in result[0]:
+    exit(1)
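
The script above runs `single_func` in several workers and exits non-zero if any worker reported `None`. The sketch below shows the same fan-out and check with `concurrent.futures.ThreadPoolExecutor` from the standard library, used here as a stand-in for `MultiThreadRunner`, whose internals are not shown in this diff. `run_workers` and `fake_worker` are hypothetical names, and the column-wise merge of worker results is an assumption made so that `result[0]` holds one entry per worker.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workers(func, worker_num, resource):
    """Run func(idx, resource) in worker_num threads and merge the results."""
    with ThreadPoolExecutor(max_workers=worker_num) as pool:
        futures = [pool.submit(func, idx, resource) for idx in range(worker_num)]
        per_worker = [future.result() for future in futures]
    # Assumed merge rule: column i of every worker result is concatenated.
    merged = [[] for _ in per_worker[0]]
    for worker_result in per_worker:
        for i, column in enumerate(worker_result):
            merged[i].extend(column)
    return merged


def fake_worker(idx, resource):
    """Stand-in for single_func: [[None]] marks a failed worker."""
    return [[None]] if idx in resource.get("fail", ()) else [[0]]


if __name__ == "__main__":
    result = run_workers(fake_worker, 4, {})
    print("failed" if None in result[0] else "ok")
```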
--- a/python/examples/imagenet/README.md
+++ b/python/examples/imagenet/README.md
@@ -8,34 +8,42 @@ The example uses the ResNet50_vd model to perform the imagenet 1000 classificati
 ```
 sh get_model.sh
 ```
-### HTTP Infer
+
+### Install preprocess module
+
+```
+pip install paddle_serving_app
+```
+
+### HTTP Service

 launch server side
 ```
-python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
+python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu inference service
 ```
 ```
-python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
+python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu inference service
 ```


 client send inference request
 ```
-python image_http_client.py
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9696/image/prediction
 ```
-### RPC Infer
+
+### RPC Service

 launch server side
 ```
-python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
+python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9696 #cpu inference service
 ```

 ```
-python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
+python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9696 --gpu_ids 0 #gpu inference service
 ```

 client send inference request
 ```
-python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
+python resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
 ```
-*the port of server side in this example is 9393, the sample data used by client side is in the folder ./data. These parameter can be modified in practice*
+*The server-side port in this example is 9696.*
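
The curl request above posts a JSON body with a `feed` list and a `fetch` list. A sketch of building the same body in Python with the standard library follows. `build_request_body` is a hypothetical helper name, and the script only constructs the payload; it does not contact a server.

```python
import json


def build_request_body(image_url, fetch_names):
    """Serialize one image URL and the requested outputs as a JSON string."""
    payload = {"feed": [{"image": image_url}], "fetch": list(fetch_names)}
    return json.dumps(payload)


if __name__ == "__main__":
    body = build_request_body(
        "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg",
        ["score"])
    print(body)
```

The resulting string can be sent with any HTTP client as the request body, with the `Content-Type: application/json` header used in the curl command.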
--- a/python/examples/imagenet/README_CN.md
+++ b/python/examples/imagenet/README_CN.md
@@ -8,34 +8,42 @@
 ```
 sh get_model.sh
 ```
-### 执行HTTP预测服务
+
+### 安装数据预处理模块
+
+```
+pip install paddle_serving_app
+```
+
+### HTTP服务

 启动server端
 ```
-python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu预测服务
+python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu预测服务
 ```
 ```
-python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu预测服务
+python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu预测服务
 ```


-client端进行预测
+发送HTTP POST请求
 ```
-python image_http_client.py
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9696/image/prediction
 ```
-### 执行RPC预测服务
+
+### RPC服务

 启动server端
 ```
-python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu预测服务
+python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9696 #cpu预测服务
 ```

 ```
-python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu预测服务
+python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9696 --gpu_ids 0 #gpu预测服务
 ```

 client端进行预测
 ```
-python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
+python resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
 ```
-*server端示例中服务端口为9393端口，client端示例中数据来自./data文件夹，server端地址为本地9393端口，可根据实际情况更改脚本。*
+*server端示例中服务端口为9696端口*
--- a/python/examples/imagenet/benchmark.py
+++ b/python/examples/imagenet/benchmark.py
@@ -19,15 +19,22 @@ from __future__ import unicode_literals, absolute_import
 import os
 import sys
 import time
+import requests
+import json
+import base64
 from paddle_serving_client import Client
 from paddle_serving_client.utils import MultiThreadRunner
 from paddle_serving_client.utils import benchmark_args
-import requests
-import json
-from image_reader import ImageReader
+from paddle_serving_app.reader import Sequential, URL2Image, Resize
+from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize

 args = benchmark_args()

+seq_preprocess = Sequential([
+    URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+

 def single_func(idx, resource):
    file_list = []
@@ -36,6 +43,10 @@ def single_func(idx, resource):
    img_list = []
    for i in range(1000):
        img_list.append(open("./image_data/n01440764/" + file_list[i]).read())
+    profile_flags = False
+    if "FLAGS_profile_client" in os.environ and os.environ[
+            "FLAGS_profile_client"]:
+        profile_flags = True
    if args.request == "rpc":
        reader = ImageReader()
        fetch = ["score"]
@@ -46,16 +57,36 @@ def single_func(idx, resource):
        for i in range(1000):
            if args.batch_size >= 1:
                feed_batch = []
+                i_start = time.time()
                for bi in range(args.batch_size):
-                    img = reader.process_image(img_list[i])
-                    img = img.reshape(-1)
+                    img = seq_preprocess(img_list[i])
                    feed_batch.append({"image": img})
+                i_end = time.time()
+                if profile_flags:
+                    print("PROFILE\tpid:{}\timage_pre_0:{} image_pre_1:{}".
+                          format(os.getpid(),
+                                 int(round(i_start * 1000000)),
+                                 int(round(i_end * 1000000))))
+
                result = client.predict(feed=feed_batch, fetch=fetch)
            else:
                print("unsupport batch size {}".format(args.batch_size))

    elif args.request == "http":
-        raise ("no batch predict for http")
+        # Detect the interpreter's major version instead of hard-coding it.
+        py_version = sys.version_info[0]
+        server = "http://" + resource["endpoint"][idx % len(resource[
+            "endpoint"])] + "/image/prediction"
+        start = time.time()
+        for i in range(1000):
+            image_path = "./image_data/n01440764/" + file_list[i]
+            if py_version == 2:
+                image = base64.b64encode(open(image_path).read())
+            else:
+                image = base64.b64encode(open(image_path, "rb").read()).decode(
+                    "utf-8")
+            req = json.dumps({"feed": [{"image": image}], "fetch": ["score"]})
+            r = requests.post(
+                server, data=req, headers={"Content-Type": "application/json"})
    end = time.time()
    return [[end - start]]
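
The HTTP branch above base64-encodes raw image bytes before placing them in the JSON request. A minimal sketch of that step follows. `encode_image_bytes` is a hypothetical helper, and the sample bytes stand in for real file contents.

```python
import base64


def encode_image_bytes(raw):
    """Return base64 text for raw bytes so it can be embedded in JSON."""
    # b64encode returns bytes; decoding gives a JSON-serializable string.
    return base64.b64encode(raw).decode("utf-8")


if __name__ == "__main__":
    print(encode_image_bytes(b"abc"))
```

Reading the file in binary mode (`"rb"`) before encoding keeps the bytes intact on both Python 2 and Python 3.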


--- a/python/examples/imagenet/benchmark_batch.py.lprof
+++ b/python/examples/imagenet/benchmark_batch.py.lprof
--- a/python/examples/imagenet/daisy.jpg
+++ b/python/examples/imagenet/daisy.jpg
--- a/python/examples/imagenet/flower.jpg
+++ b/python/examples/imagenet/flower.jpg
--- a/python/examples/imagenet/image_reader.py
+++ b/python/examples/imagenet/image_reader.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import cv2
-import numpy as np
-
-
-class ImageReader():
-    def __init__(self):
-        self.image_mean = [0.485, 0.456, 0.406]
-        self.image_std = [0.229, 0.224, 0.225]
-        self.image_shape = [3, 224, 224]
-        self.resize_short_size = 256
-        self.interpolation = None
-
-    def resize_short(self, img, target_size, interpolation=None):
-        """resize image
-
-        Args:
-            img: image data
-            target_size: resize short target size
-            interpolation: interpolation mode
-
-        Returns:
-            resized image data
-        """
-        percent = float(target_size) / min(img.shape[0], img.shape[1])
-        resized_width = int(round(img.shape[1] * percent))
-        resized_height = int(round(img.shape[0] * percent))
-        if interpolation:
-            resized = cv2.resize(
-                img, (resized_width, resized_height),
-                interpolation=interpolation)
-        else:
-            resized = cv2.resize(img, (resized_width, resized_height))
-        return resized
-
-    def crop_image(self, img, target_size, center):
-        """crop image
-
-        Args:
-            img: images data
-            target_size: crop target size
-            center: crop mode
-
-        Returns:
-            img: cropped image data
-        """
-        height, width = img.shape[:2]
-        size = target_size
-        if center == True:
-            w_start = (width - size) // 2
-            h_start = (height - size) // 2
-        else:
-            w_start = np.random.randint(0, width - size + 1)
-            h_start = np.random.randint(0, height - size + 1)
-        w_end = w_start + size
-        h_end = h_start + size
-        img = img[h_start:h_end, w_start:w_end, :]
-        return img
-
-    def process_image(self, sample):
-        """ process_image """
-        mean = self.image_mean
-        std = self.image_std
-        crop_size = self.image_shape[1]
-
-        data = np.fromstring(sample, np.uint8)
-        img = cv2.imdecode(data, cv2.IMREAD_COLOR)
-
-        if img is None:
-            print("img is None, pass it.")
-            return None
-
-        if crop_size > 0:
-            target_size = self.resize_short_size
-            img = self.resize_short(
-                img, target_size, interpolation=self.interpolation)
-            img = self.crop_image(img, target_size=crop_size, center=True)
-
-        img = img[:, :, ::-1]
-
-        img = img.astype('float32').transpose((2, 0, 1)) / 255
-        img_mean = np.array(mean).reshape((3, 1, 1))
-        img_std = np.array(std).reshape((3, 1, 1))
-        img -= img_mean
-        img /= img_std
-        return img
--- a/python/examples/imagenet/imagenet.label
+++ b/python/examples/imagenet/imagenet.label
--- a/python/examples/imagenet/resnet50_rpc_client.py
+++ b/python/examples/imagenet/resnet50_rpc_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, URL2Image, Resize
+from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
+import time
+
+client = Client()
+client.load_client_config(sys.argv[1])
+client.connect(["127.0.0.1:9696"])
+
+label_dict = {}
+label_idx = 0
+with open("imagenet.label") as fin:
+    for line in fin:
+        label_dict[label_idx] = line.strip()
+        label_idx += 1
+
+seq = Sequential([
+    URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+start = time.time()
+image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"
+for i in range(10):
+    img = seq(image_file)
+    fetch_map = client.predict(feed={"image": img}, fetch=["score"])
+    prob = max(fetch_map["score"][0])
+    label = label_dict[fetch_map["score"][0].tolist().index(prob)].strip(
+    ).replace(",", "")
+    print("prediction: {}, probability: {}".format(label, prob))
+
+end = time.time()
+print(end - start)
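
The client above picks the highest score and maps its index to a label. A sketch of that top-1 lookup on plain lists follows. `top1_label` is a hypothetical helper, and the sample scores and labels are made up for illustration.

```python
def top1_label(scores, labels):
    """Return (label, score) for the highest-scoring class."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Index of the maximum score; the first index wins on ties.
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    # Same cleanup as the client: trim whitespace and drop commas.
    label = labels[best_idx].strip().replace(",", "")
    return label, scores[best_idx]


if __name__ == "__main__":
    label, prob = top1_label([0.1, 0.7, 0.2], ["cat", "daisy,", "dog"])
    print("prediction: {}, probability: {}".format(label, prob))
```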
--- a/python/examples/imagenet/resnet50_web_service.py
+++ b/python/examples/imagenet/resnet50_web_service.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import sys
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize
+
+if len(sys.argv) != 4:
+    print("python resnet50_web_service.py model device port")
+    sys.exit(-1)
+
+device = sys.argv[2]
+
+if device == "cpu":
+    from paddle_serving_server.web_service import WebService
+else:
+    from paddle_serving_server_gpu.web_service import WebService
+
+
+class ImageService(WebService):
+    def init_imagenet_setting(self):
+        self.seq = Sequential([
+            URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose(
+                (2, 0, 1)), Div(255), Normalize([0.485, 0.456, 0.406],
+                                                [0.229, 0.224, 0.225], True)
+        ])
+        self.label_dict = {}
+        label_idx = 0
+        with open("imagenet.label") as fin:
+            for line in fin:
+                self.label_dict[label_idx] = line.strip()
+                label_idx += 1
+
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        for ins in feed:
+            if "image" not in ins:
+                raise ValueError("feed data error!")
+            img = self.seq(ins["image"])
+            feed_batch.append({"image": img})
+        return feed_batch, fetch
+
+    def postprocess(self, feed=[], fetch=[], fetch_map={}):
+        score_list = fetch_map["score"]
+        result = {"label": [], "prob": []}
+        for score in score_list:
+            max_score = max(score)
+            result["label"].append(self.label_dict[score.index(max_score)]
+                                   .strip().replace(",", ""))
+            result["prob"].append(max_score)
+        return result
+
+
+image_service = ImageService(name="image")
+image_service.load_model_config(sys.argv[1])
+image_service.init_imagenet_setting()
+if device == "gpu":
+    image_service.set_gpus("0,1")
+image_service.prepare_server(
+    workdir="workdir", port=int(sys.argv[3]), device=device)
+image_service.run_rpc_service()
+image_service.run_web_service()
--- a/python/examples/imdb/README.md
+++ b/python/examples/imdb/README.md
@@ -30,27 +30,3 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
 ```
 curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
 ```
-
-### Benchmark
-
-CPU ：Intel(R) Xeon(R)  Gold 6271 CPU @ 2.60GHz * 48
-
-Model ：[CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
-
-server thread num ： 16
-
-In this test, client sends 25000 test samples totally, the bar chart given later is the latency of single thread, the unit is second, from which we know the predict efficiency is improved greatly by multi-thread compared to single-thread. 8.7 times improvement is made by 16 threads prediction.
-
-| client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | total |
-| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
-| 1                  | 1.09   | 28.79        | 0.094  | 20.59 | 0.047  | 0.034   | 31.41 |
-| 4                  | 0.22   | 7.41         | 0.023  | 5.01  | 0.011  | 0.0098  | 8.01  |
-| 8                  | 0.11   | 4.7          | 0.012  | 2.61  | 0.0062 | 0.0049  | 5.01  |
-| 12                 | 0.081  | 4.69         | 0.0078 | 1.72  | 0.0042 | 0.0035  | 4.91  |
-| 16                 | 0.058  | 3.46         | 0.0061 | 1.32  | 0.0033 | 0.003   | 3.63  |
-| 20                 | 0.049  | 3.77         | 0.0047 | 1.03  | 0.0025 | 0.0022  | 3.91  |
-| 24                 | 0.041  | 3.86         | 0.0039 | 0.85  | 0.002  | 0.0017  | 3.98  |
-
-The thread-latency bar chart is as follow：
-
-![total cost](../../../doc/imdb-benchmark-server-16.png)
--- a/python/examples/imdb/README_CN.md
+++ b/python/examples/imdb/README_CN.md
@@ -29,27 +29,3 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
 ```
 curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
 ```
-
-### Benchmark
-
-设备 ：Intel(R) Xeon(R)  Gold 6271 CPU @ 2.60GHz * 48
-
-模型 ：[CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
-
-server thread num ： 16
-
-测试中，client共发送25000条测试样本，图中数据为单个线程的耗时，时间单位为秒。可以看出，client端多线程的预测速度相比单线程有明显提升，在16线程时预测速度是单线程的8.7倍。
-
-| client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | total |
-| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
-| 1                  | 1.09   | 28.79        | 0.094  | 20.59 | 0.047  | 0.034   | 31.41 |
-| 4                  | 0.22   | 7.41         | 0.023  | 5.01  | 0.011  | 0.0098  | 8.01  |
-| 8                  | 0.11   | 4.7          | 0.012  | 2.61  | 0.0062 | 0.0049  | 5.01  |
-| 12                 | 0.081  | 4.69         | 0.0078 | 1.72  | 0.0042 | 0.0035  | 4.91  |
-| 16                 | 0.058  | 3.46         | 0.0061 | 1.32  | 0.0033 | 0.003   | 3.63  |
-| 20                 | 0.049  | 3.77         | 0.0047 | 1.03  | 0.0025 | 0.0022  | 3.91  |
-| 24                 | 0.041  | 3.86         | 0.0039 | 0.85  | 0.002  | 0.0017  | 3.98  |
-
-预测总耗时变化规律如下：
-
-![total cost](../../../doc/imdb-benchmark-server-16.png)
--- a/python/examples/imdb/benchmark.py
+++ b/python/examples/imdb/benchmark.py
@@ -16,7 +16,7 @@
 import sys
 import time
 import requests
-from imdb_reader import IMDBDataset
+from paddle_serving_app.reader import IMDBDataset
 from paddle_serving_client import Client
 from paddle_serving_client.utils import MultiThreadRunner
 from paddle_serving_client.utils import benchmark_args
@@ -37,26 +37,39 @@ def single_func(idx, resource):
        client.load_client_config(args.model)
        client.connect([args.endpoint])
        for i in range(1000):
-            if args.batch_size == 1:
-                word_ids, label = imdb_dataset.get_words_and_label(line)
-                fetch_map = client.predict(
-                    feed={"words": word_ids}, fetch=["prediction"])
+            if args.batch_size >= 1:
+                feed_batch = []
+                for bi in range(args.batch_size):
+                    word_ids, label = imdb_dataset.get_words_and_label(dataset[
+                        bi])
+                    feed_batch.append({"words": word_ids})
+                result = client.predict(feed=feed_batch, fetch=["prediction"])
+                if result is None:
+                    raise RuntimeError("predict failed.")
            else:
                print("unsupport batch size {}".format(args.batch_size))

    elif args.request == "http":
-        for fn in filelist:
-            fin = open(fn)
-            for line in fin:
-                word_ids, label = imdb_dataset.get_words_and_label(line)
-                r = requests.post(
-                    "http://{}/imdb/prediction".format(args.endpoint),
-                    data={"words": word_ids,
-                          "fetch": ["prediction"]})
+        if args.batch_size >= 1:
+            feed_batch = []
+            for bi in range(args.batch_size):
+                feed_batch.append({"words": dataset[bi]})
+            r = requests.post(
+                "http://{}/imdb/prediction".format(args.endpoint),
+                json={"feed": feed_batch,
+                      "fetch": ["prediction"]})
+            if r.status_code != 200:
+                print("HTTP status code is not 200")
+                raise RuntimeError("predict failed.")
+        else:
+            print("unsupported batch size {}".format(args.batch_size))
    end = time.time()
    return [[end - start]]


 multi_thread_runner = MultiThreadRunner()
 result = multi_thread_runner.run(single_func, args.thread, {})
-print(result)
+# Average the per-thread wall-clock cost returned by the runner.
+avg_cost = 0
+for cost in result[0]:
+    avg_cost += cost
+print("average cost per thread: {} s".format(avg_cost / args.thread))
--- a/python/examples/imdb/benchmark.sh
+++ b/python/examples/imdb/benchmark.sh
 rm profile_log
 for thread_num in 1 2 4 8 16
 do
-    $PYTHONROOT/bin/python benchmark.py --thread $thread_num --model imdbo_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
+for batch_size in 1 2 4 8 16 32 64 128 256 512
+do
+    $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model imdb_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
    echo "========================================"
    echo "batch size : $batch_size" >> profile_log
    $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
    tail -n 1 profile >> profile_log
 done
+done
--- a/python/examples/imdb/benchmark_batch.py
+++ b/python/examples/imdb/benchmark_batch.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import sys
-import time
-import requests
-from imdb_reader import IMDBDataset
-from paddle_serving_client import Client
-from paddle_serving_client.utils import MultiThreadRunner
-from paddle_serving_client.utils import benchmark_args
-
-args = benchmark_args()
-
-
-def single_func(idx, resource):
-    imdb_dataset = IMDBDataset()
-    imdb_dataset.load_resource("./imdb.vocab")
-    dataset = []
-    with open("./test_data/part-0") as fin:
-        for line in fin:
-            dataset.append(line.strip())
-    start = time.time()
-    if args.request == "rpc":
-        client = Client()
-        client.load_client_config(args.model)
-        client.connect([args.endpoint])
-        for i in range(1000):
-            if args.batch_size >= 1:
-                feed_batch = []
-                for bi in range(args.batch_size):
-                    word_ids, label = imdb_dataset.get_words_and_label(dataset[
-                        bi])
-                    feed_batch.append({"words": word_ids})
-                result = client.predict(feed=feed_batch, fetch=["prediction"])
-                if result is None:
-                    raise ("predict failed.")
-            else:
-                print("unsupport batch size {}".format(args.batch_size))
-
-    elif args.request == "http":
-        if args.batch_size >= 1:
-            feed_batch = []
-            for bi in range(args.batch_size):
-                feed_batch.append({"words": dataset[bi]})
-            r = requests.post(
-                "http://{}/imdb/prediction".format(args.endpoint),
-                json={"feed": feed_batch,
-                      "fetch": ["prediction"]})
-            if r.status_code != 200:
-                print('HTTP status code -ne 200')
-                raise ("predict failed.")
-        else:
-            print("unsupport batch size {}".format(args.batch_size))
-    end = time.time()
-    return [[end - start]]
-
-
-multi_thread_runner = MultiThreadRunner()
-result = multi_thread_runner.run(single_func, args.thread, {})
-avg_cost = 0
-for cost in result[0]:
-    avg_cost += cost
-print("total cost {} s of each thread".format(avg_cost / args.thread))
--- a/python/examples/imdb/benchmark_batch.sh
+++ b/python/examples/imdb/benchmark_batch.sh
-rm profile_log
-for thread_num in 1 2 4 8 16
-do
-for batch_size in 1 2 4 8 16 32 64 128 256 512
-do
-    $PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model imdb_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
-    echo "========================================"
-    echo "batch size : $batch_size" >> profile_log
-    $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
-    tail -n 1 profile >> profile_log
-done
-done
--- a/python/examples/imdb/test_client.py
+++ b/python/examples/imdb/test_client.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing
 from paddle_serving_client import Client
-from imdb_reader import IMDBDataset
+from paddle_serving_app.reader import IMDBDataset
 import sys

 client = Client()

--- a/python/examples/imdb/test_client_batch.py
+++ b/python/examples/imdb/test_client_batch.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-from paddle_serving_client import Client
-import sys
-import subprocess
-from multiprocessing import Pool
-import time
-
-
-def batch_predict(batch_size=4):
-    client = Client()
-    client.load_client_config(conf_file)
-    client.connect(["127.0.0.1:9292"])
-    fetch = ["acc", "cost", "prediction"]
-    feed_batch = []
-    for line in sys.stdin:
-        group = line.strip().split()
-        words = [int(x) for x in group[1:int(group[0])]]
-        label = [int(group[-1])]
-        feed = {"words": words, "label": label}
-        feed_batch.append(feed)
-        if len(feed_batch) == batch_size:
-            fetch_batch = client.batch_predict(
-                feed_batch=feed_batch, fetch=fetch)
-            for i in range(batch_size):
-                print("{} {}".format(fetch_batch[i]["prediction"][1],
-                                     feed_batch[i]["label"][0]))
-            feed_batch = []
-    if len(feed_batch) > 0:
-        fetch_batch = client.batch_predict(feed_batch=feed_batch, fetch=fetch)
-        for i in range(len(feed_batch)):
-            print("{} {}".format(fetch_batch[i]["prediction"][1], feed_batch[i][
-                "label"][0]))
-
-
-if __name__ == '__main__':
-    conf_file = sys.argv[1]
-    batch_size = int(sys.argv[2])
-    batch_predict(batch_size)
--- a/python/examples/imdb/text_classify_service.py
+++ b/python/examples/imdb/text_classify_service.py
@@ -14,7 +14,7 @@
 # pylint: disable=doc-string-missing

 from paddle_serving_server.web_service import WebService
-from imdb_reader import IMDBDataset
+from paddle_serving_app.reader import IMDBDataset
 import sys


@@ -37,5 +37,5 @@ imdb_service.load_model_config(sys.argv[1])
 imdb_service.prepare_server(
    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
 imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
-imdb_service.run_server()
-imdb_service.run_flask()
+imdb_service.run_rpc_service()
+imdb_service.run_web_service()
--- a/python/examples/lac/README.md
+++ b/python/examples/lac/README.md
@@ -2,28 +2,27 @@

 ([简体中文](./README_CN.md)|English)

-### Get model files and sample data
+### Get Model
 ```
-sh get_data.sh
+python -m paddle_serving_app.package --get_model lac
+tar -xzvf lac.tar.gz
 ```

-the package downloaded contains lac model config along with lac dictionary.
-
 #### Start RPC inference service

 ```
-python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292
+python -m paddle_serving_server.serve --model lac_model/ --port 9292
 ```
 ### RPC Infer
 ```
-echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/
+echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
 ```

-it will get the segmentation result
+This prints the segmentation result.

 ### Start HTTP inference service
 ```
-python lac_web_service.py jieba_server_model/ lac_workdir 9292
+python lac_web_service.py lac_model/ lac_workdir 9292
 ```
 ### HTTP Infer


--- a/python/examples/lac/README_CN.md
+++ b/python/examples/lac/README_CN.md
@@ -2,28 +2,27 @@

 (简体中文|[English](./README.md))

-### 获取模型和字典文件
+### 获取模型
 ```
-sh get_data.sh
+python -m paddle_serving_app.package --get_model lac
+tar -xzvf lac.tar.gz
 ```

-下载包里包含了lac模型和lac模型预测需要的字典文件
-
 #### 开启RPC预测服务

 ```
-python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292
+python -m paddle_serving_server.serve --model lac_model/ --port 9292
 ```
 ### 执行RPC预测
 ```
-echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/
+echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
 ```

 我们就能得到分词结果

 ### 开启HTTP预测服务
 ```
-python lac_web_service.py jieba_server_model/ lac_workdir 9292
+python lac_web_service.py lac_model/ lac_workdir 9292
 ```
 ### 执行HTTP预测


--- a/python/examples/lac/benchmark.py
+++ b/python/examples/lac/benchmark.py
@@ -16,7 +16,7 @@
 import sys
 import time
 import requests
-from lac_reader import LACReader
+from paddle_serving_app.reader import LACReader
 from paddle_serving_client import Client
 from paddle_serving_client.utils import MultiThreadRunner
 from paddle_serving_client.utils import benchmark_args
@@ -25,7 +25,7 @@ args = benchmark_args()


 def single_func(idx, resource):
-    reader = LACReader("lac_dict")
+    reader = LACReader()
    start = time.time()
    if args.request == "rpc":
        client = Client()

--- a/python/examples/lac/lac_client.py
+++ b/python/examples/lac/lac_client.py
@@ -15,7 +15,7 @@
 # pylint: disable=doc-string-missing

 from paddle_serving_client import Client
-from lac_reader import LACReader
+from paddle_serving_app.reader import LACReader
 import sys
 import os
 import io
@@ -24,7 +24,7 @@ client = Client()
 client.load_client_config(sys.argv[1])
 client.connect(["127.0.0.1:9292"])

-reader = LACReader(sys.argv[2])
+reader = LACReader()
 for line in sys.stdin:
    if len(line) <= 0:
        continue
@@ -32,4 +32,7 @@ for line in sys.stdin:
    if len(feed_data) <= 0:
        continue
    fetch_map = client.predict(feed={"words": feed_data}, fetch=["crf_decode"])
-    print(fetch_map)
+    begin = fetch_map['crf_decode.lod'][0]
+    end = fetch_map['crf_decode.lod'][1]
+    segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end])
+    print("word_seg: " + "|".join(str(words) for words in segs))
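Note on the `crf_decode.lod` handling in lac_client.py: the client uses two offsets to cut one sample out of the flat `crf_decode` output. A minimal sketch of that slicing, assuming the LoD is a list of cumulative offsets in which sample `i` spans `[lod[i], lod[i + 1])` (the `split_by_lod` helper and the tag values are hypothetical):

```python
def split_by_lod(flat_values, lod):
    """Split a flat sequence into per-sample slices using cumulative offsets."""
    return [flat_values[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]


# Hypothetical tag ids for two samples of length 4 and 3.
flat_tags = [3, 3, 7, 7, 5, 5, 5]
lod = [0, 4, 7]
print(split_by_lod(flat_tags, lod))
```

With a single input line the LoD has two entries, which matches the `begin`/`end` pair read in the client above.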
--- a/python/examples/lac/lac_reader.py
+++ b/python/examples/lac/lac_reader.py
@@ -14,8 +14,10 @@

 from paddle_serving_client import Client
 import sys
-reload(sys)
-sys.setdefaultencoding('utf-8')
+py_version = sys.version_info[0]
+if py_version == 2:
+    reload(sys)
+    sys.setdefaultencoding('utf-8')
 import os
 import io


--- a/python/examples/lac/lac_web_service.py
+++ b/python/examples/lac/lac_web_service.py
@@ -14,12 +14,12 @@

 from paddle_serving_server.web_service import WebService
 import sys
-from lac_reader import LACReader
+from paddle_serving_app.reader import LACReader


 class LACService(WebService):
    def load_reader(self):
-        self.reader = LACReader("lac_dict")
+        self.reader = LACReader()

    def preprocess(self, feed={}, fetch=[]):
        feed_batch = []
@@ -47,5 +47,5 @@ lac_service.load_model_config(sys.argv[1])
 lac_service.load_reader()
 lac_service.prepare_server(
    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
-lac_service.run_server()
-lac_service.run_flask()
+lac_service.run_rpc_service()
+lac_service.run_web_service()
--- a/python/examples/mobilenet/README.md
+++ b/python/examples/mobilenet/README.md
+# Image Classification
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
+tar -xzvf mobilenet_v2_imagenet.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### Client Prediction
+
+```
+python mobilenet_tutorial.py
+```
--- a/python/examples/mobilenet/README_CN.md
+++ b/python/examples/mobilenet/README_CN.md
+# 图像分类
+
+## 获取模型
+
+```
+python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
+tar -xzvf mobilenet_v2_imagenet.tar.gz
+```
+
+## RPC 服务
+
+### 启动服务端
+
+```
+python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### 客户端预测
+
+```
+python mobilenet_tutorial.py
+```
--- a/python/examples/mobilenet/daisy.jpg
+++ b/python/examples/mobilenet/daisy.jpg
--- a/python/examples/mobilenet/mobilenet_tutorial.py
+++ b/python/examples/mobilenet/mobilenet_tutorial.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize
+from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
+
+client = Client()
+client.load_client_config(
+    "mobilenet_v2_imagenet_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"image": img}, fetch=["feature_map"])
+print(fetch_map["feature_map"].reshape(-1))
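Note on the preprocessing chain in mobilenet_tutorial.py: the arithmetic behind `CenterCrop(224)`, `Div(255)` and `Normalize(mean, std)` can be sketched in plain Python. This assumes the conventional meaning of those steps (a centered square crop, scaling to [0, 1], then per-channel `(x - mean) / std`); it is not the `paddle_serving_app.reader` implementation, and both helper names are hypothetical:

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Scale one 3-channel pixel to [0, 1], then normalize each channel."""
    scaled = [channel / 255.0 for channel in pixel]
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]


print(center_crop_box(256, 256, 224))
print(normalize_pixel([255, 255, 255]))
```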
--- a/python/examples/ocr/README.md
+++ b/python/examples/ocr/README.md
+# OCR
+
+## Get Model
+```
+python -m paddle_serving_app.package --get_model ocr_rec
+tar -xzvf ocr_rec.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server.serve --model ocr_rec_model --port 9292
+```
+
+### Client Prediction
+
+```
+python test_ocr_rec_client.py
+```
--- a/python/examples/imagenet/image_rpc_client.py
+++ b/python/examples/imagenet/image_rpc_client.py
@@ -12,23 +12,20 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import sys
-from image_reader import ImageReader
 from paddle_serving_client import Client
-import time
+from paddle_serving_app.reader import OCRReader
+import cv2

 client = Client()
-client.load_client_config(sys.argv[1])
-client.connect(["127.0.0.1:9393"])
-reader = ImageReader()
+client.load_client_config("ocr_rec_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9292"])

-start = time.time()
-for i in range(1000):
-    with open("./data/n01440764_10026.JPEG", "rb") as f:
-        img = f.read()
-    img = reader.process_image(img)
-    fetch_map = client.predict(feed={"image": img}, fetch=["score"])
-end = time.time()
-print(end - start)
-
-#print(fetch_map["score"])
+image_file_list = ["./test_rec.jpg"]
+img = cv2.imread(image_file_list[0])
+ocr_reader = OCRReader()
+feed = {"image": ocr_reader.preprocess([img])}
+fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
+fetch_map = client.predict(feed=feed, fetch=fetch)
+rec_res = ocr_reader.postprocess(fetch_map)
+print(image_file_list[0])
+print(rec_res[0][0])
--- a/python/examples/ocr/test_rec.jpg
+++ b/python/examples/ocr/test_rec.jpg
--- a/python/examples/resnet_v2_50/README.md
+++ b/python/examples/resnet_v2_50/README.md
+# Image Classification
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### Client Prediction
+
+```
+python resnet50_v2_tutorial.py
+```
--- a/python/examples/resnet_v2_50/README_CN.md
+++ b/python/examples/resnet_v2_50/README_CN.md
+# 图像分类
+
+## 获取模型
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC 服务
+
+### 启动服务端
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### 客户端预测
+
+```
+python resnet50_v2_tutorial.py
+```
--- a/python/examples/resnet_v2_50/daisy.jpg
+++ b/python/examples/resnet_v2_50/daisy.jpg
--- a/python/examples/resnet_v2_50/resnet50_debug.py
+++ b/python/examples/resnet_v2_50/resnet50_debug.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+from paddle_serving_app.local_predict import Debugger
+import sys
+
+debugger = Debugger()
+debugger.load_model_config(sys.argv[1], gpu=True)
+
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = debugger.predict(feed={"image": img}, fetch=["feature_map"])
+print(fetch_map["feature_map"].reshape(-1))
--- a/python/examples/resnet_v2_50/resnet50_v2_tutorial.py
+++ b/python/examples/resnet_v2_50/resnet50_v2_tutorial.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+
+client = Client()
+client.load_client_config(
+    "resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9393"])
+
+seq = Sequential([
+    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
+    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
+])
+
+image_file = "daisy.jpg"
+img = seq(image_file)
+fetch_map = client.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
--- a/python/examples/senta/README.md
+++ b/python/examples/senta/README.md
-# Chinese sentence sentiment classification
+# Chinese Sentence Sentiment Classification
 ([简体中文](./README_CN.md)|English)
-## Get model files and sample data
+
+## Get Model
 ```
-sh get_data.sh
+python -m paddle_serving_app.package --get_model senta_bilstm
+python -m paddle_serving_app.package --get_model lac
+tar -xzvf senta_bilstm.tar.gz
+tar -xzvf lac.tar.gz
 ```
-## Start http service
+
+## Start HTTP Service
 ```
-python senta_web_service.py senta_bilstm_model/ workdir 9292
+python -m paddle_serving_server.serve --model lac_model --port 9300
+python senta_web_service.py
 ```
-In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through [LAC task] (../lac). Set model path by ```lac_model_path``` and dictionary path by ```lac_dict_path```. 
-In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task. The LAC prediction service is deployed on the CPU, and the sentiment classification task is deployed on the GPU, which can be changed according to the actual situation.
+The Chinese sentiment classification task first needs Chinese word segmentation, which is done by the [LAC task](../lac).
+In this demo, the LAC task runs in the preprocessing step of the sentiment classification HTTP prediction service.
+
 ## Client prediction
 ```
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
 ```
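Note on the request body used by the curl command above: it is a JSON object with a `feed` list (one dict per sample) and a `fetch` list of output names. A minimal sketch that builds the same body in Python, assuming only the field names visible in the curl example (the `build_request_body` helper is hypothetical):

```python
# -*- coding: utf-8 -*-
import json


def build_request_body(texts, fetch_names):
    """Build the JSON body: one feed entry per input sentence."""
    payload = {
        "feed": [{"words": text} for text in texts],
        "fetch": list(fetch_names),
    }
    # ensure_ascii=False keeps the Chinese text readable in the output.
    return json.dumps(payload, ensure_ascii=False)


body = build_request_body(["天气不错"], ["class_probs"])
print(body)
```

The same payload dict could be passed to `requests.post(url, json=payload)`, as benchmark.py does for the imdb service.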
--- a/python/examples/senta/README_CN.md
+++ b/python/examples/senta/README_CN.md
 # 中文语句情感分类
 (简体中文|[English](./README.md))
-## 获取模型文件和样例数据
+
+## 获取模型文件
 ```
-sh get_data.sh
+python -m paddle_serving_app.package --get_model senta_bilstm
+python -m paddle_serving_app.package --get_model lac
+tar -xzvf lac.tar.gz
+tar -xzvf senta_bilstm.tar.gz
 ```
+
 ## 启动HTTP服务
 ```
-python senta_web_service.py senta_bilstm_model/ workdir 9292
+python -m paddle_serving_server.serve --model lac_model --port 9300
+python senta_web_service.py
 ```
-中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词，在脚本中通过```lac_model_path```参数配置LAC任务的模型文件路径,```lac_dict_path```参数配置LAC任务词典路径。
-示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分，LAC预测服务部署在CPU上，情感分类任务部署在GPU上,可以根据实际情况进行更改。
+中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词。
+示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分。

 ## 客户端预测
 ```
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
 ```
--- a/python/examples/senta/get_data.sh
+++ b/python/examples/senta/get_data.sh
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/senta_bilstm.tar.gz --no-check-certificate
 tar -xzvf senta_bilstm.tar.gz
-wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac_model.tar.gz --no-check-certificate
-tar -xzvf lac_model.tar.gz
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac.tar.gz --no-check-certificate
+tar -xzvf lac.tar.gz
 wget https://paddle-serving.bj.bcebos.com/reader/lac/lac_dict.tar.gz  --no-check-certificate
 tar -xzvf lac_dict.tar.gz
 wget https://paddle-serving.bj.bcebos.com/reader/senta/vocab.txt --no-check-certificate
--- a/python/examples/senta/senta_web_service.py
+++ b/python/examples/senta/senta_web_service.py
+#encoding=utf-8
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
@@ -12,97 +13,49 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from paddle_serving_server_gpu.web_service import WebService
+from paddle_serving_server.web_service import WebService
 from paddle_serving_client import Client
-from paddle_serving_app import LACReader, SentaReader
-import numpy as np
+from paddle_serving_app.reader import LACReader, SentaReader
 import os
-import io
 import sys
-import subprocess
-from multiprocessing import Process, Queue

+#senta_web_service.py
+from paddle_serving_server.web_service import WebService
+from paddle_serving_client import Client
+from paddle_serving_app.reader import LACReader, SentaReader

-class SentaService(WebService):
-    def set_config(
-            self,
-            lac_model_path,
-            lac_dict_path,
-            senta_dict_path, ):
-        self.lac_model_path = lac_model_path
-        self.lac_client_config_path = lac_model_path + "/serving_server_conf.prototxt"
-        self.lac_dict_path = lac_dict_path
-        self.senta_dict_path = senta_dict_path
-        self.show = False
-
-    def show_detail(self, show=False):
-        self.show = show
-
-    def start_lac_service(self):
-        if not os.path.exists('./lac_serving'):
-            os.mkdir("./lac_serving")
-        os.chdir('./lac_serving')
-        self.lac_port = self.port + 100
-        r = os.popen(
-            "python -m paddle_serving_server.serve --model {} --port {} &".
-            format("../" + self.lac_model_path, self.lac_port))
-        os.chdir('..')
-
-    def init_lac_service(self):
-        ps = Process(target=self.start_lac_service())
-        ps.start()
-        #self.init_lac_client()

-    def lac_predict(self, feed_data):
-        self.init_lac_client()
-        lac_result = self.lac_client.predict(
-            feed={"words": feed_data}, fetch=["crf_decode"])
-        self.lac_client.release()
-        return lac_result
-
-    def init_lac_client(self):
+class SentaService(WebService):
+    # Initialize the client for the LAC model prediction service
+    def init_lac_client(self, lac_port, lac_client_config):
+        self.lac_reader = LACReader()
+        self.senta_reader = SentaReader()
        self.lac_client = Client()
-        self.lac_client.load_client_config(self.lac_client_config_path)
-        self.lac_client.connect(["127.0.0.1:{}".format(self.lac_port)])
-
-    def init_lac_reader(self):
-        self.lac_reader = LACReader(self.lac_dict_path)
-
-    def init_senta_reader(self):
-        self.senta_reader = SentaReader(vocab_path=self.senta_dict_path)
+        self.lac_client.load_client_config(lac_client_config)
+        self.lac_client.connect(["127.0.0.1:{}".format(lac_port)])

+    # Preprocessing for the senta prediction service; call order: lac reader -> lac model prediction -> result postprocessing -> senta reader
    def preprocess(self, feed=[], fetch=[]):
-        feed_data = self.lac_reader.process(feed[0]["words"])
-        if self.show:
-            print("---- lac reader ----")
-            print(feed_data)
-        lac_result = self.lac_predict(feed_data)
-        if self.show:
-            print("---- lac out ----")
-            print(lac_result)
-        segs = self.lac_reader.parse_result(feed[0]["words"],
-                                            lac_result["crf_decode"])
-        if self.show:
-            print("---- lac parse ----")
-            print(segs)
-        feed_data = self.senta_reader.process(segs)
-        if self.show:
-            print("---- senta reader ----")
-            print("feed_data", feed_data)
-        return [{"words": feed_data}], fetch
+        feed_data = [{
+            "words": self.lac_reader.process(x["words"])
+        } for x in feed]
+        lac_result = self.lac_client.predict(
+            feed=feed_data, fetch=["crf_decode"])
+        feed_batch = []
+        result_lod = lac_result["crf_decode.lod"]
+        for i in range(len(feed)):
+            segs = self.lac_reader.parse_result(
+                feed[i]["words"],
+                lac_result["crf_decode"][result_lod[i]:result_lod[i + 1]])
+            feed_data = self.senta_reader.process(segs)
+            feed_batch.append({"words": feed_data})
+        return feed_batch, fetch


 senta_service = SentaService(name="senta")
-#senta_service.show_detail(True)
-senta_service.set_config(
-    lac_model_path="./lac_model",
-    lac_dict_path="./lac_dict",
-    senta_dict_path="./vocab.txt")
-senta_service.load_model_config(sys.argv[1])
-senta_service.prepare_server(
-    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
-senta_service.init_lac_reader()
-senta_service.init_senta_reader()
-senta_service.init_lac_service()
-senta_service.run_server()
-senta_service.run_flask()
+senta_service.load_model_config("senta_bilstm_model")
+senta_service.prepare_server(workdir="workdir")
+senta_service.init_lac_client(
+    lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt")
+senta_service.run_rpc_service()
+senta_service.run_web_service()
--- a/python/examples/unet_for_image_seg/N0060.jpg
+++ b/python/examples/unet_for_image_seg/N0060.jpg
--- a/python/examples/unet_for_image_seg/README.md
+++ b/python/examples/unet_for_image_seg/README.md
--- a/python/examples/unet_for_image_seg/README_CN.md
+++ b/python/examples/unet_for_image_seg/README_CN.md
--- a/python/examples/imagenet/image_classification_service_gpu.py
+++ b/python/examples/imagenet/image_classification_service_gpu.py
--- a/python/paddle_serving_app/README.md
+++ b/python/paddle_serving_app/README.md
--- a/python/paddle_serving_app/README_CN.md
+++ b/python/paddle_serving_app/README_CN.md
--- a/python/paddle_serving_app/__init__.py
+++ b/python/paddle_serving_app/__init__.py
--- a/python/paddle_serving_app/local_predict.py
+++ b/python/paddle_serving_app/local_predict.py
--- a/python/paddle_serving_app/models/model_list.py
+++ b/python/paddle_serving_app/models/model_list.py
--- a/python/paddle_serving_app/package.py
+++ b/python/paddle_serving_app/package.py
--- a/python/paddle_serving_app/reader/__init__.py
+++ b/python/paddle_serving_app/reader/__init__.py
--- a/python/paddle_serving_app/reader/daisy.jpg
+++ b/python/paddle_serving_app/reader/daisy.jpg
--- a/python/paddle_serving_app/reader/functional.py
+++ b/python/paddle_serving_app/reader/functional.py
--- a/python/paddle_serving_app/reader/image_reader.py
+++ b/python/paddle_serving_app/reader/image_reader.py
--- a/python/paddle_serving_app/reader/imdb_reader.py
+++ b/python/paddle_serving_app/reader/imdb_reader.py
--- a/python/paddle_serving_app/reader/lac_reader.py
+++ b/python/paddle_serving_app/reader/lac_reader.py
--- a/python/paddle_serving_app/reader/ocr_reader.py
+++ b/python/paddle_serving_app/reader/ocr_reader.py
--- a/python/paddle_serving_app/reader/senta_reader.py
+++ b/python/paddle_serving_app/reader/senta_reader.py
--- a/python/paddle_serving_app/reader/test_image_reader.py
+++ b/python/paddle_serving_app/reader/test_image_reader.py
--- a/python/paddle_serving_app/trace.py
+++ b/python/paddle_serving_app/trace.py
--- a/python/paddle_serving_app/version.py
+++ b/python/paddle_serving_app/version.py
--- a/python/paddle_serving_client/__init__.py
+++ b/python/paddle_serving_client/__init__.py
--- a/python/paddle_serving_client/io/__init__.py
+++ b/python/paddle_serving_client/io/__init__.py
--- a/python/paddle_serving_client/utils/__init__.py
+++ b/python/paddle_serving_client/utils/__init__.py
--- a/python/paddle_serving_client/version.py
+++ b/python/paddle_serving_client/version.py
--- a/python/paddle_serving_server/__init__.py
+++ b/python/paddle_serving_server/__init__.py
--- a/python/paddle_serving_server/monitor.py
+++ b/python/paddle_serving_server/monitor.py
--- a/python/paddle_serving_server/serve.py
+++ b/python/paddle_serving_server/serve.py
--- a/python/paddle_serving_server/version.py
+++ b/python/paddle_serving_server/version.py
--- a/python/paddle_serving_server/web_service.py
+++ b/python/paddle_serving_server/web_service.py
--- a/python/paddle_serving_server_gpu/__init__.py
+++ b/python/paddle_serving_server_gpu/__init__.py
--- a/python/paddle_serving_server_gpu/monitor.py
+++ b/python/paddle_serving_server_gpu/monitor.py
--- a/python/paddle_serving_server_gpu/serve.py
+++ b/python/paddle_serving_server_gpu/serve.py
--- a/python/paddle_serving_server_gpu/version.py
+++ b/python/paddle_serving_server_gpu/version.py
--- a/python/paddle_serving_server_gpu/web_service.py
+++ b/python/paddle_serving_server_gpu/web_service.py
--- a/python/setup.py.app.in
+++ b/python/setup.py.app.in
--- a/python/setup.py.client.in
+++ b/python/setup.py.client.in
--- a/python/setup.py.server.in
+++ b/python/setup.py.server.in
--- a/python/setup.py.server_gpu.in
+++ b/python/setup.py.server_gpu.in
--- a/tools/Dockerfile
+++ b/tools/Dockerfile
--- a/tools/Dockerfile.centos6.devel
+++ b/tools/Dockerfile.centos6.devel
--- a/tools/Dockerfile.centos6.gpu.devel
+++ b/tools/Dockerfile.centos6.gpu.devel
--- a/tools/Dockerfile.devel
+++ b/tools/Dockerfile.devel
--- a/tools/Dockerfile.gpu
+++ b/tools/Dockerfile.gpu
--- a/tools/Dockerfile.gpu.devel
+++ b/tools/Dockerfile.gpu.devel
--- a/tools/python_tag.py
+++ b/tools/python_tag.py
--- a/tools/serving_build.sh
+++ b/tools/serving_build.sh