diff --git a/README.md b/README.md index 2fdc83db96d1b418aabddabe35f9709e2c72f810..04634afebfc699708b681a99257eabc0898f7356 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,12 @@ +([简体中文](./README_CN.md)|English) +



+


@@ -23,14 +26,6 @@ We consider deploying deep learning inference service online to be a user-facing

-

Some Key Features

- -- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**. -- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc. -- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs. -- **Highly concurrent and efficient communication** between clients and servers supported. -- **Multiple programming languages** supported on client side, such as Golang, C++ and python. -- **Extensible framework design** which can support model serving beyond Paddle.

Installation

@@ -58,10 +53,42 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command. -Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client. +Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use the HTTP service without installing the client. + + +

Pre-built services with Paddle Serving

+ +

Chinese Word Segmentation

+ +``` shell +> python -m paddle_serving_app.package --get_model lac +> tar -xzf lac.tar.gz +> python lac_web_service.py lac_model/ lac_workdir 9393 & +> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction +{"result":[{"word_seg":"我|爱|北京|天安门"}]} +``` + +
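The same request can also be sent from Python instead of `curl`. Below is a minimal sketch using the `requests` library (not part of the packaged demo; it assumes the LAC web service started above is still listening on port 9393):

``` python
# Hypothetical client-side sketch: POST the same JSON payload with `requests`.
import json
import requests

data = {"feed": [{"words": "我爱北京天安门"}], "fetch": ["word_seg"]}
resp = requests.post(
    "http://127.0.0.1:9393/lac/prediction",
    data=json.dumps(data),
    headers={"Content-Type": "application/json"})
print(resp.json())  # e.g. {"result": [{"word_seg": "我|爱|北京|天安门"}]}
```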

Image Classification

+ +

+
+ +
+

+ +``` shell +> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet +> tar -xzf resnet_v2_50_imagenet.tar.gz +> python resnet50_imagenet_classify.py resnet50_serving_model & +> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction +{"result":{"label":["daisy"],"prob":[0.9341403245925903]}} +``` +
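As with the word segmentation demo, the classification request can be issued from Python as well; the sketch below assumes the ResNet50 service started above is listening on port 9292:

``` python
# Hypothetical client-side sketch for the image classification service.
import json
import requests

data = {
    "feed": [{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}],
    "fetch": ["score"]
}
resp = requests.post(
    "http://127.0.0.1:9292/image/prediction",
    data=json.dumps(data),
    headers={"Content-Type": "application/json"})
print(resp.json())  # e.g. {"result": {"label": ["daisy"], "prob": [0.934...]}}
```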

Quick Start Example

+This quick start example is intended for users who already have a model to deploy, and we provide a ready-to-deploy model here. If you want to know how to use Paddle Serving all the way from offline training to online serving, please refer to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md) + ### Boston House Price Prediction model ``` shell wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz @@ -84,9 +111,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po | `port` | int | `9292` | Exposed port of current service to users| | `name` | str | `""` | Service name, can be used to generate HTTP request url | | `model` | str | `""` | Path of paddle model directory to be served | -| `mem_optim` | bool | `False` | Enable memory / graphic memory optimization | -| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph | -| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL | +| `mem_optim` | - | - | Enable memory / graphic memory optimization | +| `ir_optim` | - | - | Enable analysis and optimization of calculation graph | +| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL | Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/). @@ -117,138 +144,13 @@ print(fetch_map) ``` Here, `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias name and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the name of `"x"` and `"price"` are assigned when the servable model is saved during training. -
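For reference, a minimal RPC client for this model looks like the sketch below. It assumes the server started above is listening on `127.0.0.1:9292` and uses the client configuration extracted from `uci_housing.tar.gz`; the feature vector is the sample row used in the fit_a_line example:

``` python
# Minimal RPC client sketch for the Boston house price model.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One sample of the 13 normalized UCI housing features.
x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)
```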

Pre-built services with Paddle Serving

- -

Chinese Word Segmentation

- -- **Description**: -``` shell -Chinese word segmentation HTTP service that can be deployed with one line command. -``` - -- **Download Servable Package**: -``` shell -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz -``` -- **Host web service**: -``` shell -tar -xzf lac_model_jieba_web.tar.gz -python lac_web_service.py jieba_server_model/ lac_workdir 9292 -``` -- **Request sample**: -``` shell -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction -``` -- **Request result**: -``` shell -{"word_seg":"我|爱|北京|天安门"} -``` - -

Image Classification

- -- **Description**: -``` shell -Image classification trained with Imagenet dataset. A label and corresponding probability will be returned. -Note: This demo needs paddle-serving-server-gpu. -``` - -- **Download Servable Package**: -``` shell -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz -``` -- **Host web service**: -``` shell -tar -xzf imagenet_demo.tar.gz -python image_classification_service_demo.py resnet50_serving_model -``` -- **Request sample**: - -

-
- -
-

- -``` shell -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction -``` -- **Request result**: -``` shell -{"label":"daisy","prob":0.9341403245925903} -``` - -

More Demos

- -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | Bert-Base-Baike | -| URL | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert | -| Description | Get semantic representation from a Chinese Sentence | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | Resnet50-Imagenet | -| URL | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | -| Description | Get image semantic representation from an image | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | Resnet101-Imagenet | -| URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | -| Description | Get image semantic representation from an image | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | CNN-IMDB | -| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| Description | Get category probability from an English Sentence | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | LSTM-IMDB | -| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| Description | Get category probability from an English Sentence | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | BOW-IMDB | -| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| Description | Get category probability from an English Sentence | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | Jieba-LAC | -| URL | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac | -| Description | Get word segmentation from a Chinese Sentence | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| Model Name | DNN-CTR | -| URL | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz | -| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr | -| Description | Get click probability from a feature vector of item | +

Some Key Features of Paddle Serving

+- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with one line command**. +- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc. +- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs. +- **Highly concurrent and efficient communication** between clients and servers supported. +- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.

Document

@@ -268,13 +170,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa ### About Efficiency - [How to profile Paddle Serving latency?](python/examples/util) -- [How to optimize performance?(Chinese)](doc/PERFORMANCE_OPTIM_CN.md) +- [How to optimize performance?](doc/PERFORMANCE_OPTIM.md) - [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md) - [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md) - [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md) ### FAQ -- [FAQ(Chinese)](doc/deprecated/FAQ.md) +- [FAQ(Chinese)](doc/FAQ.md) ### Design diff --git a/README_CN.md b/README_CN.md index 547a50e7a430f3afb3a69eae211ed87cb248a268..7a42e6cd9c02fa6c51cba7a3228cd0916dd64de2 100644 --- a/README_CN.md +++ b/README_CN.md @@ -1,9 +1,12 @@ +(简体中文|[English](./README.md)) +



+


@@ -24,14 +27,7 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务

-

核心功能

-- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**. -- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等. -- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入. -- 支持客户端和服务端之间 **高并发和高效通信**. -- 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python. -- **可伸缩框架设计** 可支持不限于Paddle的模型服务.

安装

@@ -59,9 +55,40 @@ pip install paddle-serving-server-gpu # GPU 如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。 -客户端安装包支持Centos 7和Ubuntu 18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。 +Paddle Serving安装包支持Centos 6/7和Ubuntu 16/18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。 + +

Paddle Serving预装的服务

+ +

中文分词

+ +``` shell +> python -m paddle_serving_app.package --get_model lac +> tar -xzf lac.tar.gz +> python lac_web_service.py lac_model/ lac_workdir 9393 & +> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction +{"result":[{"word_seg":"我|爱|北京|天安门"}]} +``` + +

图像分类

+ +

+
+ +
+

+ +``` shell +> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet +> tar -xzf resnet_v2_50_imagenet.tar.gz +> python resnet50_imagenet_classify.py resnet50_serving_model & +> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction +{"result":{"label":["daisy"],"prob":[0.9341403245925903]}} +``` + -

快速启动示例

+

快速开始示例

+ +这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)

波士顿房价预测

@@ -88,9 +115,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po | `port` | int | `9292` | Exposed port of current service to users| | `name` | str | `""` | Service name, can be used to generate HTTP request url | | `model` | str | `""` | Path of paddle model directory to be served | -| `mem_optim` | bool | `False` | Enable memory optimization | -| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph | -| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL | +| `mem_optim` | - | - | Enable memory optimization | +| `ir_optim` | - | - | Enable analysis and optimization of calculation graph | +| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL | 我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求,请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。 @@ -122,139 +149,13 @@ print(fetch_map) ``` 在这里,`client.predict`函数具有两个参数。 `feed`是带有模型输入变量别名和值的`python dict`。 `fetch`被要从服务器返回的预测变量赋值。 在该示例中,在训练过程中保存可服务模型时,被赋值的tensor名为`"x"`和`"price"`。 -

Paddle Serving预装的服务

- -

中文分词模型

- -- **介绍**: -``` shell -本示例为中文分词HTTP服务一键部署 -``` - -- **下载服务包**: -``` shell -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz -``` -- **启动web服务**: -``` shell -tar -xzf lac_model_jieba_web.tar.gz -python lac_web_service.py jieba_server_model/ lac_workdir 9292 -``` -- **客户端请求示例**: -``` shell -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction -``` -- **返回结果示例**: -``` shell -{"word_seg":"我|爱|北京|天安门"} -``` - -

图像分类模型

- -- **介绍**: -``` shell -图像分类模型由Imagenet数据集训练而成,该服务会返回一个标签及其概率 -注意:本示例需要安装paddle-serving-server-gpu -``` - -- **下载服务包**: -``` shell -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz -``` -- **启动web服务**: -``` shell -tar -xzf imagenet_demo.tar.gz -python image_classification_service_demo.py resnet50_serving_model -``` -- **客户端请求示例**: - -

-
- -
-

- -``` shell -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction -``` -- **返回结果示例**: -``` shell -{"label":"daisy","prob":0.9341403245925903} -``` - -

更多示例

- -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | Bert-Base-Baike | -| 下载链接 | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert | -| 介绍 | 获得一个中文语句的语义表示 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | Resnet50-Imagenet | -| 下载链接 | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | -| 介绍 | 获得一张图片的图像语义表示 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | Resnet101-Imagenet | -| 下载链接 | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | -| 介绍 | 获得一张图片的图像语义表示 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | CNN-IMDB | -| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| 介绍 | 从一个中文语句获得类别及其概率 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | LSTM-IMDB | -| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| 介绍 | 从一个英文语句获得类别及其概率 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | BOW-IMDB | -| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | -| 介绍 | 从一个英文语句获得类别及其概率 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | Jieba-LAC | -| 下载链接 | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac | -| 介绍 | 获取中文语句的分词 | - - - -| Key | Value | -| :----------------- | :----------------------------------------------------------- | -| 模型名 | DNN-CTR | -| 下载链接 | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz | -| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr | -| 介绍 | 从项目的特征向量中获得点击概率 | - +

Paddle Serving的核心功能

+- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**. +- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等. +- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入. +- 支持客户端和服务端之间 **高并发和高效通信**. +- 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python.

文档

@@ -280,7 +181,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa - [GPU版Benchmarks](doc/GPU_BENCHMARKING.md) ### FAQ -- [常见问答](doc/deprecated/FAQ.md) +- [常见问答](doc/FAQ.md) ### 设计文档 - [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md) diff --git a/cmake/external/protobuf.cmake b/cmake/external/protobuf.cmake index fd4b7c5898b1128c6a73f00e678e96f117f0d91e..c72a5cac52ccf1c03a0c132083e3ac43c83fb868 100644 --- a/cmake/external/protobuf.cmake +++ b/cmake/external/protobuf.cmake @@ -86,6 +86,63 @@ function(protobuf_generate_python SRCS) set(${SRCS} ${${SRCS}} PARENT_SCOPE) endfunction() +function(grpc_protobuf_generate_python SRCS) + # shameless copy from https://github.com/Kitware/CMake/blob/master/Modules/FindProtobuf.cmake + if(NOT ARGN) + message(SEND_ERROR "Error: GRPC_PROTOBUF_GENERATE_PYTHON() called without any proto files") + return() + endif() + + if(PROTOBUF_GENERATE_CPP_APPEND_PATH) + # Create an include path for each file specified + foreach(FIL ${ARGN}) + get_filename_component(ABS_FIL ${FIL} ABSOLUTE) + get_filename_component(ABS_PATH ${ABS_FIL} PATH) + list(FIND _protobuf_include_path ${ABS_PATH} _contains_already) + if(${_contains_already} EQUAL -1) + list(APPEND _protobuf_include_path -I ${ABS_PATH}) + endif() + endforeach() + else() + set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR}) + endif() + if(DEFINED PROTOBUF_IMPORT_DIRS AND NOT DEFINED Protobuf_IMPORT_DIRS) + set(Protobuf_IMPORT_DIRS "${PROTOBUF_IMPORT_DIRS}") + endif() + + if(DEFINED Protobuf_IMPORT_DIRS) + foreach(DIR ${Protobuf_IMPORT_DIRS}) + get_filename_component(ABS_PATH ${DIR} ABSOLUTE) + list(FIND _protobuf_include_path ${ABS_PATH} _contains_already) + if(${_contains_already} EQUAL -1) + list(APPEND _protobuf_include_path -I ${ABS_PATH}) + endif() + endforeach() + endif() + + set(${SRCS}) + foreach(FIL ${ARGN}) + get_filename_component(ABS_FIL ${FIL} ABSOLUTE) + get_filename_component(FIL_WE ${FIL} NAME_WE) + if(NOT PROTOBUF_GENERATE_CPP_APPEND_PATH) + get_filename_component(FIL_DIR ${FIL} DIRECTORY) + if(FIL_DIR) + set(FIL_WE "${FIL_DIR}/${FIL_WE}") + endif() + endif() + + list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py") + add_custom_command( + OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py" + COMMAND ${PYTHON_EXECUTABLE} -m grpc_tools.protoc --python_out ${CMAKE_CURRENT_BINARY_DIR} --grpc_python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL} + DEPENDS ${ABS_FIL} + COMMENT "Running Python grpc protocol buffer compiler on ${FIL}" + VERBATIM ) + endforeach() + + set(${SRCS} ${${SRCS}} PARENT_SCOPE) +endfunction() + # Print and set the protobuf library information, # finish this cmake process and exit from this file. 
macro(PROMPT_PROTOBUF_LIB) diff --git a/cmake/generic.cmake b/cmake/generic.cmake index 861889266b0132b8812d2d958dd6675dc631fd33..dd2fe4dc94e7213d6ad15d37f74ab1c6d41d660a 100644 --- a/cmake/generic.cmake +++ b/cmake/generic.cmake @@ -704,6 +704,15 @@ function(py_proto_compile TARGET_NAME) add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs}) endfunction() +function(py_grpc_proto_compile TARGET_NAME) + set(oneValueArgs "") + set(multiValueArgs SRCS) + cmake_parse_arguments(py_grpc_proto_compile "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN}) + set(py_srcs) + grpc_protobuf_generate_python(py_srcs ${py_grpc_proto_compile_SRCS}) + add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs}) +endfunction() + function(py_test TARGET_NAME) if(WITH_TESTING) set(options "") diff --git a/core/configure/CMakeLists.txt b/core/configure/CMakeLists.txt index d3e5b75da96ad7a0789866a4a2c474fad988c21b..3c4bb29e9c09d12949f5b9c86f7093772b4ab8ab 100644 --- a/core/configure/CMakeLists.txt +++ b/core/configure/CMakeLists.txt @@ -35,6 +35,13 @@ py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.p add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) add_dependencies(general_model_config_py_proto general_model_config_py_proto_init) +py_grpc_proto_compile(multi_lang_general_model_service_py_proto SRCS proto/multi_lang_general_model_service.proto) +add_custom_target(multi_lang_general_model_service_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) +add_dependencies(multi_lang_general_model_service_py_proto multi_lang_general_model_service_py_proto_init) + +py_grpc_proto_compile(general_python_service_py_proto SRCS proto/general_python_service.proto) +add_custom_target(general_python_service_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) +add_dependencies(general_python_service_py_proto general_python_service_py_proto_init) if (CLIENT) py_proto_compile(sdk_configure_py_proto SRCS proto/sdk_configure.proto) add_custom_target(sdk_configure_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) @@ -51,6 +58,17 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD COMMENT "Copy generated general_model_config proto file into directory paddle_serving_client/proto." WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) +add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto + COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_client/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET general_python_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto + COMMENT "Copy generated general_python_service proto file into directory paddle_serving_client/proto." 
+ WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) endif() if (APP) @@ -65,6 +83,11 @@ if (SERVER) py_proto_compile(server_config_py_proto SRCS proto/server_configure.proto) add_custom_target(server_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) add_dependencies(server_config_py_proto server_config_py_proto_init) + +py_proto_compile(pyserving_channel_py_proto SRCS proto/pyserving_channel.proto) +add_custom_target(pyserving_channel_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) +add_dependencies(pyserving_channel_py_proto pyserving_channel_py_proto_init) + if (NOT WITH_GPU) add_custom_command(TARGET server_config_py_proto POST_BUILD COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto @@ -77,6 +100,24 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server/proto." WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET general_python_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMENT "Copy generated general_python_service proto file into directory paddle_serving_server/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET pyserving_channel_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMENT "Copy generated pyserving_channel proto file into directory paddle_serving_server/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto + COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) else() add_custom_command(TARGET server_config_py_proto POST_BUILD COMMAND ${CMAKE_COMMAND} -E make_directory @@ -95,5 +136,23 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server_gpu/proto." WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET general_python_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMENT "Copy generated general_python_service proto file into directory paddle_serving_server_gpu/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET pyserving_channel_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMENT "Copy generated pyserving_channel proto file into directory paddle_serving_server_gpu/proto." 
+ WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) + +add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto + COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server_gpu/proto." + WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) endif() endif() diff --git a/python/paddle_serving_client/general_python_service.proto b/core/configure/proto/general_python_service.proto similarity index 84% rename from python/paddle_serving_client/general_python_service.proto rename to core/configure/proto/general_python_service.proto index 7f3af66df8d011b9a0a4fbcd9fb14a704f0c4bb2..4ced29e6c2c416c895ca2431f81938a02cc31106 100644 --- a/python/paddle_serving_client/general_python_service.proto +++ b/core/configure/proto/general_python_service.proto @@ -13,6 +13,7 @@ // limitations under the License. syntax = "proto2"; +package baidu.paddle_serving.pyserving; service GeneralPythonService { rpc inference(Request) returns (Response) {} @@ -21,11 +22,15 @@ service GeneralPythonService { message Request { repeated bytes feed_insts = 1; repeated string feed_var_names = 2; + repeated bytes shape = 3; + repeated string type = 4; } message Response { repeated bytes fetch_insts = 1; repeated string fetch_var_names = 2; - required int32 is_error = 3; + required int32 ecode = 3; optional string error_info = 4; + repeated bytes shape = 5; + repeated string type = 6; } diff --git a/core/configure/proto/multi_lang_general_model_service.proto b/core/configure/proto/multi_lang_general_model_service.proto new file mode 100644 index 0000000000000000000000000000000000000000..6e1764b23b3e6f7d9eb9a33925bcd83cfb1810bb --- /dev/null +++ b/core/configure/proto/multi_lang_general_model_service.proto @@ -0,0 +1,50 @@ +// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +syntax = "proto2"; + +message Tensor { + optional bytes data = 1; + repeated int32 int_data = 2; + repeated int64 int64_data = 3; + repeated float float_data = 4; + optional int32 elem_type = 5; + repeated int32 shape = 6; + repeated int32 lod = 7; // only for fetch tensor currently +}; + +message FeedInst { repeated Tensor tensor_array = 1; }; + +message FetchInst { repeated Tensor tensor_array = 1; }; + +message Request { + repeated FeedInst insts = 1; + repeated string feed_var_names = 2; + repeated string fetch_var_names = 3; + required bool is_python = 4 [ default = false ]; +}; + +message Response { + repeated ModelOutput outputs = 1; + optional string tag = 2; +}; + +message ModelOutput { + repeated FetchInst insts = 1; + optional string engine_name = 2; +} + +service MultiLangGeneralModelService { + rpc inference(Request) returns (Response) {} +}; diff --git a/python/paddle_serving_server/python_service_channel.proto b/core/configure/proto/pyserving_channel.proto similarity index 83% rename from python/paddle_serving_server/python_service_channel.proto rename to core/configure/proto/pyserving_channel.proto index 76a0d99c5cfb9f34e7478e66a89c416f135b73c1..060f4d72cd710ea9b0d9e1a6ee31cdc3d6845867 100644 --- a/python/paddle_serving_server/python_service_channel.proto +++ b/core/configure/proto/pyserving_channel.proto @@ -13,17 +13,19 @@ // limitations under the License. syntax = "proto2"; +package baidu.paddle_serving.pyserving; message ChannelData { repeated Inst insts = 1; required int32 id = 2; - optional string type = 3 - [ default = "CD" ]; // CD(channel data), CF(channel futures) - required int32 is_error = 4; + required int32 type = 3 [ default = 0 ]; + required int32 ecode = 4; optional string error_info = 5; } message Inst { required bytes data = 1; required string name = 2; + required bytes shape = 3; + required string type = 4; } diff --git a/doc/ABTEST_IN_PADDLE_SERVING.md b/doc/ABTEST_IN_PADDLE_SERVING.md index f2302e611bc68607ed68f45f81cd833a91938ae6..3ae23504bff2621c9a814a3ac15e5157626f8999 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING.md +++ b/doc/ABTEST_IN_PADDLE_SERVING.md @@ -21,7 +21,7 @@ The following Python code will process the data `test_data/part-0` and write to [//file]:#process.py ``` python -from imdb_reader import IMDBDataset +from paddle_serving_app.reader import IMDBDataset imdb_dataset = IMDBDataset() imdb_dataset.load_resource('imdb.vocab') @@ -78,7 +78,7 @@ with open('processed.data') as f: feed = {"words": word_ids} fetch = ["acc", "cost", "prediction"] [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True) - if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0: + if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0: cnt[tag]['acc'] += 1 cnt[tag]['total'] += 1 @@ -88,7 +88,7 @@ with open('processed.data') as f: In the code, the function `client.add_variant(tag, clusters, variant_weight)` is to add a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label of `bow` and flow weight of `10`, and an LSTM variant with label of `lstm` and a flow weight of `90` are added. The flow on the client side will be distributed to two variants according to the ratio of `10:90`. -When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contains the variant tag corresponding to the distribution flow. 
+When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow. ### Expected Results diff --git a/doc/ABTEST_IN_PADDLE_SERVING_CN.md b/doc/ABTEST_IN_PADDLE_SERVING_CN.md index 7ba4e5d7dbe643d87fc15e783afea2955b98fa1e..43bb702bd8b0317d7449313c0e1362953ed87744 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md +++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md @@ -20,7 +20,7 @@ sh get_data.sh 下面Python代码将处理`test_data/part-0`的数据,写入`processed.data`文件中。 ```python -from imdb_reader import IMDBDataset +from paddle_serving_app.reader import IMDBDataset imdb_dataset = IMDBDataset() imdb_dataset.load_resource('imdb.vocab') @@ -76,7 +76,7 @@ with open('processed.data') as f: feed = {"words": word_ids} fetch = ["acc", "cost", "prediction"] [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True) - if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0: + if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0: cnt[tag]['acc'] += 1 cnt[tag]['total'] += 1 diff --git a/doc/BERT_10_MINS.md b/doc/BERT_10_MINS.md index 71f6f065f4101aae01e077910fc5b6bd6b039b46..53e51768d3eaee6a1faac8d9ae2c62e7f1aa63ee 100644 --- a/doc/BERT_10_MINS.md +++ b/doc/BERT_10_MINS.md @@ -59,7 +59,7 @@ the script of client side bert_client.py is as follow: import os import sys from paddle_serving_client import Client -from paddle_serving_app import ChineseBertReader +from paddle_serving_app.reader import ChineseBertReader reader = ChineseBertReader() fetch = ["pooled_output"] diff --git a/doc/BERT_10_MINS_CN.md b/doc/BERT_10_MINS_CN.md index b7a5180da1bae2dafc431251f2b98c8a2041856a..e4904d86b6a056ba74b6ed85b47745575b749279 100644 --- a/doc/BERT_10_MINS_CN.md +++ b/doc/BERT_10_MINS_CN.md @@ -52,7 +52,7 @@ pip install paddle_serving_app ``` python import sys from paddle_serving_client import Client -from paddle_serving_app import ChineseBertReader +from paddle_serving_app.reader import ChineseBertReader reader = ChineseBertReader() fetch = ["pooled_output"] diff --git a/doc/COMPILE.md b/doc/COMPILE.md index 411620af2ee10a769384c36cebc3aa3ecb93ea49..f4a6639bdb38fac97662084f7d927d24b6179717 100644 --- a/doc/COMPILE.md +++ b/doc/COMPILE.md @@ -20,7 +20,7 @@ This document will take Python2 as an example to show how to compile Paddle Serv - Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/` - Set `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so` -- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3` +- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6` ## Get Code @@ -36,6 +36,8 @@ cd Serving && git submodule update --init --recursive export PYTHONROOT=/usr/ ``` +In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`. 
+ ## Compile Server ### Integrated CPU version paddle inference library diff --git a/doc/COMPILE_CN.md b/doc/COMPILE_CN.md index 44802260719d37a3140ca15f6a2ccc15479e32d6..d8fd277131d7d169c1a47689e15556e5d10a0fdb 100644 --- a/doc/COMPILE_CN.md +++ b/doc/COMPILE_CN.md @@ -20,7 +20,7 @@ - 将`DPYTHON_INCLUDE_DIR`设置为`$PYTHONROOT/include/python3.6m/` - 将`DPYTHON_LIBRARIES`设置为`$PYTHONROOT/lib64/libpython3.6.so` -- 将`DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3` +- 将`DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3.6` ## 获取代码 @@ -36,6 +36,8 @@ cd Serving && git submodule update --init --recursive export PYTHONROOT=/usr/ ``` +我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`。 + ## 编译Server部分 ### 集成CPU版本Paddle Inference Library diff --git a/doc/FAQ.md b/doc/FAQ.md new file mode 100644 index 0000000000000000000000000000000000000000..3bdd2dfd4739b54bf39b6b3f561c43bab3edabde --- /dev/null +++ b/doc/FAQ.md @@ -0,0 +1,15 @@ +# FAQ + +- Q:如何调整RPC服务的等待时间,避免超时? + + A:使用set_rpc_timeout_ms设置更长的等待时间,单位为毫秒,默认时间为20秒。 + + 示例: + ``` + from paddle_serving_client import Client + + client = Client() + client.load_client_config(sys.argv[1]) + client.set_rpc_timeout_ms(100000) + client.connect(["127.0.0.1:9393"]) + ``` diff --git a/doc/HOT_LOADING_IN_SERVING.md b/doc/HOT_LOADING_IN_SERVING.md index 299b49d4c9b58af413e5507b5523e93a02acc7d1..94575ca51368e4b9d03cdc65ce391a0ae43f0175 100644 --- a/doc/HOT_LOADING_IN_SERVING.md +++ b/doc/HOT_LOADING_IN_SERVING.md @@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold ### Product model -Run the following Python code products model in `product_path` folder. Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`. +Run the following Python code products model in `product_path` folder(You need to modify Hadoop related parameters before running). Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`. 
```python import os @@ -82,9 +82,14 @@ exe = fluid.Executor(place) exe.run(fluid.default_startup_program()) def push_to_hdfs(local_file_path, remote_path): - hadoop_bin = '/hadoop-3.1.2/bin/hadoop' - os.system('{} fs -put -f {} {}'.format( - hadoop_bin, local_file_path, remote_path)) + afs = 'afs://***.***.***.***:***' # User needs to change + uci = '***,***' # User needs to change + hadoop_bin = '/path/to/haddop/bin' # User needs to change + prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci) + os.system('{} -rmr {}/{}'.format( + prefix, remote_path, local_file_path)) + os.system('{} -put {} {}'.format( + prefix, local_file_path, remote_path)) name = "uci_housing" for pass_id in range(30): diff --git a/doc/HOT_LOADING_IN_SERVING_CN.md b/doc/HOT_LOADING_IN_SERVING_CN.md index 83cb20a3f661c6aa4bbcc3312ac131da1bb5038e..97a2272cffed18e7753859e9991757a5cccb7439 100644 --- a/doc/HOT_LOADING_IN_SERVING_CN.md +++ b/doc/HOT_LOADING_IN_SERVING_CN.md @@ -46,7 +46,7 @@ Paddle Serving提供了一个自动监控脚本,远端地址更新模型后会 ### 生产模型 -在`product_path`下运行下面的Python代码生产模型,每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。 +在`product_path`下运行下面的Python代码生产模型(运行前需要修改hadoop相关的参数),每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。 ```python import os @@ -82,9 +82,14 @@ exe = fluid.Executor(place) exe.run(fluid.default_startup_program()) def push_to_hdfs(local_file_path, remote_path): - hadoop_bin = '/hadoop-3.1.2/bin/hadoop' - os.system('{} fs -put -f {} {}'.format( - hadoop_bin, local_file_path, remote_path)) + afs = 'afs://***.***.***.***:***' # User needs to change + uci = '***,***' # User needs to change + hadoop_bin = '/path/to/haddop/bin' # User needs to change + prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci) + os.system('{} -rmr {}/{}'.format( + prefix, remote_path, local_file_path)) + os.system('{} -put {} {}'.format( + prefix, local_file_path, remote_path)) name = "uci_housing" for pass_id in range(30): diff --git a/doc/IMDB_GO_CLIENT_CN.md b/doc/IMDB_GO_CLIENT_CN.md index 86355bd538d0abd995b7c47e34a5062fe1c09406..5067d1ef79218d176aee0c0d7d41506a0b6dc428 100644 --- a/doc/IMDB_GO_CLIENT_CN.md +++ b/doc/IMDB_GO_CLIENT_CN.md @@ -99,7 +99,7 @@ func main() { ### 基于IMDB测试集的预测 ```python -go run imdb_client.go serving_client_conf / serving_client_conf.stream.prototxt test.data> result +go run imdb_client.go serving_client_conf/serving_client_conf.stream.prototxt test.data> result ``` ### 计算精度 diff --git a/doc/PERFORMANCE_OPTIM.md b/doc/PERFORMANCE_OPTIM.md index 0de06c16988d14d8f92eced491db7dc423831afe..651be1c139b5960fa287fc3e981f3039f9f098a2 100644 --- a/doc/PERFORMANCE_OPTIM.md +++ b/doc/PERFORMANCE_OPTIM.md @@ -2,9 +2,9 @@ ([简体中文](./PERFORMANCE_OPTIM_CN.md)|English) -Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require less computing resources will have a higher proportion of communication time cost, which is called communication-intensive service. Models that require more computing resources have a higher time cost for inference calculations, which is called computationa-intensive services. +Due to different model structures, different prediction services consume different computing resources when performing predictions. 
For online prediction services, models that require less computing resources will have a higher proportion of communication time cost, which is called communication-intensive service. Models that require more computing resources have a higher time cost for inference calculations, which is called computation-intensive services. -For a prediction service, the easiest way to determine what type it is is to look at the time ratio. Paddle Serving provides [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service. +For a prediction service, the easiest way to determine the type of service is to look at the time ratio. Paddle Serving provides [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service. For communication-intensive prediction services, requests can be aggregated, and within a limit that can tolerate delay, multiple prediction requests can be combined into a batch for prediction. @@ -16,5 +16,5 @@ Parameters for performance optimization: | Parameters | Type | Default | Description | | ---------- | ---- | ------- | ------------------------------------------------------------ | -| mem_optim | bool | False | Enable memory / graphic memory optimization | -| ir_optim | bool | Fasle | Enable analysis and optimization of calculation graph,including OP fusion, etc | +| mem_optim | - | - | Enable memory / graphic memory optimization | +| ir_optim | - | - | Enable analysis and optimization of calculation graph,including OP fusion, etc | diff --git a/doc/PERFORMANCE_OPTIM_CN.md b/doc/PERFORMANCE_OPTIM_CN.md index 1a2c3840942930060a1805bcb999f01b5780cbae..c35ea7a11c40ad2a5752d9add8fd8d9f8ddb2b64 100644 --- a/doc/PERFORMANCE_OPTIM_CN.md +++ b/doc/PERFORMANCE_OPTIM_CN.md @@ -16,5 +16,5 @@ | 参数 | 类型 | 默认值 | 含义 | | --------- | ---- | ------ | -------------------------------- | -| mem_optim | bool | False | 开启内存/显存优化 | -| ir_optim | bool | Fasle | 开启计算图分析优化,包括OP融合等 | +| mem_optim | - | - | 开启内存/显存优化 | +| ir_optim | - | - | 开启计算图分析优化,包括OP融合等 | diff --git a/doc/SAVE.md b/doc/SAVE.md index 4fcdfa438574fac7de21c963f5bb173c69261210..54800fa06ab4b8c20c0ffe75d417e1b42ab6ebe6 100644 --- a/doc/SAVE.md +++ b/doc/SAVE.md @@ -34,7 +34,7 @@ for line in sys.stdin: ## Export from saved model files If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving. 
-``` +```python import paddle_serving_client.io as serving_io serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None ) ``` diff --git a/doc/SAVE_CN.md b/doc/SAVE_CN.md index 3ca715c024a38b6fdce5c973844e7d023eebffcc..aaf0647fd1c4e95584bb7aa42a6671620adeb6d0 100644 --- a/doc/SAVE_CN.md +++ b/doc/SAVE_CN.md @@ -35,7 +35,7 @@ for line in sys.stdin: ## 从已保存的模型文件中导出 如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型,则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。 -``` +```python import paddle_serving_client.io as serving_io serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None) ``` diff --git a/doc/UWSGI_DEPLOY.md b/doc/UWSGI_DEPLOY.md index cb3fb506bf6fd4461240ebe43234fa3bed3d4784..1aa9c1fce452d8f3525d3646133d90356fce25e6 100644 --- a/doc/UWSGI_DEPLOY.md +++ b/doc/UWSGI_DEPLOY.md @@ -18,7 +18,7 @@ http://10.127.3.150:9393/uci/prediction Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment. The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. In the actual deployment process, WSGI (Web Server Gateway Interface) is used. -Next, we will show how to use the [uWSGI] (https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments. +Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments. ```python @@ -29,7 +29,7 @@ from paddle_serving_server.web_service import WebService uci_service = WebService(name = "uci") uci_service.load_model_config("./uci_housing_model") uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu") -uci_service.run_server() +uci_service.run_rpc_service() #Get flask application app_instance = uci_service.get_app_instance() ``` diff --git a/doc/UWSGI_DEPLOY_CN.md b/doc/UWSGI_DEPLOY_CN.md index 5bb87e26bbae729f8c21b4681413a4c9f5c4e7c8..966155162f5ff90e88f9b743a3047b5d86440a46 100644 --- a/doc/UWSGI_DEPLOY_CN.md +++ b/doc/UWSGI_DEPLOY_CN.md @@ -29,7 +29,7 @@ from paddle_serving_server.web_service import WebService uci_service = WebService(name = "uci") uci_service.load_model_config("./uci_housing_model") uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu") -uci_service.run_server() +uci_service.run_rpc_service() #获取flask服务 app_instance = uci_service.get_app_instance() ``` diff --git a/python/examples/bert/benchmark.py b/python/examples/bert/benchmark.py index af75b718b78b2bc130c2411d05d190fc0d298006..3ac9d07625e881b43550578c4a6346e4ac874063 100644 --- a/python/examples/bert/benchmark.py +++ b/python/examples/bert/benchmark.py @@ -19,13 +19,11 @@ from __future__ import unicode_literals, absolute_import import os import sys import time +import json +import requests from paddle_serving_client import Client from paddle_serving_client.utils import MultiThreadRunner -from paddle_serving_client.utils import benchmark_args -from batching import pad_batch_data -import tokenization -import requests -import json +from paddle_serving_client.utils import benchmark_args, show_latency from paddle_serving_app.reader import ChineseBertReader args = benchmark_args() @@ -36,42 +34,105 @@ def single_func(idx, resource): dataset = [] for line in fin: 
dataset.append(line.strip()) + + profile_flags = False + latency_flags = False + if os.getenv("FLAGS_profile_client"): + profile_flags = True + if os.getenv("FLAGS_serving_latency"): + latency_flags = True + latency_list = [] + if args.request == "rpc": - reader = ChineseBertReader(vocab_file="vocab.txt", max_seq_len=20) + reader = ChineseBertReader({"max_seq_len": 128}) fetch = ["pooled_output"] client = Client() client.load_client_config(args.model) client.connect([resource["endpoint"][idx % len(resource["endpoint"])]]) - start = time.time() - for i in range(1000): - if args.batch_size == 1: - feed_dict = reader.process(dataset[i]) - result = client.predict(feed=feed_dict, fetch=fetch) + for i in range(turns): + if args.batch_size >= 1: + l_start = time.time() + feed_batch = [] + b_start = time.time() + for bi in range(args.batch_size): + feed_batch.append(reader.process(dataset[bi])) + b_end = time.time() + + if profile_flags: + sys.stderr.write( + "PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format( + os.getpid(), + int(round(b_start * 1000000)), + int(round(b_end * 1000000)))) + result = client.predict(feed=feed_batch, fetch=fetch) + + l_end = time.time() + if latency_flags: + latency_list.append(l_end * 1000 - l_start * 1000) else: print("unsupport batch size {}".format(args.batch_size)) elif args.request == "http": + reader = ChineseBertReader({"max_seq_len": 128}) + fetch = ["pooled_output"] + server = "http://" + resource["endpoint"][idx % len(resource[ + "endpoint"])] + "/bert/prediction" start = time.time() - header = {"Content-Type": "application/json"} - for i in range(1000): - dict_data = {"words": dataset[i], "fetch": ["pooled_output"]} - r = requests.post( - 'http://{}/bert/prediction'.format(resource["endpoint"][ - idx % len(resource["endpoint"])]), - data=json.dumps(dict_data), - headers=header) + for i in range(turns): + if args.batch_size >= 1: + l_start = time.time() + feed_batch = [] + b_start = time.time() + for bi in range(args.batch_size): + feed_batch.append({"words": dataset[bi]}) + req = json.dumps({"feed": feed_batch, "fetch": fetch}) + b_end = time.time() + + if profile_flags: + sys.stderr.write( + "PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format( + os.getpid(), + int(round(b_start * 1000000)), + int(round(b_end * 1000000)))) + result = requests.post( + server, + data=req, + headers={"Content-Type": "application/json"}) + l_end = time.time() + if latency_flags: + latency_list.append(l_end * 1000 - l_start * 1000) + else: + print("unsupport batch size {}".format(args.batch_size)) + + else: + raise ValueError("not implemented {} request".format(args.request)) end = time.time() - return [[end - start]] + if latency_flags: + return [[end - start], latency_list] + else: + return [[end - start]] if __name__ == '__main__': multi_thread_runner = MultiThreadRunner() endpoint_list = ["127.0.0.1:9292"] - result = multi_thread_runner.run(single_func, args.thread, - {"endpoint": endpoint_list}) + turns = 10 + start = time.time() + result = multi_thread_runner.run( + single_func, args.thread, {"endpoint": endpoint_list, + "turns": turns}) + end = time.time() + total_cost = end - start + avg_cost = 0 for i in range(args.thread): avg_cost += result[0][i] avg_cost = avg_cost / args.thread - print("average total cost {} s.".format(avg_cost)) + + print("total cost :{} s".format(total_cost)) + print("each thread cost :{} s. 
".format(avg_cost)) + print("qps :{} samples/s".format(args.batch_size * args.thread * turns / + total_cost)) + if os.getenv("FLAGS_serving_latency"): + show_latency(result[1]) diff --git a/python/examples/bert/benchmark.sh b/python/examples/bert/benchmark.sh index 7f9e2325f3b8f7db288d2b7d82d0d412e05417cb..7ee5f32e9e5d89a836f8962a256bcdf7bf0b62e2 100644 --- a/python/examples/bert/benchmark.sh +++ b/python/examples/bert/benchmark.sh @@ -1,9 +1,30 @@ rm profile_log -for thread_num in 1 2 4 8 16 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +export FLAGS_profile_server=1 +export FLAGS_profile_client=1 +export FLAGS_serving_latency=1 +python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim False --ir_optim True 2> elog > stdlog & + +sleep 5 + +#warm up +python3 benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + +for thread_num in 4 8 16 do - $PYTHONROOT/bin/python benchmark.py --thread $thread_num --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "batch size : $batch_size" >> profile_log - $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log +for batch_size in 1 4 16 64 256 +do + python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + echo "model name :" $1 + echo "thread num :" $thread_num + echo "batch size :" $batch_size + echo "=================Done====================" + echo "model name :$1" >> profile_log_$1 + echo "batch size :$batch_size" >> profile_log_$1 + python3 ../util/show_profile.py profile $thread_num >> profile_log_$1 + tail -n 8 profile >> profile_log_$1 + echo "" >> profile_log_$1 +done done + +ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9 diff --git a/python/examples/bert/benchmark_batch.py b/python/examples/bert/benchmark_batch.py deleted file mode 100644 index 7cedb6aa451e0e4a128f0fedbfde1a896977f601..0000000000000000000000000000000000000000 --- a/python/examples/bert/benchmark_batch.py +++ /dev/null @@ -1,79 +0,0 @@ -# -*- coding: utf-8 -*- -# -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# pylint: disable=doc-string-missing - -from __future__ import unicode_literals, absolute_import -import os -import sys -import time -from paddle_serving_client import Client -from paddle_serving_client.utils import MultiThreadRunner -from paddle_serving_client.utils import benchmark_args -from batching import pad_batch_data -import tokenization -import requests -import json -from bert_reader import BertReader -args = benchmark_args() - - -def single_func(idx, resource): - fin = open("data-c.txt") - dataset = [] - for line in fin: - dataset.append(line.strip()) - profile_flags = False - if os.environ["FLAGS_profile_client"]: - profile_flags = True - if args.request == "rpc": - reader = BertReader(vocab_file="vocab.txt", max_seq_len=20) - fetch = ["pooled_output"] - client = Client() - client.load_client_config(args.model) - client.connect([resource["endpoint"][idx % len(resource["endpoint"])]]) - start = time.time() - for i in range(1000): - if args.batch_size >= 1: - feed_batch = [] - b_start = time.time() - for bi in range(args.batch_size): - feed_batch.append(reader.process(dataset[bi])) - b_end = time.time() - if profile_flags: - print("PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}".format( - os.getpid(), - int(round(b_start * 1000000)), - int(round(b_end * 1000000)))) - result = client.predict(feed=feed_batch, fetch=fetch) - else: - print("unsupport batch size {}".format(args.batch_size)) - - elif args.request == "http": - raise ("no batch predict for http") - end = time.time() - return [[end - start]] - - -if __name__ == '__main__': - multi_thread_runner = MultiThreadRunner() - endpoint_list = ["127.0.0.1:9292"] - result = multi_thread_runner.run(single_func, args.thread, - {"endpoint": endpoint_list}) - avg_cost = 0 - for i in range(args.thread): - avg_cost += result[0][i] - avg_cost = avg_cost / args.thread - print("average total cost {} s.".format(avg_cost)) diff --git a/python/examples/bert/benchmark_batch.sh b/python/examples/bert/benchmark_batch.sh deleted file mode 100644 index 272923776d6640880175745920a8fad9e84972fd..0000000000000000000000000000000000000000 --- a/python/examples/bert/benchmark_batch.sh +++ /dev/null @@ -1,19 +0,0 @@ -rm profile_log -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9295 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog & - -sleep 5 - -for thread_num in 1 2 4 8 16 -do -for batch_size in 1 2 4 8 16 32 64 128 256 512 -do - $PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "thread num: ", $thread_num - echo "batch size: ", $batch_size - echo "batch size : $batch_size" >> profile_log - $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log -done -done diff --git a/python/examples/bert/bert_client.py b/python/examples/bert/bert_client.py index b72d17f142c65bafe8ef13e1a963aacce6b3e821..362ac67915870af9d11209520daa61daa95082c1 100644 --- a/python/examples/bert/bert_client.py +++ b/python/examples/bert/bert_client.py @@ -14,15 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import os import sys -import numpy as np -import paddlehub as hub -import ujson -import random -import time -from paddlehub.common.logger import logger -import socket from paddle_serving_client import Client from paddle_serving_client.utils import benchmark_args from paddle_serving_app.reader import ChineseBertReader diff --git a/python/examples/bert/bert_web_service.py b/python/examples/bert/bert_web_service.py index d72150878c51d4f95bbc5d2263ad00fb1ed2c387..b1898b2cc0ee690dd075958944a56fed27dce29a 100644 --- a/python/examples/bert/bert_web_service.py +++ b/python/examples/bert/bert_web_service.py @@ -21,7 +21,10 @@ import os class BertService(WebService): def load(self): - self.reader = ChineseBertReader(vocab_file="vocab.txt", max_seq_len=128) + self.reader = ChineseBertReader({ + "vocab_file": "vocab.txt", + "max_seq_len": 128 + }) def preprocess(self, feed=[], fetch=[]): feed_res = [ diff --git a/python/examples/criteo_ctr_with_cube/cube_prepare.sh b/python/examples/criteo_ctr_with_cube/cube_prepare.sh index 2d0efaa56f06e9ad8d1590f1316e64bcc65f268d..1417254a54e2194ab3a0194f2ec970f480787acd 100755 --- a/python/examples/criteo_ctr_with_cube/cube_prepare.sh +++ b/python/examples/criteo_ctr_with_cube/cube_prepare.sh @@ -17,6 +17,6 @@ mkdir -p cube_model mkdir -p cube/data ./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature -./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false +./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false mv ./cube/data/0_0/test_dict_part0/* ./cube/data/ cd cube && ./cube diff --git a/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh b/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh index 7c794e103baa3a97d09966c470dd48eb56579500..0db6575ab307fb81cdd0336a20bb9a8ec30d446d 100755 --- a/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh +++ b/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh @@ -17,6 +17,6 @@ mkdir -p cube_model mkdir -p cube/data ./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8 -./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false +./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false mv ./cube/data/0_0/test_dict_part0/* ./cube/data/ cd cube && ./cube diff --git a/python/examples/deeplabv3/README.md b/python/examples/deeplabv3/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3eb5c84e2d5be7c7a1448940c758e60d77bd56e6 --- /dev/null +++ b/python/examples/deeplabv3/README.md @@ -0,0 +1,22 @@ +# Image Segmentation + +## Get Model + +``` +python -m paddle_serving_app.package --get_model deeplabv3 +tar -xzvf deeplabv3.tar.gz +``` + +## RPC Service + +### Start Service + +``` +python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494 +``` + +### Client Prediction + +``` +python deeplabv3_client.py +``` diff --git a/python/examples/deeplabv3/README_CN.md b/python/examples/deeplabv3/README_CN.md new file mode 100644 index 
0000000000000000000000000000000000000000..a25bb2d059df49568056664493c1c96b999005b2 --- /dev/null +++ b/python/examples/deeplabv3/README_CN.md @@ -0,0 +1,21 @@ +# 图像分割 + +## 获取模型 + +``` +python -m paddle_serving_app.package --get_model deeplabv3 +tar -xzvf deeplabv3.tar.gz +``` + +## RPC 服务 + +### 启动服务端 + +``` +python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494 +``` + +### 客户端预测 + +``` +python deeplabv3_client.py diff --git a/python/examples/deeplabv3/deeplabv3_client.py b/python/examples/deeplabv3/deeplabv3_client.py index 75ea6b0a01868af30c94fb0686159571c2c1c966..77e25d5f5a24d0aa1dad8939c1e7845eaf5e4122 100644 --- a/python/examples/deeplabv3/deeplabv3_client.py +++ b/python/examples/deeplabv3/deeplabv3_client.py @@ -18,7 +18,7 @@ import sys import cv2 client = Client() -client.load_client_config("seg_client/serving_client_conf.prototxt") +client.load_client_config("deeplabv3_client/serving_client_conf.prototxt") client.connect(["127.0.0.1:9494"]) preprocess = Sequential( diff --git a/python/examples/faster_rcnn_model/README.md b/python/examples/faster_rcnn_model/README.md index c1d3d40b054fb362bd20c59a9a7fc4d09e89f31b..e31f734e2b8f04ee4cd35258f9da81672b2caf88 100644 --- a/python/examples/faster_rcnn_model/README.md +++ b/python/examples/faster_rcnn_model/README.md @@ -12,8 +12,8 @@ If you want to have more detection models, please refer to [Paddle Detection Mod ### Start the service ``` tar xf faster_rcnn_model.tar.gz -mv faster_rcnn_model/pddet *. -GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0 +mv faster_rcnn_model/pddet* . +GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0 ``` ### Perform prediction diff --git a/python/examples/faster_rcnn_model/README_CN.md b/python/examples/faster_rcnn_model/README_CN.md index a2c3618f071a3650d50c791595bc04ba0c1d378a..3ddccf9e63043e797c9e261c1f26ebe774adb81c 100644 --- a/python/examples/faster_rcnn_model/README_CN.md +++ b/python/examples/faster_rcnn_model/README_CN.md @@ -13,7 +13,7 @@ wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml ``` tar xf faster_rcnn_model.tar.gz mv faster_rcnn_model/pddet* ./ -GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0 +GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0 ``` ### 执行预测 diff --git a/python/examples/fit_a_line/test_multi_process_client.py b/python/examples/fit_a_line/test_multi_process_client.py index 46ba3b60b5ae09b568868531d32234ade50d8556..5272d095df5e74f25ce0e36ca22c8d6d1884f5f0 100644 --- a/python/examples/fit_a_line/test_multi_process_client.py +++ b/python/examples/fit_a_line/test_multi_process_client.py @@ -22,15 +22,19 @@ def single_func(idx, resource): client.load_client_config( "./uci_housing_client/serving_client_conf.prototxt") client.connect(["127.0.0.1:9293", "127.0.0.1:9292"]) - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.uci_housing.test(), buf_size=500), - batch_size=1) - for data in test_reader(): - fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["price"]) + x = [ + 0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, + 0.6283, 0.4919, 0.1856, 0.0795, -0.0332 + ] + for i in range(1000): + fetch_map = client.predict(feed={"x": x}, fetch=["price"]) + if fetch_map is None: + return [[None]] return [[0]] multi_thread_runner = MultiThreadRunner() thread_num = 4 result = 
multi_thread_runner.run(single_func, thread_num, {}) +if None in result[0]: + exit(1) diff --git a/python/examples/fit_a_line/test_multilang_client.py b/python/examples/fit_a_line/test_multilang_client.py new file mode 100644 index 0000000000000000000000000000000000000000..c2c58378e523afb9724bc54a25228598d529dd7a --- /dev/null +++ b/python/examples/fit_a_line/test_multilang_client.py @@ -0,0 +1,32 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=doc-string-missing + +from paddle_serving_client import MultiLangClient +import sys + +client = MultiLangClient() +client.load_client_config(sys.argv[1]) +client.connect(["127.0.0.1:9393"]) + +import paddle +test_reader = paddle.batch( + paddle.reader.shuffle( + paddle.dataset.uci_housing.test(), buf_size=500), + batch_size=1) + +for data in test_reader(): + future = client.predict(feed={"x": data[0][0]}, fetch=["price"], asyn=True) + fetch_map = future.result() + print("{} {}".format(fetch_map["price"][0], data[0][1][0])) diff --git a/python/examples/fit_a_line/test_multilang_server.py b/python/examples/fit_a_line/test_multilang_server.py new file mode 100644 index 0000000000000000000000000000000000000000..23eb938f0ee1bf6b195509816dea5221bbfa9218 --- /dev/null +++ b/python/examples/fit_a_line/test_multilang_server.py @@ -0,0 +1,36 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
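+# A multi-language (gRPC) counterpart of the fit_a_line server: the op sequence below is general_reader -> general_infer -> general_response.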
+# pylint: disable=doc-string-missing + +import os +import sys +from paddle_serving_server import OpMaker +from paddle_serving_server import OpSeqMaker +from paddle_serving_server import MultiLangServer + +op_maker = OpMaker() +read_op = op_maker.create('general_reader') +general_infer_op = op_maker.create('general_infer') +response_op = op_maker.create('general_response') + +op_seq_maker = OpSeqMaker() +op_seq_maker.add_op(read_op) +op_seq_maker.add_op(general_infer_op) +op_seq_maker.add_op(response_op) + +server = MultiLangServer() +server.set_op_sequence(op_seq_maker.get_op_sequence()) +server.load_model_config(sys.argv[1]) +server.prepare_server(workdir="work_dir1", port=9393, device="cpu") +server.run_server() diff --git a/python/examples/fit_a_line/test_py_server.py b/python/examples/fit_a_line/test_py_server.py deleted file mode 100644 index ff4542560a455af4b8256c8daf9b2c62ac5d8568..0000000000000000000000000000000000000000 --- a/python/examples/fit_a_line/test_py_server.py +++ /dev/null @@ -1,116 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# pylint: disable=doc-string-missing - -from paddle_serving_server.pyserver import Op -from paddle_serving_server.pyserver import Channel -from paddle_serving_server.pyserver import PyServer -from paddle_serving_server import python_service_channel_pb2 -import numpy as np -import logging - -logging.basicConfig( - format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s', - datefmt='%Y-%m-%d %H:%M', - level=logging.INFO) - -# channel data: {name(str): data(bytes)} - - -class CombineOp(Op): - def preprocess(self, input_data): - cnt = 0 - for op_name, data in input_data.items(): - logging.debug("CombineOp preprocess: {}".format(op_name)) - cnt += np.frombuffer(data.insts[0].data, dtype='float') - data = python_service_channel_pb2.ChannelData() - inst = python_service_channel_pb2.Inst() - inst.data = np.ndarray.tobytes(cnt) - inst.name = "combine_op_output" - data.insts.append(inst) - return data - - def postprocess(self, output_data): - return output_data - - -class UciOp(Op): - def postprocess(self, output_data): - data = python_service_channel_pb2.ChannelData() - inst = python_service_channel_pb2.Inst() - pred = np.array(output_data["price"][0][0], dtype='float') - inst.data = np.ndarray.tobytes(pred) - inst.name = "prediction" - data.insts.append(inst) - return data - - -read_channel = Channel(name="read_channel") -combine_channel = Channel(name="combine_channel") -out_channel = Channel(name="out_channel") - -cnn_op = UciOp( - name="cnn", - input=read_channel, - in_dtype='float', - outputs=[combine_channel], - out_dtype='float', - server_model="./uci_housing_model", - server_port="9393", - device="cpu", - client_config="uci_housing_client/serving_client_conf.prototxt", - server_name="127.0.0.1:9393", - fetch_names=["price"], - concurrency=1, - timeout=0.01, - retry=2) - -bow_op = UciOp( - name="bow", - input=read_channel, - in_dtype='float', - 
outputs=[combine_channel], - out_dtype='float', - server_model="./uci_housing_model", - server_port="9292", - device="cpu", - client_config="uci_housing_client/serving_client_conf.prototxt", - server_name="127.0.0.1:9393", - fetch_names=["price"], - concurrency=1, - timeout=-1, - retry=1) - -combine_op = CombineOp( - name="combine", - input=combine_channel, - in_dtype='float', - outputs=[out_channel], - out_dtype='float', - concurrency=1, - timeout=-1, - retry=1) - -logging.info(read_channel.debug()) -logging.info(combine_channel.debug()) -logging.info(out_channel.debug()) -pyserver = PyServer(profile=False, retry=1) -pyserver.add_channel(read_channel) -pyserver.add_channel(combine_channel) -pyserver.add_channel(out_channel) -pyserver.add_op(cnn_op) -pyserver.add_op(bow_op) -pyserver.add_op(combine_op) -pyserver.prepare_server(port=8080, worker_num=2) -pyserver.run_server() diff --git a/python/examples/imagenet/README_CN.md b/python/examples/imagenet/README_CN.md index 77ade579ba17ad8247b2f118242642a1d3c79927..081cff528c393ecb5534ec679d6e63739f720f20 100644 --- a/python/examples/imagenet/README_CN.md +++ b/python/examples/imagenet/README_CN.md @@ -19,10 +19,10 @@ pip install paddle_serving_app 启动server端 ``` -python image_classification_service.py ResNet50_vd_model cpu 9696 #cpu预测服务 +python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu预测服务 ``` ``` -python image_classification_service.py ResNet50_vd_model gpu 9696 #gpu预测服务 +python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu预测服务 ``` diff --git a/python/examples/imagenet/benchmark.py b/python/examples/imagenet/benchmark.py index caa952f121fbd8725c2a6bfe36f0dd84b6a82707..5c4c44cc1bd091af6c4d343d2b7f0f436cca2e7e 100644 --- a/python/examples/imagenet/benchmark.py +++ b/python/examples/imagenet/benchmark.py @@ -73,7 +73,7 @@ def single_func(idx, resource): print("unsupport batch size {}".format(args.batch_size)) elif args.request == "http": - py_version = 2 + py_version = sys.version_info[0] server = "http://" + resource["endpoint"][idx % len(resource[ "endpoint"])] + "/image/prediction" start = time.time() @@ -93,7 +93,7 @@ def single_func(idx, resource): if __name__ == '__main__': multi_thread_runner = MultiThreadRunner() - endpoint_list = ["127.0.0.1:9696"] + endpoint_list = ["127.0.0.1:9393"] #endpoint_list = endpoint_list + endpoint_list + endpoint_list result = multi_thread_runner.run(single_func, args.thread, {"endpoint": endpoint_list}) diff --git a/python/examples/imagenet/benchmark.sh b/python/examples/imagenet/benchmark.sh index 618a62c063c0bc4955baf8516bc5bc93e4832394..84885908fa89d050b3ca71386fe2a21533ce0809 100644 --- a/python/examples/imagenet/benchmark.sh +++ b/python/examples/imagenet/benchmark.sh @@ -1,12 +1,28 @@ rm profile_log -for thread_num in 1 2 4 8 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +export FLAGS_profile_server=1 +export FLAGS_profile_client=1 +python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog & + +sleep 5 + +#warm up +$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + +for thread_num in 4 8 16 do -for batch_size in 1 2 4 8 16 32 64 128 +for batch_size in 1 4 16 64 256 do - $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model ResNet50_vd_client_config/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "batch size : $batch_size" >> profile_log + 
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + echo "model name :" $1 + echo "thread num :" $thread_num + echo "batch size :" $batch_size + echo "=================Done====================" + echo "model name :$1" >> profile_log + echo "batch size :$batch_size" >> profile_log $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log + tail -n 8 profile >> profile_log done done + +ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9 diff --git a/python/examples/fit_a_line/test_py_client.py b/python/examples/imdb/test_py_client.py similarity index 72% rename from python/examples/fit_a_line/test_py_client.py rename to python/examples/imdb/test_py_client.py index 76fee2804daf6e98d0cb12a562e8b69ffa97742b..3f811e16817df271e6ce8f4cef56f7ab47255d7c 100644 --- a/python/examples/fit_a_line/test_py_client.py +++ b/python/examples/imdb/test_py_client.py @@ -13,27 +13,23 @@ # limitations under the License. from paddle_serving_client.pyclient import PyClient import numpy as np - +from paddle_serving_app.reader import IMDBDataset from line_profiler import LineProfiler client = PyClient() client.connect('localhost:8080') -x = np.array( - [ - 0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, - 0.6283, 0.4919, 0.1856, 0.0795, -0.0332 - ], - dtype='float') - lp = LineProfiler() lp_wrapper = lp(client.predict) +words = 'i am very sad | 0' +imdb_dataset = IMDBDataset() +imdb_dataset.load_resource('imdb.vocab') + for i in range(1): + word_ids, label = imdb_dataset.get_words_and_label(words) fetch_map = lp_wrapper( - feed={"x": x}, fetch_with_type={"combine_op_output": "float"}) - # fetch_map = client.predict( - # feed={"x": x}, fetch_with_type={"combine_op_output": "float"}) + feed={"words": word_ids}, fetch=["combined_prediction"]) print(fetch_map) #lp.print_stats() diff --git a/python/examples/imdb/test_py_server.py b/python/examples/imdb/test_py_server.py new file mode 100644 index 0000000000000000000000000000000000000000..d887956400411dec6ec8f806727e8f75f552119c --- /dev/null +++ b/python/examples/imdb/test_py_server.py @@ -0,0 +1,69 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# pylint: disable=doc-string-missing + +from paddle_serving_server.pyserver import Op +from paddle_serving_server.pyserver import Channel +from paddle_serving_server.pyserver import PyServer +import numpy as np +import logging + +logging.basicConfig( + format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s', + datefmt='%Y-%m-%d %H:%M', + #level=logging.DEBUG) + level=logging.INFO) + + +class CombineOp(Op): + def preprocess(self, input_data): + combined_prediction = 0 + for op_name, channeldata in input_data.items(): + data = channeldata.parse() + logging.info("{}: {}".format(op_name, data["prediction"])) + combined_prediction += data["prediction"] + data = {"combined_prediction": combined_prediction / 2} + return data + + +read_op = Op(name="read", inputs=None) +bow_op = Op(name="bow", + inputs=[read_op], + server_model="imdb_bow_model", + server_port="9393", + device="cpu", + client_config="imdb_bow_client_conf/serving_client_conf.prototxt", + server_name="127.0.0.1:9393", + fetch_names=["prediction"], + concurrency=1, + timeout=0.1, + retry=2) +cnn_op = Op(name="cnn", + inputs=[read_op], + server_model="imdb_cnn_model", + server_port="9292", + device="cpu", + client_config="imdb_cnn_client_conf/serving_client_conf.prototxt", + server_name="127.0.0.1:9292", + fetch_names=["prediction"], + concurrency=1, + timeout=-1, + retry=1) +combine_op = CombineOp( + name="combine", inputs=[bow_op, cnn_op], concurrency=1, timeout=-1, retry=1) + +pyserver = PyServer(profile=False, retry=1) +pyserver.add_ops([read_op, bow_op, cnn_op, combine_op]) +pyserver.prepare_server(port=8080, worker_num=2) +pyserver.run_server() diff --git a/python/examples/lac/README.md b/python/examples/lac/README.md index bc420186a09dfd0066c1abf0c0d95063e9cb0699..8d7adfb583f8e8e1fde0681a73f2bba65452fa87 100644 --- a/python/examples/lac/README.md +++ b/python/examples/lac/README.md @@ -2,28 +2,27 @@ ([简体中文](./README_CN.md)|English) -### Get model files and sample data +### Get Model ``` -sh get_data.sh +python -m paddle_serving_app.package --get_model lac +tar -xzvf lac.tar.gz ``` -the package downloaded contains lac model config along with lac dictionary. - #### Start RPC inference service ``` -python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292 +python -m paddle_serving_server.serve --model lac_model/ --port 9292 ``` ### RPC Infer ``` -echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/ +echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt ``` -it will get the segmentation result +It will get the segmentation result. 
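+
+With the updated `lac_client.py` in this patch, the command above prints the parsed segments joined by `|`; the expected output looks roughly like this (an illustrative sketch, not a captured run):
+
+```
+word_seg: 我|爱|北京|天安门
+```
+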
### Start HTTP inference service ``` -python lac_web_service.py jieba_server_model/ lac_workdir 9292 +python lac_web_service.py lac_model/ lac_workdir 9292 ``` ### HTTP Infer diff --git a/python/examples/lac/README_CN.md b/python/examples/lac/README_CN.md index 449f474ca291053eb6880166c52814c9d4180f36..2379aa8ed69c026c6afd94b8b791774882eaf567 100644 --- a/python/examples/lac/README_CN.md +++ b/python/examples/lac/README_CN.md @@ -2,28 +2,27 @@ (简体中文|[English](./README.md)) -### 获取模型和字典文件 +### 获取模型 ``` -sh get_data.sh +python -m paddle_serving_app.package --get_model lac +tar -xzvf lac.tar.gz ``` -下载包里包含了lac模型和lac模型预测需要的字典文件 - #### 开启RPC预测服务 ``` -python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292 +python -m paddle_serving_server.serve --model lac_model/ --port 9292 ``` ### 执行RPC预测 ``` -echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/ +echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt ``` 我们就能得到分词结果 ### 开启HTTP预测服务 ``` -python lac_web_service.py jieba_server_model/ lac_workdir 9292 +python lac_web_service.py lac_model/ lac_workdir 9292 ``` ### 执行HTTP预测 diff --git a/python/examples/lac/benchmark.py b/python/examples/lac/benchmark.py index 53d0881ed74e5e19104a70fb93d6872141d27afd..64e935a608477d5841df1b64abf7b6eb35dd1a4b 100644 --- a/python/examples/lac/benchmark.py +++ b/python/examples/lac/benchmark.py @@ -16,7 +16,7 @@ import sys import time import requests -from lac_reader import LACReader +from paddle_serving_app.reader import LACReader from paddle_serving_client import Client from paddle_serving_client.utils import MultiThreadRunner from paddle_serving_client.utils import benchmark_args @@ -25,7 +25,7 @@ args = benchmark_args() def single_func(idx, resource): - reader = LACReader("lac_dict") + reader = LACReader() start = time.time() if args.request == "rpc": client = Client() diff --git a/python/examples/lac/get_data.sh b/python/examples/lac/get_data.sh deleted file mode 100644 index 29e6a6b2b3e995f78c37e15baf2f9a3b627ca9ef..0000000000000000000000000000000000000000 --- a/python/examples/lac/get_data.sh +++ /dev/null @@ -1,2 +0,0 @@ -wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz -tar -zxvf lac_model_jieba_web.tar.gz diff --git a/python/examples/lac/lac_client.py b/python/examples/lac/lac_client.py index 9c485a923e4d42b72af41f7b9ad45c5702ca93a1..22f3c511dcd2540365623ef9428b60cfcb5e5a34 100644 --- a/python/examples/lac/lac_client.py +++ b/python/examples/lac/lac_client.py @@ -15,7 +15,7 @@ # pylint: disable=doc-string-missing from paddle_serving_client import Client -from lac_reader import LACReader +from paddle_serving_app.reader import LACReader import sys import os import io @@ -24,7 +24,7 @@ client = Client() client.load_client_config(sys.argv[1]) client.connect(["127.0.0.1:9292"]) -reader = LACReader(sys.argv[2]) +reader = LACReader() for line in sys.stdin: if len(line) <= 0: continue @@ -32,4 +32,7 @@ for line in sys.stdin: if len(feed_data) <= 0: continue fetch_map = client.predict(feed={"words": feed_data}, fetch=["crf_decode"]) - print(fetch_map) + begin = fetch_map['crf_decode.lod'][0] + end = fetch_map['crf_decode.lod'][1] + segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end]) + print("word_seg: " + "|".join(str(words) for words in segs)) diff --git a/python/examples/lac/lac_web_service.py b/python/examples/lac/lac_web_service.py index 
62a7148b230029bc781fa550597df25471a7fc8d..bed89f54b626c0cce55767f8edacc3dd33f0104c 100644 --- a/python/examples/lac/lac_web_service.py +++ b/python/examples/lac/lac_web_service.py @@ -14,12 +14,12 @@ from paddle_serving_server.web_service import WebService import sys -from lac_reader import LACReader +from paddle_serving_app.reader import LACReader class LACService(WebService): def load_reader(self): - self.reader = LACReader("lac_dict") + self.reader = LACReader() def preprocess(self, feed={}, fetch=[]): feed_batch = [] diff --git a/python/examples/mobilenet/README.md b/python/examples/mobilenet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..496ebdbe2e244af8091cb28cdcdecf7627088ba3 --- /dev/null +++ b/python/examples/mobilenet/README.md @@ -0,0 +1,22 @@ +# Image Classification + +## Get Model + +``` +python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet +tar -xzvf mobilenet_v2_imagenet.tar.gz +``` + +## RPC Service + +### Start Service + +``` +python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393 +``` + +### Client Prediction + +``` +python mobilenet_tutorial.py +``` diff --git a/python/examples/mobilenet/README_CN.md b/python/examples/mobilenet/README_CN.md new file mode 100644 index 0000000000000000000000000000000000000000..7c721b4bd161fbf7c400f1a73ddb7be69c449871 --- /dev/null +++ b/python/examples/mobilenet/README_CN.md @@ -0,0 +1,22 @@ +# 图像分类 + +## 获取模型 + +``` +python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet +tar -xzvf mobilenet_v2_imagenet.tar.gz +``` + +## RPC 服务 + +### 启动服务端 + +``` +python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393 +``` + +### 客户端预测 + +``` +python mobilenet_tutorial.py +``` diff --git a/python/examples/ocr/README.md b/python/examples/ocr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..04c4fd3eaa304e55d980a2cf4fc34dda50f5009c --- /dev/null +++ b/python/examples/ocr/README.md @@ -0,0 +1,21 @@ +# OCR + +## Get Model +``` +python -m paddle_serving_app.package --get_model ocr_rec +tar -xzvf ocr_rec.tar.gz +``` + +## RPC Service + +### Start Service + +``` +python -m paddle_serving_server.serve --model ocr_rec_model --port 9292 +``` + +### Client Prediction + +``` +python test_ocr_rec_client.py +``` diff --git a/python/examples/ocr/test_ocr_rec_client.py b/python/examples/ocr/test_ocr_rec_client.py new file mode 100644 index 0000000000000000000000000000000000000000..b61256d03202374ada5b0d50a075fef156eca2ea --- /dev/null +++ b/python/examples/ocr/test_ocr_rec_client.py @@ -0,0 +1,31 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from paddle_serving_client import Client +from paddle_serving_app.reader import OCRReader +import cv2 + +client = Client() +client.load_client_config("ocr_rec_client/serving_client_conf.prototxt") +client.connect(["127.0.0.1:9292"]) + +image_file_list = ["./test_rec.jpg"] +img = cv2.imread(image_file_list[0]) +ocr_reader = OCRReader() +feed = {"image": ocr_reader.preprocess([img])} +fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"] +fetch_map = client.predict(feed=feed, fetch=fetch) +rec_res = ocr_reader.postprocess(fetch_map) +print(image_file_list[0]) +print(rec_res[0][0]) diff --git a/python/examples/ocr/test_rec.jpg b/python/examples/ocr/test_rec.jpg new file mode 100644 index 0000000000000000000000000000000000000000..2c34cd33eac5766a072fde041fa6c9b1d612f1db Binary files /dev/null and b/python/examples/ocr/test_rec.jpg differ diff --git a/python/examples/ocr_detection/7.jpg b/python/examples/ocr_detection/7.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a9483bb74f66d88699b09545366c32a4fe108e54 Binary files /dev/null and b/python/examples/ocr_detection/7.jpg differ diff --git a/python/examples/ocr_detection/text_det_client.py b/python/examples/ocr_detection/text_det_client.py new file mode 100644 index 0000000000000000000000000000000000000000..aaa1c5b1179fcbf1d010bb9f6335ef2886435a83 --- /dev/null +++ b/python/examples/ocr_detection/text_det_client.py @@ -0,0 +1,47 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+from paddle_serving_client import Client
+from paddle_serving_app.reader import Sequential, File2Image, ResizeByFactor
+from paddle_serving_app.reader import Div, Normalize, Transpose
+from paddle_serving_app.reader import DBPostProcess, FilterBoxes
+
+client = Client()
+client.load_client_config("ocr_det_client/serving_client_conf.prototxt")
+client.connect(["127.0.0.1:9494"])
+
+read_image_file = File2Image()
+preprocess = Sequential([
+    ResizeByFactor(32, 960), Div(255),
+    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
+        (2, 0, 1))
+])
+post_func = DBPostProcess({
+    "thresh": 0.3,
+    "box_thresh": 0.5,
+    "max_candidates": 1000,
+    "unclip_ratio": 1.5,
+    "min_size": 3
+})
+filter_func = FilterBoxes(10, 10)
+
+img = read_image_file("7.jpg")
+ori_h, ori_w, _ = img.shape
+img = preprocess(img)
+new_h, new_w, _ = img.shape
+ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
+outputs = client.predict(feed={"image": img}, fetch=["concat_1.tmp_0"])
+dt_boxes_list = post_func(outputs["concat_1.tmp_0"], [ratio_list])
+dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
diff --git a/python/examples/resnet_v2_50/README.md b/python/examples/resnet_v2_50/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fd86074c73177a06cd59ebb3bd0c28c7f22e95f2
--- /dev/null
+++ b/python/examples/resnet_v2_50/README.md
@@ -0,0 +1,22 @@
+# Image Classification
+
+## Get Model
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC Service
+
+### Start Service
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### Client Prediction
+
+```
+python resnet50_v2_tutorial.py
+```
diff --git a/python/examples/resnet_v2_50/README_CN.md b/python/examples/resnet_v2_50/README_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..bda2916eb43d55d718af1095c21869e00fb27093
--- /dev/null
+++ b/python/examples/resnet_v2_50/README_CN.md
@@ -0,0 +1,22 @@
+# 图像分类
+
+## 获取模型
+
+```
+python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
+tar -xzvf resnet_v2_50_imagenet.tar.gz
+```
+
+## RPC 服务
+
+### 启动服务端
+
+```
+python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
+```
+
+### 客户端预测
+
+```
+python resnet50_v2_tutorial.py
+```
diff --git a/python/examples/resnet_v2_50/resnet50_v2_tutorial.py b/python/examples/resnet_v2_50/resnet50_v2_tutorial.py
index 8d916cbd8145cdc73424a05fdb2855412f4d4fe2..b249d2a6df85f87258f66c96aaa779eb2e299613 100644
--- a/python/examples/resnet_v2_50/resnet50_v2_tutorial.py
+++ b/python/examples/resnet_v2_50/resnet50_v2_tutorial.py
@@ -14,7 +14,7 @@
 from paddle_serving_client import Client
 from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
-from apddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
+from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
 
 client = Client()
 client.load_client_config(
@@ -28,5 +28,5 @@ seq = Sequential([
 
 image_file = "daisy.jpg"
 img = seq(image_file)
-fetch_map = client.predict(feed={"image": img}, fetch=["feature_map"])
-print(fetch_map["feature_map"].reshape(-1))
+fetch_map = client.predict(feed={"image": img}, fetch=["score"])
+print(fetch_map["score"].reshape(-1))
diff --git a/python/examples/senta/README.md b/python/examples/senta/README.md
index 
88aac352110850a71ae0f9a28c1a98293f8e0ab9..8929a9312c17264800f299f77afb583221006068 100644 --- a/python/examples/senta/README.md +++ b/python/examples/senta/README.md @@ -1,22 +1,23 @@ -# Chinese sentence sentiment classification +# Chinese Sentence Sentiment Classification ([简体中文](./README_CN.md)|English) -## Get model files and sample data -``` -sh get_data.sh -``` -## Install preprocess module +## Get Model ``` -pip install paddle_serving_app +python -m paddle_serving_app.package --get_model senta_bilstm +python -m paddle_serving_app.package --get_model lac +tar -xzvf senta_bilstm.tar.gz +tar -xzvf lac.tar.gz ``` -## Start http service +## Start HTTP Service ``` -python senta_web_service.py senta_bilstm_model/ workdir 9292 +python -m paddle_serving_server.serve --model lac_model --port 9300 +python senta_web_service.py ``` -In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through [LAC task] (../lac). Set model path by ```lac_model_path``` and dictionary path by ```lac_dict_path```. -In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task. The LAC prediction service is deployed on the CPU, and the sentiment classification task is deployed on the GPU, which can be changed according to the actual situation. +In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through [LAC task] (../lac). +In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task. + ## Client prediction ``` -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction +curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction ``` diff --git a/python/examples/senta/README_CN.md b/python/examples/senta/README_CN.md index f5011334db768c5f0869c296769ead7cb38613d8..e5624dc975e6bc00de219f68cbf74dea7cac8360 100644 --- a/python/examples/senta/README_CN.md +++ b/python/examples/senta/README_CN.md @@ -1,22 +1,23 @@ # 中文语句情感分类 (简体中文|[English](./README.md)) -## 获取模型文件和样例数据 -``` -sh get_data.sh -``` -## 安装数据预处理模块 + +## 获取模型文件 ``` -pip install paddle_serving_app +python -m paddle_serving_app.package --get_model senta_bilstm +python -m paddle_serving_app.package --get_model lac +tar -xzvf lac.tar.gz +tar -xzvf senta_bilstm.tar.gz ``` ## 启动HTTP服务 ``` -python senta_web_service.py senta_bilstm_model/ workdir 9292 +python -m paddle_serving_server.serve --model lac_model --port 9300 +python senta_web_service.py ``` -中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词,在脚本中通过```lac_model_path```参数配置LAC任务的模型文件路径,```lac_dict_path```参数配置LAC任务词典路径。 -示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分,LAC预测服务部署在CPU上,情感分类任务部署在GPU上,可以根据实际情况进行更改。 +中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词。 +示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分。 ## 客户端预测 ``` -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction +curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction ``` diff --git a/python/examples/senta/senta_web_service.py b/python/examples/senta/senta_web_service.py index 0621ece74173596a1820f1b09258ecf5bb727f29..25c880ef8877aed0f3f9d394d1780855130f365b 100644 --- a/python/examples/senta/senta_web_service.py +++ 
b/python/examples/senta/senta_web_service.py @@ -1,3 +1,4 @@ +#encoding=utf-8 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -12,56 +13,28 @@ # See the License for the specific language governing permissions and # limitations under the License. -from paddle_serving_server_gpu.web_service import WebService +from paddle_serving_server.web_service import WebService from paddle_serving_client import Client from paddle_serving_app.reader import LACReader, SentaReader import os import sys -from multiprocessing import Process +#senta_web_service.py +from paddle_serving_server.web_service import WebService +from paddle_serving_client import Client +from paddle_serving_app.reader import LACReader, SentaReader -class SentaService(WebService): - def set_config( - self, - lac_model_path, - lac_dict_path, - senta_dict_path, ): - self.lac_model_path = lac_model_path - self.lac_client_config_path = lac_model_path + "/serving_server_conf.prototxt" - self.lac_dict_path = lac_dict_path - self.senta_dict_path = senta_dict_path - - def start_lac_service(self): - if not os.path.exists('./lac_serving'): - os.mkdir("./lac_serving") - os.chdir('./lac_serving') - self.lac_port = self.port + 100 - r = os.popen( - "python -m paddle_serving_server.serve --model {} --port {} &". - format("../" + self.lac_model_path, self.lac_port)) - os.chdir('..') - - def init_lac_service(self): - ps = Process(target=self.start_lac_service()) - ps.start() - self.init_lac_client() - - def lac_predict(self, feed_data): - lac_result = self.lac_client.predict( - feed={"words": feed_data}, fetch=["crf_decode"]) - return lac_result - - def init_lac_client(self): - self.lac_client = Client() - self.lac_client.load_client_config(self.lac_client_config_path) - self.lac_client.connect(["127.0.0.1:{}".format(self.lac_port)]) - def init_lac_reader(self): +class SentaService(WebService): + #初始化lac模型预测服务 + def init_lac_client(self, lac_port, lac_client_config): self.lac_reader = LACReader() - - def init_senta_reader(self): self.senta_reader = SentaReader() + self.lac_client = Client() + self.lac_client.load_client_config(lac_client_config) + self.lac_client.connect(["127.0.0.1:{}".format(lac_port)]) + #定义senta模型预测服务的预处理,调用顺序:lac reader->lac模型预测->预测结果后处理->senta reader def preprocess(self, feed=[], fetch=[]): feed_data = [{ "words": self.lac_reader.process(x["words"]) @@ -80,15 +53,9 @@ class SentaService(WebService): senta_service = SentaService(name="senta") -senta_service.set_config( - lac_model_path="./lac_model", - lac_dict_path="./lac_dict", - senta_dict_path="./vocab.txt") -senta_service.load_model_config(sys.argv[1]) -senta_service.prepare_server( - workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu") -senta_service.init_lac_reader() -senta_service.init_senta_reader() -senta_service.init_lac_service() +senta_service.load_model_config("senta_bilstm_model") +senta_service.prepare_server(workdir="workdir") +senta_service.init_lac_client( + lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt") senta_service.run_rpc_service() senta_service.run_web_service() diff --git a/python/examples/unet_for_image_seg/README.md b/python/examples/unet_for_image_seg/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7936ad43cbc3b53719babdf6f91ea46e74a827da --- /dev/null +++ b/python/examples/unet_for_image_seg/README.md @@ -0,0 +1,22 @@ +# Image Segmentation + +## Get Model + +``` +python -m paddle_serving_app.package 
--get_model unet +tar -xzvf unet.tar.gz +``` + +## RPC Service + +### Start Service + +``` +python -m paddle_serving_server_gpu.serve --model unet_model --gpu_ids 0 --port 9494 +``` + +### Client Prediction + +``` +python seg_client.py +``` diff --git a/python/examples/unet_for_image_seg/README_CN.md b/python/examples/unet_for_image_seg/README_CN.md new file mode 100644 index 0000000000000000000000000000000000000000..f4b91aaff5697ff8ea3901e0a8084152f6007ff4 --- /dev/null +++ b/python/examples/unet_for_image_seg/README_CN.md @@ -0,0 +1,22 @@ +# 图像分割 + +## 获取模型 + +``` +python -m paddle_serving_app.package --get_model unet +tar -xzvf unet.tar.gz +``` + +## RPC 服务 + +### 启动服务端 + +``` +python -m paddle_serving_server_gpu.serve --model unet_model --gpu_ids 0 --port 9494 +``` + +### 客户端预测 + +``` +python seg_client.py +``` diff --git a/python/examples/unet_for_image_seg/seg_client.py b/python/examples/unet_for_image_seg/seg_client.py index 9e76b060955ec74492312c8896efaf3946a3f7ab..44f634b6090159ee1bd37c176eebb7d2b7f37065 100644 --- a/python/examples/unet_for_image_seg/seg_client.py +++ b/python/examples/unet_for_image_seg/seg_client.py @@ -27,7 +27,8 @@ preprocess = Sequential( postprocess = SegPostprocess(2) -im = preprocess("N0060.jpg") +filename = "N0060.jpg" +im = preprocess(filename) fetch_map = client.predict(feed={"image": im}, fetch=["output"]) fetch_map["filename"] = filename postprocess(fetch_map) diff --git a/python/examples/util/show_profile.py b/python/examples/util/show_profile.py index 9153d939338f0ee171af539b9f955d51802ad547..1581dda19bb0abefe6eb21592bda7fc97d8fb7cd 100644 --- a/python/examples/util/show_profile.py +++ b/python/examples/util/show_profile.py @@ -31,7 +31,7 @@ with open(profile_file) as f: if line[0] == "PROFILE": prase(line[2]) -print("thread num {}".format(thread_num)) +print("thread num :{}".format(thread_num)) for name in time_dict: - print("{} cost {} s in each thread ".format(name, time_dict[name] / ( + print("{} cost :{} s in each thread ".format(name, time_dict[name] / ( 1000000.0 * float(thread_num)))) diff --git a/python/paddle_serving_app/README.md b/python/paddle_serving_app/README.md index a0fd35b7f02ce165f878238a757613c62d2fea26..cb48ae376086ec4021af617337e43934dd5e5f6e 100644 --- a/python/paddle_serving_app/README.md +++ b/python/paddle_serving_app/README.md @@ -12,7 +12,7 @@ pip install paddle_serving_app ## Get model list ```shell -python -m paddle_serving_app.package --model_list +python -m paddle_serving_app.package --list_model ``` ## Download pre-training model @@ -21,16 +21,16 @@ python -m paddle_serving_app.package --model_list python -m paddle_serving_app.package --get_model senta_bilstm ``` -11 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks. +1 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks. The model files can be directly used for deployment, and the `--tutorial` argument can be added to obtain the deployment method. 
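+
+For example, the sketch below assumes `--tutorial` is simply appended to the `--get_model` invocation described above; the model names accepted by `--get_model` are listed in the table that follows:
+
+```shell
+# assumed flag combination: download a model and print its deployment tutorial
+python -m paddle_serving_app.package --get_model senta_bilstm --tutorial
+```
+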
| Prediction task | Model name | | ------------ | ------------------------------------------------ | | SentimentAnalysis | 'senta_bilstm', 'senta_bow', 'senta_cnn' | -| SemanticRepresentation | 'ernie_base' | +| SemanticRepresentation | 'ernie' | | ChineseWordSegmentation | 'lac' | -| ObjectDetection | 'faster_rcnn', 'yolov3' | -| ImageSegmentation | 'unet', 'deeplabv3' | +| ObjectDetection | 'faster_rcnn' | +| ImageSegmentation | 'unet', 'deeplabv3','deeplabv3+cityscapes' | | ImageClassification | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' | ## Data preprocess API @@ -38,7 +38,8 @@ The model files can be directly used for deployment, and the `--tutorial` argume paddle_serving_app provides a variety of data preprocessing methods for prediction tasks in the field of CV and NLP. - class ChineseBertReader - + + Preprocessing for Chinese semantic representation task. - `__init__(vocab_file, max_seq_len=20)` @@ -54,7 +55,8 @@ Preprocessing for Chinese semantic representation task. [example](../examples/bert/bert_client.py) - class LACReader - + + Preprocessing for Chinese word segmentation task. - `__init__(dict_floder)` @@ -65,7 +67,7 @@ Preprocessing for Chinese word segmentation task. - words(st ):Original text input. - crf_decode(np.array):CRF code predicted by model. - [example](../examples/bert/lac_web_service.py) + [example](../examples/lac/lac_web_service.py) - class SentaReader @@ -76,7 +78,7 @@ Preprocessing for Chinese word segmentation task. [example](../examples/senta/senta_web_service.py) -- The image preprocessing method is more flexible than the above method, and can be combined by the following multiple classes,[example](../examples/imagenet/image_rpc_client.py) +- The image preprocessing method is more flexible than the above method, and can be combined by the following multiple classes,[example](../examples/imagenet/resnet50_rpc_client.py) - class Sequentia diff --git a/python/paddle_serving_app/README_CN.md b/python/paddle_serving_app/README_CN.md index 2624c238e2dc212f1d10a251ee742891cae6a08c..181037c55a2aae578cb189525030ccba87146f6e 100644 --- a/python/paddle_serving_app/README_CN.md +++ b/python/paddle_serving_app/README_CN.md @@ -11,7 +11,7 @@ pip install paddle_serving_app ## 获取模型列表 ```shell -python -m paddle_serving_app.package --model_list +python -m paddle_serving_app.package --list_model ``` ## 下载预训练模型 @@ -20,15 +20,15 @@ python -m paddle_serving_app.package --model_list python -m paddle_serving_app.package --get_model senta_bilstm ``` -paddle_serving_app中内置了11中预训练模型,涵盖了6种预测任务。获取到的模型文件可以直接用于部署,添加`--tutorial`参数可以获取对应的部署方式。 +paddle_serving_app中内置了11种预训练模型,涵盖了6种预测任务。获取到的模型文件可以直接用于部署,添加`--tutorial`参数可以获取对应的部署方式。 | 预测服务类型 | 模型名称 | | ------------ | ------------------------------------------------ | | 中文情感分析 | 'senta_bilstm', 'senta_bow', 'senta_cnn' | -| 语义理解 | 'ernie_base' | +| 语义理解 | 'ernie' | | 中文分词 | 'lac' | -| 图像检测 | 'faster_rcnn', 'yolov3' | -| 图像分割 | 'unet', 'deeplabv3' | +| 图像检测 | 'faster_rcnn' | +| 图像分割 | 'unet', 'deeplabv3', 'deeplabv3+cityscapes' | | 图像分类 | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' | ## 数据预处理API @@ -36,7 +36,7 @@ paddle_serving_app中内置了11中预训练模型,涵盖了6种预测任务 paddle_serving_app针对CV和NLP领域的模型任务,提供了多种常见的数据预处理方法。 - class ChineseBertReader - + 中文语义理解模型预处理 - `__init__(vocab_file, max_seq_len=20)` @@ -71,7 +71,7 @@ paddle_serving_app针对CV和NLP领域的模型任务,提供了多种常见的 [参考示例](../examples/senta/senta_web_service.py) -- 图像的预处理方法相比于上述的方法更加灵活多变,可以通过以下的多个类进行组合,[参考示例](../examples/imagenet/image_rpc_client.py) +- 
图像的预处理方法相比于上述的方法更加灵活多变,可以通过以下的多个类进行组合,[参考示例](../examples/imagenet/resnet50_rpc_client.py) - class Sequentia diff --git a/python/paddle_serving_app/models/model_list.py b/python/paddle_serving_app/models/model_list.py index 3d08f2fea95cc07e0cb1b57b005f72b95c6a4bcd..0c26a59f6f0537b9c910f21062938d4720d4f9f4 100644 --- a/python/paddle_serving_app/models/model_list.py +++ b/python/paddle_serving_app/models/model_list.py @@ -22,22 +22,26 @@ class ServingModels(object): self.model_dict = OrderedDict() self.model_dict[ "SentimentAnalysis"] = ["senta_bilstm", "senta_bow", "senta_cnn"] - self.model_dict["SemanticRepresentation"] = ["ernie_base"] + self.model_dict["SemanticRepresentation"] = ["ernie"] self.model_dict["ChineseWordSegmentation"] = ["lac"] - self.model_dict["ObjectDetection"] = ["faster_rcnn", "yolov3"] + self.model_dict["ObjectDetection"] = ["faster_rcnn"] self.model_dict["ImageSegmentation"] = [ "unet", "deeplabv3", "deeplabv3+cityscapes" ] self.model_dict["ImageClassification"] = [ "resnet_v2_50_imagenet", "mobilenet_v2_imagenet" ] + self.model_dict["TextDetection"] = ["ocr_detection"] + self.model_dict["OCR"] = ["ocr_rec"] image_class_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/" image_seg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageSegmentation/" object_detection_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ObjectDetection/" + ocr_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/OCR/" senta_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/" - semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticRepresentation/" + semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/" wordseg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/" + ocr_det_url = "https://paddle-serving.bj.bcebos.com/ocr/" self.url_dict = {} @@ -52,6 +56,8 @@ class ServingModels(object): pack_url(self.model_dict, "ObjectDetection", object_detection_url) pack_url(self.model_dict, "ImageSegmentation", image_seg_url) pack_url(self.model_dict, "ImageClassification", image_class_url) + pack_url(self.model_dict, "OCR", ocr_url) + pack_url(self.model_dict, "TextDetection", ocr_det_url) def get_model_list(self): return self.model_dict diff --git a/python/paddle_serving_app/reader/__init__.py b/python/paddle_serving_app/reader/__init__.py index 0eee878284e2028657a660acd38a21934bb5ccd7..e15a93084cbd437531129b48b51fe852ce17d19b 100644 --- a/python/paddle_serving_app/reader/__init__.py +++ b/python/paddle_serving_app/reader/__init__.py @@ -12,7 +12,11 @@ # See the License for the specific language governing permissions and # limitations under the License. 
from .chinese_bert_reader import ChineseBertReader -from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize, CenterCrop, Resize, Transpose, Div, RGB2BGR, BGR2RGB, RCNNPostprocess, SegPostprocess, PadStride +from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize +from .image_reader import CenterCrop, Resize, Transpose, Div, RGB2BGR, BGR2RGB, ResizeByFactor +from .image_reader import RCNNPostprocess, SegPostprocess, PadStride +from .image_reader import DBPostProcess, FilterBoxes from .lac_reader import LACReader from .senta_reader import SentaReader from .imdb_reader import IMDBDataset +from .ocr_reader import OCRReader diff --git a/python/paddle_serving_app/reader/image_reader.py b/python/paddle_serving_app/reader/image_reader.py index 7988bf447b5a0a075171d93d22dd1933aa8532b8..dc029bf0409179f1d392ce05d007565cd3007085 100644 --- a/python/paddle_serving_app/reader/image_reader.py +++ b/python/paddle_serving_app/reader/image_reader.py @@ -11,6 +11,9 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function import cv2 import os import numpy as np @@ -18,6 +21,8 @@ import base64 import sys from . import functional as F from PIL import Image, ImageDraw +from shapely.geometry import Polygon +import pyclipper import json _cv2_interpolation_to_str = {cv2.INTER_LINEAR: "cv2.INTER_LINEAR", None: "None"} @@ -43,6 +48,196 @@ def generate_colormap(num_classes): return color_map +class DBPostProcess(object): + """ + The post process for Differentiable Binarization (DB). + """ + + def __init__(self, params): + self.thresh = params['thresh'] + self.box_thresh = params['box_thresh'] + self.max_candidates = params['max_candidates'] + self.unclip_ratio = params['unclip_ratio'] + self.min_size = 3 + + def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height): + ''' + _bitmap: single map with shape (1, H, W), + whose values are binarized as {0, 1} + ''' + + bitmap = _bitmap + height, width = bitmap.shape + + outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST, + cv2.CHAIN_APPROX_SIMPLE) + if len(outs) == 3: + img, contours, _ = outs[0], outs[1], outs[2] + elif len(outs) == 2: + contours, _ = outs[0], outs[1] + + num_contours = min(len(contours), self.max_candidates) + boxes = np.zeros((num_contours, 4, 2), dtype=np.int16) + scores = np.zeros((num_contours, ), dtype=np.float32) + + for index in range(num_contours): + contour = contours[index] + points, sside = self.get_mini_boxes(contour) + if sside < self.min_size: + continue + points = np.array(points) + score = self.box_score_fast(pred, points.reshape(-1, 2)) + if self.box_thresh > score: + continue + + box = self.unclip(points).reshape(-1, 1, 2) + box, sside = self.get_mini_boxes(box) + if sside < self.min_size + 2: + continue + box = np.array(box) + if not isinstance(dest_width, int): + dest_width = dest_width.item() + dest_height = dest_height.item() + + box[:, 0] = np.clip( + np.round(box[:, 0] / width * dest_width), 0, dest_width) + box[:, 1] = np.clip( + np.round(box[:, 1] / height * dest_height), 0, dest_height) + boxes[index, :, :] = box.astype(np.int16) + scores[index] = score + return boxes, scores + + def unclip(self, box): + unclip_ratio = self.unclip_ratio + poly = Polygon(box) + distance = poly.area * unclip_ratio / poly.length + 
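+        # expand the detected box outward with a round-joint polygon offset; the offset distance grows with the box's area-to-perimeter ratio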
offset = pyclipper.PyclipperOffset() + offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON) + expanded = np.array(offset.Execute(distance)) + return expanded + + def get_mini_boxes(self, contour): + bounding_box = cv2.minAreaRect(contour) + points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0]) + + index_1, index_2, index_3, index_4 = 0, 1, 2, 3 + if points[1][1] > points[0][1]: + index_1 = 0 + index_4 = 1 + else: + index_1 = 1 + index_4 = 0 + if points[3][1] > points[2][1]: + index_2 = 2 + index_3 = 3 + else: + index_2 = 3 + index_3 = 2 + + box = [ + points[index_1], points[index_2], points[index_3], points[index_4] + ] + return box, min(bounding_box[1]) + + def box_score_fast(self, bitmap, _box): + h, w = bitmap.shape[:2] + box = _box.copy() + xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1) + xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1) + ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1) + ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1) + + mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8) + box[:, 0] = box[:, 0] - xmin + box[:, 1] = box[:, 1] - ymin + cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1) + return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0] + + def __call__(self, pred, ratio_list): + pred = pred[:, 0, :, :] + segmentation = pred > self.thresh + + boxes_batch = [] + for batch_index in range(pred.shape[0]): + height, width = pred.shape[-2:] + tmp_boxes, tmp_scores = self.boxes_from_bitmap( + pred[batch_index], segmentation[batch_index], width, height) + + boxes = [] + for k in range(len(tmp_boxes)): + if tmp_scores[k] > self.box_thresh: + boxes.append(tmp_boxes[k]) + if len(boxes) > 0: + boxes = np.array(boxes) + + ratio_h, ratio_w = ratio_list[batch_index] + boxes[:, :, 0] = boxes[:, :, 0] / ratio_w + boxes[:, :, 1] = boxes[:, :, 1] / ratio_h + + boxes_batch.append(boxes) + return boxes_batch + + def __repr__(self): + return self.__class__.__name__ + \ + " thresh: {1}, box_thresh: {2}, max_candidates: {3}, unclip_ratio: {4}, min_size: {5}".format( + self.thresh, self.box_thresh, self.max_candidates, self.unclip_ratio, self.min_size) + + +class FilterBoxes(object): + def __init__(self, width, height): + self.filter_width = width + self.filter_height = height + + def order_points_clockwise(self, pts): + """ + reference from: https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py + # sort the points based on their x-coordinates + """ + xSorted = pts[np.argsort(pts[:, 0]), :] + + # grab the left-most and right-most points from the sorted + # x-roodinate points + leftMost = xSorted[:2, :] + rightMost = xSorted[2:, :] + + # now, sort the left-most coordinates according to their + # y-coordinates so we can grab the top-left and bottom-left + # points, respectively + leftMost = leftMost[np.argsort(leftMost[:, 1]), :] + (tl, bl) = leftMost + + rightMost = rightMost[np.argsort(rightMost[:, 1]), :] + (tr, br) = rightMost + + rect = np.array([tl, tr, br, bl], dtype="float32") + return rect + + def clip_det_res(self, points, img_height, img_width): + for pno in range(4): + points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1)) + points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1)) + return points + + def __call__(self, dt_boxes, image_shape): + img_height, img_width = image_shape[0:2] + dt_boxes_new = [] + for box in dt_boxes: + box = self.order_points_clockwise(box) + box = self.clip_det_res(box, 
img_height, img_width) + rect_width = int(np.linalg.norm(box[0] - box[1])) + rect_height = int(np.linalg.norm(box[0] - box[3])) + if rect_width <= self.filter_width or \ + rect_height <= self.filter_height: + continue + dt_boxes_new.append(box) + dt_boxes = np.array(dt_boxes_new) + return dt_boxes + + def __repr__(self): + return self.__class__.__name__ + " filter_width: {1}, filter_height: {2}".format( + self.filter_width, self.filter_height) + + class SegPostprocess(object): def __init__(self, class_num): self.class_num = class_num @@ -77,8 +272,7 @@ class SegPostprocess(object): result_png = score_png result_png = cv2.resize( - result_png, - ori_shape[:2], + result_png, (ori_shape[1], ori_shape[0]), fx=0, fy=0, interpolation=cv2.INTER_CUBIC) @@ -296,7 +490,10 @@ class File2Image(object): pass def __call__(self, img_path): - fin = open(img_path) + if py_version == 2: + fin = open(img_path) + else: + fin = open(img_path, "rb") sample = fin.read() data = np.fromstring(sample, np.uint8) img = cv2.imdecode(data, cv2.IMREAD_COLOR) @@ -470,6 +667,57 @@ class Resize(object): _cv2_interpolation_to_str[self.interpolation]) +class ResizeByFactor(object): + """Resize the input numpy array Image to a size multiple of factor which is usually required by a network + + Args: + factor (int): Resize factor. make width and height multiple factor of the value of factor. Default is 32 + max_side_len (int): max size of width and height. if width or height is larger than max_side_len, just resize the width or the height. Default is 2400 + """ + + def __init__(self, factor=32, max_side_len=2400): + self.factor = factor + self.max_side_len = max_side_len + + def __call__(self, img): + h, w, _ = img.shape + resize_w = w + resize_h = h + if max(resize_h, resize_w) > self.max_side_len: + if resize_h > resize_w: + ratio = float(self.max_side_len) / resize_h + else: + ratio = float(self.max_side_len) / resize_w + else: + ratio = 1. 
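+        # Scale both sides by the ratio above, then snap each side to a
+        # multiple of `factor`: sides that already divide evenly are kept,
+        # sides shorter than 2 * factor become exactly one factor, and the
+        # rest are reduced to (side // factor - 1) * factor. Note that the
+        # height branch below uses the literal 32 rather than self.factor.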
+ resize_h = int(resize_h * ratio) + resize_w = int(resize_w * ratio) + if resize_h % self.factor == 0: + resize_h = resize_h + elif resize_h // self.factor <= 1: + resize_h = self.factor + else: + resize_h = (resize_h // 32 - 1) * 32 + if resize_w % self.factor == 0: + resize_w = resize_w + elif resize_w // self.factor <= 1: + resize_w = self.factor + else: + resize_w = (resize_w // self.factor - 1) * self.factor + try: + if int(resize_w) <= 0 or int(resize_h) <= 0: + return None, (None, None) + im = cv2.resize(img, (int(resize_w), int(resize_h))) + except: + print(resize_w, resize_h) + sys.exit(0) + return im + + def __repr__(self): + return self.__class__.__name__ + '(factor={0}, max_side_len={1})'.format( + self.factor, self.max_side_len) + + class PadStride(object): def __init__(self, stride): self.coarsest_stride = stride diff --git a/python/paddle_serving_app/reader/lac_reader.py b/python/paddle_serving_app/reader/lac_reader.py index 7e804ff371e2d90d79f7f663e83a854b1b0c9647..8f7d79a6a1e7ce8c4c86b689e2856eea6fa42158 100644 --- a/python/paddle_serving_app/reader/lac_reader.py +++ b/python/paddle_serving_app/reader/lac_reader.py @@ -111,6 +111,10 @@ class LACReader(object): return word_ids def parse_result(self, words, crf_decode): + try: + words = unicode(words, "utf-8") + except: + pass tags = [self.id2label_dict[str(x[0])] for x in crf_decode] sent_out = [] diff --git a/python/paddle_serving_app/reader/ocr_reader.py b/python/paddle_serving_app/reader/ocr_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..e5dc88482bd5e0a7a26873fd5cb60c43dc5104c9 --- /dev/null +++ b/python/paddle_serving_app/reader/ocr_reader.py @@ -0,0 +1,203 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import cv2 +import copy +import numpy as np +import math +import re +import sys +import argparse +from paddle_serving_app.reader import Sequential, Resize, Transpose, Div, Normalize + + +class CharacterOps(object): + """ Convert between text-label and text-index """ + + def __init__(self, config): + self.character_type = config['character_type'] + self.loss_type = config['loss_type'] + if self.character_type == "en": + self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz" + dict_character = list(self.character_str) + elif self.character_type == "ch": + character_dict_path = config['character_dict_path'] + self.character_str = "" + with open(character_dict_path, "rb") as fin: + lines = fin.readlines() + for line in lines: + line = line.decode('utf-8').strip("\n").strip("\r\n") + self.character_str += line + dict_character = list(self.character_str) + elif self.character_type == "en_sensitive": + # same with ASTER setting (use 94 char). 
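+            # string.printable[:-6] keeps digits, letters and punctuation
+            # but drops the six trailing whitespace characters, giving the
+            # 94 symbols mentioned above; this branch assumes `import string`
+            # at the top of the module.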
+ self.character_str = string.printable[:-6] + dict_character = list(self.character_str) + else: + self.character_str = None + assert self.character_str is not None, \ + "Nonsupport type of the character: {}".format(self.character_str) + self.beg_str = "sos" + self.end_str = "eos" + if self.loss_type == "attention": + dict_character = [self.beg_str, self.end_str] + dict_character + self.dict = {} + for i, char in enumerate(dict_character): + self.dict[char] = i + self.character = dict_character + + def encode(self, text): + """convert text-label into text-index. + input: + text: text labels of each image. [batch_size] + + output: + text: concatenated text index for CTCLoss. + [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)] + length: length of each text. [batch_size] + """ + if self.character_type == "en": + text = text.lower() + + text_list = [] + for char in text: + if char not in self.dict: + continue + text_list.append(self.dict[char]) + text = np.array(text_list) + return text + + def decode(self, text_index, is_remove_duplicate=False): + """ convert text-index into text-label. """ + char_list = [] + char_num = self.get_char_num() + + if self.loss_type == "attention": + beg_idx = self.get_beg_end_flag_idx("beg") + end_idx = self.get_beg_end_flag_idx("end") + ignored_tokens = [beg_idx, end_idx] + else: + ignored_tokens = [char_num] + + for idx in range(len(text_index)): + if text_index[idx] in ignored_tokens: + continue + if is_remove_duplicate: + if idx > 0 and text_index[idx - 1] == text_index[idx]: + continue + char_list.append(self.character[text_index[idx]]) + text = ''.join(char_list) + return text + + def get_char_num(self): + return len(self.character) + + def get_beg_end_flag_idx(self, beg_or_end): + if self.loss_type == "attention": + if beg_or_end == "beg": + idx = np.array(self.dict[self.beg_str]) + elif beg_or_end == "end": + idx = np.array(self.dict[self.end_str]) + else: + assert False, "Unsupport type %s in get_beg_end_flag_idx"\ + % beg_or_end + return idx + else: + err = "error in get_beg_end_flag_idx when using the loss %s"\ + % (self.loss_type) + assert False, err + + +class OCRReader(object): + def __init__(self): + args = self.parse_args() + image_shape = [int(v) for v in args.rec_image_shape.split(",")] + self.rec_image_shape = image_shape + self.character_type = args.rec_char_type + self.rec_batch_num = args.rec_batch_num + char_ops_params = {} + char_ops_params["character_type"] = args.rec_char_type + char_ops_params["character_dict_path"] = args.rec_char_dict_path + char_ops_params['loss_type'] = 'ctc' + self.char_ops = CharacterOps(char_ops_params) + + def parse_args(self): + parser = argparse.ArgumentParser() + parser.add_argument("--rec_algorithm", type=str, default='CRNN') + parser.add_argument("--rec_model_dir", type=str) + parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320") + parser.add_argument("--rec_char_type", type=str, default='ch') + parser.add_argument("--rec_batch_num", type=int, default=1) + parser.add_argument( + "--rec_char_dict_path", type=str, default="./ppocr_keys_v1.txt") + return parser.parse_args() + + def resize_norm_img(self, img, max_wh_ratio): + imgC, imgH, imgW = self.rec_image_shape + if self.character_type == "ch": + imgW = int(32 * max_wh_ratio) + h = img.shape[0] + w = img.shape[1] + ratio = w / float(h) + if math.ceil(imgH * ratio) > imgW: + resized_w = imgW + else: + resized_w = int(math.ceil(imgH * ratio)) + + seq = Sequential([ + Resize(imgH, resized_w), Transpose((2, 0, 
1)), Div(255), + Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5], True) + ]) + resized_image = seq(img) + padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) + padding_im[:, :, 0:resized_w] = resized_image + + return padding_im + + def preprocess(self, img_list): + img_num = len(img_list) + norm_img_batch = [] + max_wh_ratio = 0 + for ino in range(img_num): + h, w = img_list[ino].shape[0:2] + wh_ratio = w * 1.0 / h + max_wh_ratio = max(max_wh_ratio, wh_ratio) + for ino in range(img_num): + norm_img = self.resize_norm_img(img_list[ino], max_wh_ratio) + norm_img = norm_img[np.newaxis, :] + norm_img_batch.append(norm_img) + norm_img_batch = np.concatenate(norm_img_batch) + norm_img_batch = norm_img_batch.copy() + + return norm_img_batch[0] + + def postprocess(self, outputs): + rec_res = [] + rec_idx_lod = outputs["ctc_greedy_decoder_0.tmp_0.lod"] + predict_lod = outputs["softmax_0.tmp_0.lod"] + rec_idx_batch = outputs["ctc_greedy_decoder_0.tmp_0"] + for rno in range(len(rec_idx_lod) - 1): + beg = rec_idx_lod[rno] + end = rec_idx_lod[rno + 1] + rec_idx_tmp = rec_idx_batch[beg:end, 0] + preds_text = self.char_ops.decode(rec_idx_tmp) + beg = predict_lod[rno] + end = predict_lod[rno + 1] + probs = outputs["softmax_0.tmp_0"][beg:end, :] + ind = np.argmax(probs, axis=1) + blank = probs.shape[1] + valid_ind = np.where(ind != (blank - 1))[0] + score = np.mean(probs[valid_ind, ind[valid_ind]]) + rec_res.append([preds_text, score]) + return rec_res diff --git a/python/paddle_serving_client/__init__.py b/python/paddle_serving_client/__init__.py index e3302c14239c8bfc37a6bafb39b112cfed5230fd..9e32926732ef1b396473dab2a748f24f63e19e7a 100644 --- a/python/paddle_serving_client/__init__.py +++ b/python/paddle_serving_client/__init__.py @@ -21,7 +21,12 @@ import google.protobuf.text_format import numpy as np import time import sys -from .serving_client import PredictorRes + +import grpc +from .proto import multi_lang_general_model_service_pb2 +sys.path.append( + os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto')) +from .proto import multi_lang_general_model_service_pb2_grpc int_type = 0 float_type = 1 @@ -61,13 +66,18 @@ class SDKConfig(object): self.tag_list = [] self.cluster_list = [] self.variant_weight_list = [] + self.rpc_timeout_ms = 20000 + self.load_balance_strategy = "la" def add_server_variant(self, tag, cluster, variant_weight): self.tag_list.append(tag) self.cluster_list.append(cluster) self.variant_weight_list.append(variant_weight) - def gen_desc(self): + def set_load_banlance_strategy(self, strategy): + self.load_balance_strategy = strategy + + def gen_desc(self, rpc_timeout_ms): predictor_desc = sdk.Predictor() predictor_desc.name = "general_model" predictor_desc.service_name = \ @@ -86,7 +96,7 @@ class SDKConfig(object): self.sdk_desc.predictors.extend([predictor_desc]) self.sdk_desc.default_variant_conf.tag = "default" self.sdk_desc.default_variant_conf.connection_conf.connect_timeout_ms = 2000 - self.sdk_desc.default_variant_conf.connection_conf.rpc_timeout_ms = 20000 + self.sdk_desc.default_variant_conf.connection_conf.rpc_timeout_ms = rpc_timeout_ms self.sdk_desc.default_variant_conf.connection_conf.connect_retry_count = 2 self.sdk_desc.default_variant_conf.connection_conf.max_connection_per_host = 100 self.sdk_desc.default_variant_conf.connection_conf.hedge_request_timeout_ms = -1 @@ -119,6 +129,9 @@ class Client(object): self.profile_ = _Profiler() self.all_numpy_input = True self.has_numpy_input = False + self.rpc_timeout_ms = 20000 + from .serving_client import 
PredictorRes + self.predictorres_constructor = PredictorRes def load_client_config(self, path): from .serving_client import PredictorClient @@ -171,13 +184,19 @@ class Client(object): self.predictor_sdk_.add_server_variant(tag, cluster, str(variant_weight)) + def set_rpc_timeout_ms(self, rpc_timeout): + if not isinstance(rpc_timeout, int): + raise ValueError("rpc_timeout must be int type.") + else: + self.rpc_timeout_ms = rpc_timeout + def connect(self, endpoints=None): # check whether current endpoint is available # init from client config # create predictor here if endpoints is None: if self.predictor_sdk_ is None: - raise SystemExit( + raise ValueError( "You must set the endpoints parameter or use add_variant function to create a variant." ) else: @@ -188,7 +207,7 @@ class Client(object): print( "parameter endpoints({}) will not take effect, because you use the add_variant function.". format(endpoints)) - sdk_desc = self.predictor_sdk_.gen_desc() + sdk_desc = self.predictor_sdk_.gen_desc(self.rpc_timeout_ms) self.client_handle_.create_predictor_by_desc(sdk_desc.SerializeToString( )) @@ -203,7 +222,7 @@ class Client(object): return if isinstance(feed[key], list) and len(feed[key]) != self.feed_tensor_len[key]: - raise SystemExit("The shape of feed tensor {} not match.".format( + raise ValueError("The shape of feed tensor {} not match.".format( key)) if type(feed[key]).__module__ == np.__name__ and np.size(feed[ key]) != self.feed_tensor_len[key]: @@ -292,7 +311,7 @@ class Client(object): self.profile_.record('py_prepro_1') self.profile_.record('py_client_infer_0') - result_batch_handle = PredictorRes() + result_batch_handle = self.predictorres_constructor() if self.all_numpy_input: res = self.client_handle_.numpy_predict( float_slot_batch, float_feed_names, float_shape, int_slot_batch, @@ -304,7 +323,7 @@ class Client(object): int_feed_names, int_shape, fetch_names, result_batch_handle, self.pid) else: - raise SystemExit( + raise ValueError( "Please make sure the inputs are all in list type or all in numpy.array type" ) @@ -360,3 +379,172 @@ class Client(object): def release(self): self.client_handle_.destroy_predictor() self.client_handle_ = None + + +class MultiLangClient(object): + def __init__(self): + self.channel_ = None + + def load_client_config(self, path): + if not isinstance(path, str): + raise Exception("GClient only supports multi-model temporarily") + self._parse_model_config(path) + + def connect(self, endpoint): + self.channel_ = grpc.insecure_channel(endpoint[0]) #TODO + self.stub_ = multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelServiceStub( + self.channel_) + + def _flatten_list(self, nested_list): + for item in nested_list: + if isinstance(item, (list, tuple)): + for sub_item in self._flatten_list(item): + yield sub_item + else: + yield item + + def _parse_model_config(self, model_config_path): + model_conf = m_config.GeneralModelConfig() + f = open(model_config_path, 'r') + model_conf = google.protobuf.text_format.Merge( + str(f.read()), model_conf) + self.feed_names_ = [var.alias_name for var in model_conf.feed_var] + self.feed_types_ = {} + self.feed_shapes_ = {} + self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var] + self.fetch_types_ = {} + self.lod_tensor_set_ = set() + for i, var in enumerate(model_conf.feed_var): + self.feed_types_[var.alias_name] = var.feed_type + self.feed_shapes_[var.alias_name] = var.shape + if var.is_lod_tensor: + self.lod_tensor_set_.add(var.alias_name) + else: + counter = 1 + for dim in 
self.feed_shapes_[var.alias_name]: + counter *= dim + for i, var in enumerate(model_conf.fetch_var): + self.fetch_types_[var.alias_name] = var.fetch_type + if var.is_lod_tensor: + self.lod_tensor_set_.add(var.alias_name) + + def _pack_feed_data(self, feed, fetch, is_python): + req = multi_lang_general_model_service_pb2.Request() + req.fetch_var_names.extend(fetch) + req.feed_var_names.extend(feed.keys()) + req.is_python = is_python + feed_batch = None + if isinstance(feed, dict): + feed_batch = [feed] + elif isinstance(feed, list): + feed_batch = feed + else: + raise Exception("{} not support".format(type(feed))) + init_feed_names = False + for feed_data in feed_batch: + inst = multi_lang_general_model_service_pb2.FeedInst() + for name in req.feed_var_names: + tensor = multi_lang_general_model_service_pb2.Tensor() + var = feed_data[name] + v_type = self.feed_types_[name] + if is_python: + data = None + if isinstance(var, list): + if v_type == 0: # int64 + data = np.array(var, dtype="int64") + elif v_type == 1: # float32 + data = np.array(var, dtype="float32") + else: + raise Exception("error type.") + else: + data = var + if var.dtype == "float64": + data = data.astype("float32") + tensor.data = data.tobytes() + else: + if v_type == 0: # int64 + if isinstance(var, np.ndarray): + tensor.int64_data.extend(var.reshape(-1).tolist()) + else: + tensor.int64_data.extend(self._flatten_list(var)) + elif v_type == 1: # float32 + if isinstance(var, np.ndarray): + tensor.float_data.extend(var.reshape(-1).tolist()) + else: + tensor.float_data.extend(self._flatten_list(var)) + else: + raise Exception("error type.") + if isinstance(var, np.ndarray): + tensor.shape.extend(list(var.shape)) + else: + tensor.shape.extend(self.feed_shapes_[name]) + inst.tensor_array.append(tensor) + req.insts.append(inst) + return req + + def _unpack_resp(self, resp, fetch, is_python, need_variant_tag): + result_map = {} + inst = resp.outputs[0].insts[0] + tag = resp.tag + for i, name in enumerate(fetch): + var = inst.tensor_array[i] + v_type = self.fetch_types_[name] + if is_python: + if v_type == 0: # int64 + result_map[name] = np.frombuffer(var.data, dtype="int64") + elif v_type == 1: # float32 + result_map[name] = np.frombuffer(var.data, dtype="float32") + else: + raise Exception("error type.") + else: + if v_type == 0: # int64 + result_map[name] = np.array( + list(var.int64_data), dtype="int64") + elif v_type == 1: # float32 + result_map[name] = np.array( + list(var.float_data), dtype="float32") + else: + raise Exception("error type.") + result_map[name].shape = list(var.shape) + if name in self.lod_tensor_set_: + result_map["{}.lod".format(name)] = np.array(list(var.lod)) + return result_map if not need_variant_tag else [result_map, tag] + + def _done_callback_func(self, fetch, is_python, need_variant_tag): + def unpack_resp(resp): + return self._unpack_resp(resp, fetch, is_python, need_variant_tag) + + return unpack_resp + + def predict(self, + feed, + fetch, + need_variant_tag=False, + asyn=False, + is_python=True): + req = self._pack_feed_data(feed, fetch, is_python=is_python) + if not asyn: + resp = self.stub_.inference(req) + return self._unpack_resp( + resp, + fetch, + is_python=is_python, + need_variant_tag=need_variant_tag) + else: + call_future = self.stub_.inference.future(req) + return MultiLangPredictFuture( + call_future, + self._done_callback_func( + fetch, + is_python=is_python, + need_variant_tag=need_variant_tag)) + + +class MultiLangPredictFuture(object): + def __init__(self, call_future, 
callback_func): + self.call_future_ = call_future + self.callback_func_ = callback_func + + def result(self): + resp = self.call_future_.result() + return self.callback_func_(resp) diff --git a/python/paddle_serving_client/pyclient.py b/python/paddle_serving_client/pyclient.py index 29df85f045210d49703cc07c720a66f2b81697c0..1eb3e562c1821d9db97d96670631c13f7caaff9c 100644 --- a/python/paddle_serving_client/pyclient.py +++ b/python/paddle_serving_client/pyclient.py @@ -13,8 +13,8 @@ # limitations under the License. # pylint: disable=doc-string-missing import grpc -import general_python_service_pb2 -import general_python_service_pb2_grpc +from .proto import general_python_service_pb2 +from .proto import general_python_service_pb2_grpc import numpy as np @@ -30,27 +30,33 @@ class PyClient(object): def _pack_data_for_infer(self, feed_data): req = general_python_service_pb2.Request() for name, data in feed_data.items(): - if not isinstance(data, np.ndarray): - raise TypeError( - "only numpy array type is supported temporarily.") - data2bytes = np.ndarray.tobytes(data) + if isinstance(data, list): + data = np.array(data) + elif not isinstance(data, np.ndarray): + raise TypeError("only list and numpy array type is supported.") req.feed_var_names.append(name) - req.feed_insts.append(data2bytes) + req.feed_insts.append(data.tobytes()) + req.shape.append(np.array(data.shape, dtype="int32").tobytes()) + req.type.append(str(data.dtype)) return req - def predict(self, feed, fetch_with_type): + def predict(self, feed, fetch): if not isinstance(feed, dict): raise TypeError( "feed must be dict type with format: {name: value}.") - if not isinstance(fetch_with_type, dict): + if not isinstance(fetch, list): raise TypeError( - "fetch_with_type must be dict type with format: {name : type}.") + "fetch_with_type must be list type with format: [name].") req = self._pack_data_for_infer(feed) resp = self._stub.inference(req) - fetch_map = {} + if resp.ecode != 0: + return {"ecode": resp.ecode, "error_info": resp.error_info} + fetch_map = {"ecode": resp.ecode} for idx, name in enumerate(resp.fetch_var_names): - if name not in fetch_with_type: + if name not in fetch: continue fetch_map[name] = np.frombuffer( - resp.fetch_insts[idx], dtype=fetch_with_type[name]) + resp.fetch_insts[idx], dtype=resp.type[idx]) + fetch_map[name].shape = np.frombuffer( + resp.shape[idx], dtype="int32") return fetch_map diff --git a/python/paddle_serving_client/utils/__init__.py b/python/paddle_serving_client/utils/__init__.py index 381da6bf9bade2bb0627f4c07851012360905de5..53f40726fbf21a0607b47bb29a20aa6ff50b6221 100644 --- a/python/paddle_serving_client/utils/__init__.py +++ b/python/paddle_serving_client/utils/__init__.py @@ -17,6 +17,7 @@ import sys import subprocess import argparse from multiprocessing import Pool +import numpy as np def benchmark_args(): @@ -35,6 +36,17 @@ def benchmark_args(): return parser.parse_args() +def show_latency(latency_list): + latency_array = np.array(latency_list) + info = "latency:\n" + info += "mean :{} ms\n".format(np.mean(latency_array)) + info += "median :{} ms\n".format(np.median(latency_array)) + info += "80 percent :{} ms\n".format(np.percentile(latency_array, 80)) + info += "90 percent :{} ms\n".format(np.percentile(latency_array, 90)) + info += "99 percent :{} ms\n".format(np.percentile(latency_array, 99)) + sys.stderr.write(info) + + class MultiThreadRunner(object): def __init__(self): pass diff --git a/python/paddle_serving_server/__init__.py b/python/paddle_serving_server/__init__.py index 
3cb96a8f04922362fdb4b4c497f7679355e3879f..3a5c07011ace961fdfb61ebf3217ab1aab375e82 100644 --- a/python/paddle_serving_server/__init__.py +++ b/python/paddle_serving_server/__init__.py @@ -23,6 +23,17 @@ import paddle_serving_server as paddle_serving_server from .version import serving_server_version from contextlib import closing import collections +import fcntl + +import numpy as np +import grpc +from .proto import multi_lang_general_model_service_pb2 +import sys +sys.path.append( + os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto')) +from .proto import multi_lang_general_model_service_pb2_grpc +from multiprocessing import Pool, Process +from concurrent import futures class OpMaker(object): @@ -322,6 +333,10 @@ class Server(object): bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name self.server_path = os.path.join(self.module_path, floder_name) + #acquire lock + version_file = open("{}/version.py".format(self.module_path), "r") + fcntl.flock(version_file, fcntl.LOCK_EX) + if not os.path.exists(self.server_path): print('Frist time run, downloading PaddleServing components ...') r = os.system('wget ' + bin_url + ' --no-check-certificate') @@ -345,6 +360,8 @@ class Server(object): foemat(self.module_path)) finally: os.remove(tar_name) + #release lock + version_file.close() os.chdir(self.cur_path) self.bin_path = self.server_path + "/serving" @@ -421,3 +438,158 @@ class Server(object): print("Going to Run Command") print(command) os.system(command) + + +class MultiLangServerService( + multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelService): + def __init__(self, model_config_path, endpoints): + from paddle_serving_client import Client + self._parse_model_config(model_config_path) + self.bclient_ = Client() + self.bclient_.load_client_config( + "{}/serving_server_conf.prototxt".format(model_config_path)) + self.bclient_.connect(endpoints) + + def _parse_model_config(self, model_config_path): + model_conf = m_config.GeneralModelConfig() + f = open("{}/serving_server_conf.prototxt".format(model_config_path), + 'r') + model_conf = google.protobuf.text_format.Merge( + str(f.read()), model_conf) + self.feed_names_ = [var.alias_name for var in model_conf.feed_var] + self.feed_types_ = {} + self.feed_shapes_ = {} + self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var] + self.fetch_types_ = {} + self.lod_tensor_set_ = set() + for i, var in enumerate(model_conf.feed_var): + self.feed_types_[var.alias_name] = var.feed_type + self.feed_shapes_[var.alias_name] = var.shape + if var.is_lod_tensor: + self.lod_tensor_set_.add(var.alias_name) + for i, var in enumerate(model_conf.fetch_var): + self.fetch_types_[var.alias_name] = var.fetch_type + if var.is_lod_tensor: + self.lod_tensor_set_.add(var.alias_name) + + def _flatten_list(self, nested_list): + for item in nested_list: + if isinstance(item, (list, tuple)): + for sub_item in self._flatten_list(item): + yield sub_item + else: + yield item + + def _unpack_request(self, request): + feed_names = list(request.feed_var_names) + fetch_names = list(request.fetch_var_names) + is_python = request.is_python + feed_batch = [] + for feed_inst in request.insts: + feed_dict = {} + for idx, name in enumerate(feed_names): + var = feed_inst.tensor_array[idx] + v_type = self.feed_types_[name] + data = None + if is_python: + if v_type == 0: + data = np.frombuffer(var.data, dtype="int64") + elif v_type == 1: + data = np.frombuffer(var.data, dtype="float32") + else: + raise Exception("error type.") + else: + if v_type 
== 0: # int64 + data = np.array(list(var.int64_data), dtype="int64") + elif v_type == 1: # float32 + data = np.array(list(var.float_data), dtype="float32") + else: + raise Exception("error type.") + data.shape = list(feed_inst.tensor_array[idx].shape) + feed_dict[name] = data + feed_batch.append(feed_dict) + return feed_batch, fetch_names, is_python + + def _pack_resp_package(self, result, fetch_names, is_python, tag): + resp = multi_lang_general_model_service_pb2.Response() + # Only one model is supported temporarily + model_output = multi_lang_general_model_service_pb2.ModelOutput() + inst = multi_lang_general_model_service_pb2.FetchInst() + for idx, name in enumerate(fetch_names): + tensor = multi_lang_general_model_service_pb2.Tensor() + v_type = self.fetch_types_[name] + if is_python: + tensor.data = result[name].tobytes() + else: + if v_type == 0: # int64 + tensor.int64_data.extend(result[name].reshape(-1).tolist()) + elif v_type == 1: # float32 + tensor.float_data.extend(result[name].reshape(-1).tolist()) + else: + raise Exception("error type.") + tensor.shape.extend(list(result[name].shape)) + if name in self.lod_tensor_set_: + tensor.lod.extend(result["{}.lod".format(name)].tolist()) + inst.tensor_array.append(tensor) + model_output.insts.append(inst) + resp.outputs.append(model_output) + resp.tag = tag + return resp + + def inference(self, request, context): + feed_dict, fetch_names, is_python = self._unpack_request(request) + data, tag = self.bclient_.predict( + feed=feed_dict, fetch=fetch_names, need_variant_tag=True) + return self._pack_resp_package(data, fetch_names, is_python, tag) + + +class MultiLangServer(object): + def __init__(self, worker_num=2): + self.bserver_ = Server() + self.worker_num_ = worker_num + + def set_op_sequence(self, op_seq): + self.bserver_.set_op_sequence(op_seq) + + def load_model_config(self, model_config_path): + if not isinstance(model_config_path, str): + raise Exception( + "MultiLangServer only supports multi-model temporarily") + self.bserver_.load_model_config(model_config_path) + self.model_config_path_ = model_config_path + + def prepare_server(self, workdir=None, port=9292, device="cpu"): + default_port = 12000 + self.port_list_ = [] + for i in range(1000): + if default_port + i != port and self._port_is_available(default_port + + i): + self.port_list_.append(default_port + i) + break + self.bserver_.prepare_server( + workdir=workdir, port=self.port_list_[0], device=device) + self.gport_ = port + + def _launch_brpc_service(self, bserver): + bserver.run_server() + + def _port_is_available(self, port): + with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock: + sock.settimeout(2) + result = sock.connect_ex(('0.0.0.0', port)) + return result != 0 + + def run_server(self): + p_bserver = Process( + target=self._launch_brpc_service, args=(self.bserver_, )) + p_bserver.start() + server = grpc.server( + futures.ThreadPoolExecutor(max_workers=self.worker_num_)) + multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server( + MultiLangServerService(self.model_config_path_, + ["0.0.0.0:{}".format(self.port_list_[0])]), + server) + server.add_insecure_port('[::]:{}'.format(self.gport_)) + server.start() + p_bserver.join() + server.wait_for_termination() diff --git a/python/paddle_serving_server/general_python_service.proto b/python/paddle_serving_server/general_python_service.proto deleted file mode 100644 index 7f3af66df8d011b9a0a4fbcd9fb14a704f0c4bb2..0000000000000000000000000000000000000000 --- 
a/python/paddle_serving_server/general_python_service.proto +++ /dev/null @@ -1,31 +0,0 @@ -// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -syntax = "proto2"; - -service GeneralPythonService { - rpc inference(Request) returns (Response) {} -} - -message Request { - repeated bytes feed_insts = 1; - repeated string feed_var_names = 2; -} - -message Response { - repeated bytes fetch_insts = 1; - repeated string fetch_var_names = 2; - required int32 is_error = 3; - optional string error_info = 4; -} diff --git a/python/paddle_serving_server/monitor.py b/python/paddle_serving_server/monitor.py index 3f1ff6436917b8ae7ff4ea06fcae1f55bd65e887..84146039c40794436030a8c5c6ba9d18ccbfda06 100644 --- a/python/paddle_serving_server/monitor.py +++ b/python/paddle_serving_server/monitor.py @@ -20,7 +20,7 @@ Usage: import os import time import argparse -import commands +import subprocess import datetime import shutil import tarfile @@ -209,7 +209,7 @@ class HadoopMonitor(Monitor): remote_filepath = os.path.join(path, filename) cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath) _LOGGER.debug('check cmd: {}'.format(cmd)) - [status, output] = commands.getstatusoutput(cmd) + [status, output] = subprocess.getstatusoutput(cmd) _LOGGER.debug('resp: {}'.format(output)) if status == 0: [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split() diff --git a/python/paddle_serving_server/pyserver.py b/python/paddle_serving_server/pyserver.py index 420df4ef5079cd63ebc411fd157b1d5cf14b2a21..216a2db140aa19efabc1713db27228df8ef58521 100644 --- a/python/paddle_serving_server/pyserver.py +++ b/python/paddle_serving_server/pyserver.py @@ -18,17 +18,20 @@ import Queue import os import sys import paddle_serving_server -from paddle_serving_client import Client +from paddle_serving_client import MultiLangClient as Client from concurrent import futures import numpy as np import grpc -import general_python_service_pb2 -import general_python_service_pb2_grpc -import python_service_channel_pb2 +from .proto import general_model_config_pb2 as m_config +from .proto import general_python_service_pb2 as pyservice_pb2 +from .proto import pyserving_channel_pb2 as channel_pb2 +from .proto import general_python_service_pb2_grpc import logging import random import time import func_timeout +import enum +import collections class _TimeProfiler(object): @@ -71,6 +74,75 @@ class _TimeProfiler(object): _profiler = _TimeProfiler() +class ChannelDataEcode(enum.Enum): + OK = 0 + TIMEOUT = 1 + NOT_IMPLEMENTED = 2 + TYPE_ERROR = 3 + UNKNOW = 4 + + +class ChannelDataType(enum.Enum): + CHANNEL_PBDATA = 0 + CHANNEL_FUTURE = 1 + + +class ChannelData(object): + def __init__(self, + future=None, + pbdata=None, + data_id=None, + callback_func=None, + ecode=None, + error_info=None): + ''' + There are several ways to use it: + + 1. ChannelData(future, pbdata[, callback_func]) + 2. ChannelData(future, data_id[, callback_func]) + 3. ChannelData(pbdata) + 4. 
ChannelData(ecode, error_info, data_id) + ''' + if ecode is not None: + if data_id is None or error_info is None: + raise ValueError("data_id and error_info cannot be None") + pbdata = channel_pb2.ChannelData() + pbdata.ecode = ecode + pbdata.id = data_id + pbdata.error_info = error_info + else: + if pbdata is None: + if data_id is None: + raise ValueError("data_id cannot be None") + pbdata = channel_pb2.ChannelData() + pbdata.type = ChannelDataType.CHANNEL_FUTURE.value + pbdata.ecode = ChannelDataEcode.OK.value + pbdata.id = data_id + elif not isinstance(pbdata, channel_pb2.ChannelData): + raise TypeError( + "pbdata must be pyserving_channel_pb2.ChannelData type({})". + format(type(pbdata))) + self.future = future + self.pbdata = pbdata + self.callback_func = callback_func + + def parse(self): + # return narray + feed = {} + if self.pbdata.type == ChannelDataType.CHANNEL_PBDATA.value: + for inst in self.pbdata.insts: + feed[inst.name] = np.frombuffer(inst.data, dtype=inst.type) + feed[inst.name].shape = np.frombuffer(inst.shape, dtype="int32") + elif self.pbdata.type == ChannelDataType.CHANNEL_FUTURE.value: + feed = self.future.result() + if self.callback_func is not None: + feed = self.callback_func(feed) + else: + raise TypeError("Error type({}) in pbdata.type.".format( + self.pbdata.type)) + return feed + + class Channel(Queue.Queue): """ The channel used for communication between Ops. @@ -93,7 +165,8 @@ class Channel(Queue.Queue): Queue.Queue.__init__(self, maxsize=maxsize) self._maxsize = maxsize self._timeout = timeout - self._name = name + self.name = name + self._stop = False self._cv = threading.Condition() @@ -101,7 +174,6 @@ class Channel(Queue.Queue): self._producer_res_count = {} # {data_id: count} self._push_res = {} # {data_id: {op_name: data}} - self._front_wait_interval = 0.1 # second self._consumers = {} # {op_name: idx} self._idx_consumer_num = {} # {idx: num} self._consumer_base_idx = 0 @@ -114,7 +186,7 @@ class Channel(Queue.Queue): return self._consumers.keys() def _log(self, info_str): - return "[{}] {}".format(self._name, info_str) + return "[{}] {}".format(self.name, info_str) def debug(self): return self._log("p: {}, c: {}".format(self.get_producers(), @@ -138,9 +210,10 @@ class Channel(Queue.Queue): self._idx_consumer_num[0] = 0 self._idx_consumer_num[0] += 1 - def push(self, data, op_name=None): + def push(self, channeldata, op_name=None): logging.debug( - self._log("{} try to push data: {}".format(op_name, data))) + self._log("{} try to push data: {}".format(op_name, + channeldata.pbdata))) if len(self._producers) == 0: raise Exception( self._log( @@ -148,9 +221,9 @@ class Channel(Queue.Queue): )) elif len(self._producers) == 1: with self._cv: - while True: + while self._stop is False: try: - self.put(data, timeout=0) + self.put(channeldata, timeout=0) break except Queue.Empty: self._cv.wait() @@ -163,17 +236,17 @@ class Channel(Queue.Queue): "There are multiple producers, so op_name cannot be None.")) producer_num = len(self._producers) - data_id = data.id + data_id = channeldata.pbdata.id put_data = None with self._cv: - logging.debug(self._log("{} get lock ~".format(op_name))) + logging.debug(self._log("{} get lock".format(op_name))) if data_id not in self._push_res: self._push_res[data_id] = { name: None for name in self._producers } self._producer_res_count[data_id] = 0 - self._push_res[data_id][op_name] = data + self._push_res[data_id][op_name] = channeldata if self._producer_res_count[data_id] + 1 == producer_num: put_data = self._push_res[data_id] 
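+                    # put_data now holds one entry per producer for this
+                    # data_id; the bookkeeping entry is cleared next and the
+                    # merged dict is pushed to the queue further down.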
self._push_res.pop(data_id) @@ -183,10 +256,10 @@ class Channel(Queue.Queue): if put_data is None: logging.debug( - self._log("{} push data succ, not not push to queue.". + self._log("{} push data succ, but not push to queue.". format(op_name))) else: - while True: + while self._stop is False: try: self.put(put_data, timeout=0) break @@ -208,7 +281,7 @@ class Channel(Queue.Queue): elif len(self._consumers) == 1: resp = None with self._cv: - while resp is None: + while self._stop is False and resp is None: try: resp = self.get(timeout=0) break @@ -223,11 +296,11 @@ class Channel(Queue.Queue): with self._cv: # data_idx = consumer_idx - base_idx - while self._consumers[op_name] - self._consumer_base_idx >= len( - self._front_res): + while self._stop is False and self._consumers[ + op_name] - self._consumer_base_idx >= len(self._front_res): try: - data = self.get(timeout=0) - self._front_res.append(data) + channeldata = self.get(timeout=0) + self._front_res.append(channeldata) break except Queue.Empty: self._cv.wait() @@ -256,14 +329,17 @@ class Channel(Queue.Queue): logging.debug(self._log("multi | {} get data succ!".format(op_name))) return resp # reference, read only + def stop(self): + #TODO + self.close() + self._stop = True + self._cv.notify_all() + class Op(object): def __init__(self, name, - input, - in_dtype, - outputs, - out_dtype, + inputs, server_model=None, server_port=None, device=None, @@ -274,25 +350,24 @@ class Op(object): timeout=-1, retry=2): self._run = False - # TODO: globally unique check - self._name = name # to identify the type of OP, it must be globally unique + self.name = name # to identify the type of OP, it must be globally unique self._concurrency = concurrency # amount of concurrency - self.set_input(input) - self._in_dtype = in_dtype - self.set_outputs(outputs) - self._out_dtype = out_dtype - self._client = None - if client_config is not None and \ - server_name is not None and \ - fetch_names is not None: - self.set_client(client_config, server_name, fetch_names) + self.set_input_ops(inputs) + self.set_client(client_config, server_name, fetch_names) self._server_model = server_model self._server_port = server_port self._device = device self._timeout = timeout - self._retry = retry + self._retry = max(1, retry) + self._input = None + self._outputs = [] def set_client(self, client_config, server_name, fetch_names): + self._client = None + if client_config is None or \ + server_name is None or \ + fetch_names is None: + return self._client = Client() self._client.load_client_config(client_config) self._client.connect([server_name]) @@ -301,38 +376,48 @@ class Op(object): def with_serving(self): return self._client is not None - def get_input(self): + def get_input_channel(self): return self._input - def set_input(self, channel): + def get_input_ops(self): + return self._input_ops + + def set_input_ops(self, ops): + if not isinstance(ops, list): + ops = [] if ops is None else [ops] + self._input_ops = [] + for op in ops: + if not isinstance(op, Op): + raise TypeError( + self._log('input op must be Op type, not {}'.format( + type(op)))) + self._input_ops.append(op) + + def add_input_channel(self, channel): if not isinstance(channel, Channel): raise TypeError( self._log('input channel must be Channel type, not {}'.format( type(channel)))) - channel.add_consumer(self._name) + channel.add_consumer(self.name) self._input = channel - def get_outputs(self): + def get_output_channels(self): return self._outputs - def set_outputs(self, channels): - if not 
isinstance(channels, list): + def add_output_channel(self, channel): + if not isinstance(channel, Channel): raise TypeError( - self._log('output channels must be list type, not {}'.format( - type(channels)))) - for channel in channels: - channel.add_producer(self._name) - self._outputs = channels - - def preprocess(self, data): - if isinstance(data, dict): - raise Exception( - self._log( - 'this Op has multiple previous inputs. Please override this method' - )) - feed = {} - for inst in data.insts: - feed[inst.name] = np.frombuffer(inst.data, dtype=self._in_dtype) + self._log('output channel must be Channel type, not {}'.format( + type(channel)))) + channel.add_producer(self.name) + self._outputs.append(channel) + + def preprocess(self, channeldata): + if isinstance(channeldata, dict): + raise NotImplementedError( + 'this Op has multiple previous inputs. Please override this method' + ) + feed = channeldata.parse() return feed def midprocess(self, data): @@ -343,124 +428,262 @@ class Op(object): format(type(data)))) logging.debug(self._log('data: {}'.format(data))) logging.debug(self._log('fetch: {}'.format(self._fetch_names))) - fetch_map = self._client.predict(feed=data, fetch=self._fetch_names) - logging.debug(self._log("finish predict")) - return fetch_map + call_future = self._client.predict( + feed=data, fetch=self._fetch_names, asyn=True) + logging.debug(self._log("get call_future")) + return call_future def postprocess(self, output_data): - raise Exception( - self._log( - 'Please override this method to convert data to the format in channel.' - )) - - def errorprocess(self, error_info): - data = python_service_channel_pb2.ChannelData() - data.is_error = 1 - data.error_info = error_info - return data + return output_data def stop(self): + self._input.stop() + for channel in self._outputs: + channel.stop() self._run = False + def _parse_channeldata(self, channeldata): + data_id, error_pbdata = None, None + if isinstance(channeldata, dict): + parsed_data = {} + key = channeldata.keys()[0] + data_id = channeldata[key].pbdata.id + for _, data in channeldata.items(): + if data.pbdata.ecode != ChannelDataEcode.OK.value: + error_pbdata = data.pbdata + break + else: + data_id = channeldata.pbdata.id + if channeldata.pbdata.ecode != ChannelDataEcode.OK.value: + error_pbdata = channeldata.pbdata + return data_id, error_pbdata + + def _push_to_output_channels(self, data, name=None): + if name is None: + name = self.name + for channel in self._outputs: + channel.push(data, name) + def start(self, concurrency_idx): + op_info_prefix = "[{}|{}]".format(self.name, concurrency_idx) + log = self._get_log_func(op_info_prefix) self._run = True while self._run: - _profiler.record("{}{}-get_0".format(self._name, concurrency_idx)) - input_data = self._input.front(self._name) - _profiler.record("{}{}-get_1".format(self._name, concurrency_idx)) - data_id = None - output_data = None - error_data = None - logging.debug(self._log("input_data: {}".format(input_data))) - if isinstance(input_data, dict): - key = input_data.keys()[0] - data_id = input_data[key].id - for _, data in input_data.items(): - if data.is_error != 0: - error_data = data - break - else: - data_id = input_data.id - if input_data.is_error != 0: - error_data = input_data - - if error_data is None: - _profiler.record("{}{}-prep_0".format(self._name, - concurrency_idx)) - data = self.preprocess(input_data) - _profiler.record("{}{}-prep_1".format(self._name, - concurrency_idx)) - - error_info = None - if self.with_serving(): + 
_profiler.record("{}-get_0".format(op_info_prefix)) + channeldata = self._input.front(self.name) + _profiler.record("{}-get_1".format(op_info_prefix)) + logging.debug(log("input_data: {}".format(channeldata))) + + data_id, error_pbdata = self._parse_channeldata(channeldata) + + # error data in predecessor Op + if error_pbdata is not None: + self._push_to_output_channels(ChannelData(pbdata=error_pbdata)) + continue + + # preprecess + try: + _profiler.record("{}-prep_0".format(op_info_prefix)) + preped_data = self.preprocess(channeldata) + _profiler.record("{}-prep_1".format(op_info_prefix)) + except NotImplementedError as e: + # preprocess function not implemented + error_info = log(e) + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ChannelDataEcode.NOT_IMPLEMENTED.value, + error_info=error_info, + data_id=data_id)) + continue + except TypeError as e: + # Error type in channeldata.pbdata.type + error_info = log(e) + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ChannelDataEcode.TYPE_ERROR.value, + error_info=error_info, + data_id=data_id)) + continue + except Exception as e: + error_info = log(e) + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ChannelDataEcode.TYPE_ERROR.value, + error_info=error_info, + data_id=data_id)) + continue + + # midprocess + call_future = None + if self.with_serving(): + ecode = ChannelDataEcode.OK.value + _profiler.record("{}-midp_0".format(op_info_prefix)) + if self._timeout <= 0: + try: + call_future = self.midprocess(preped_data) + except Exception as e: + ecode = ChannelDataEcode.UNKNOW.value + error_info = log(e) + logging.error(error_info) + else: for i in range(self._retry): - _profiler.record("{}{}-midp_0".format(self._name, - concurrency_idx)) - if self._timeout > 0: - try: - middata = func_timeout.func_timeout( - self._timeout, - self.midprocess, - args=(data, )) - except func_timeout.FunctionTimedOut: - logging.error("error: timeout") - error_info = "{}({}): timeout".format( - self._name, concurrency_idx) - except Exception as e: - logging.error("error: {}".format(e)) - error_info = "{}({}): {}".format( - self._name, concurrency_idx, e) + try: + call_future = func_timeout.func_timeout( + self._timeout, + self.midprocess, + args=(preped_data, )) + except func_timeout.FunctionTimedOut as e: + if i + 1 >= self._retry: + ecode = ChannelDataEcode.TIMEOUT.value + error_info = log(e) + logging.error(error_info) + else: + logging.warn( + log("timeout, retry({})".format(i + 1))) + except Exception as e: + ecode = ChannelDataEcode.UNKNOW.value + error_info = log(e) + logging.error(error_info) + break else: - middata = self.midprocess(data) - _profiler.record("{}{}-midp_1".format(self._name, - concurrency_idx)) - if error_info is None: - data = middata break - if i + 1 < self._retry: - error_info = None - logging.warn( - self._log("warn: timeout, retry({})".format(i + - 1))) - - _profiler.record("{}{}-postp_0".format(self._name, - concurrency_idx)) - if error_info is not None: - output_data = self.errorprocess(error_info) - else: - output_data = self.postprocess(data) - - if not isinstance(output_data, - python_service_channel_pb2.ChannelData): - raise TypeError( - self._log( - 'output_data must be ChannelData type, but get {}'. 
- format(type(output_data)))) - output_data.is_error = 0 - _profiler.record("{}{}-postp_1".format(self._name, - concurrency_idx)) - - output_data.id = data_id + if ecode != ChannelDataEcode.OK.value: + self._push_to_output_channels( + ChannelData( + ecode=ecode, error_info=error_info, + data_id=data_id)) + continue + _profiler.record("{}-midp_1".format(op_info_prefix)) + + # postprocess + output_data = None + _profiler.record("{}-postp_0".format(op_info_prefix)) + if self.with_serving(): + # use call_future + output_data = ChannelData( + future=call_future, + data_id=data_id, + callback_func=self.postprocess) else: - output_data = error_data - - _profiler.record("{}{}-push_0".format(self._name, concurrency_idx)) - for channel in self._outputs: - channel.push(output_data, self._name) - _profiler.record("{}{}-push_1".format(self._name, concurrency_idx)) - - def _log(self, info_str): - return "[{}] {}".format(self._name, info_str) + try: + postped_data = self.postprocess(preped_data) + except Exception as e: + ecode = ChannelDataEcode.UNKNOW.value + error_info = log(e) + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ecode, error_info=error_info, + data_id=data_id)) + continue + if not isinstance(postped_data, dict): + ecode = ChannelDataEcode.TYPE_ERROR.value + error_info = log("output of postprocess funticon must be " \ + "dict type, but get {}".format(type(postped_data))) + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ecode, error_info=error_info, + data_id=data_id)) + continue + + ecode = ChannelDataEcode.OK.value + error_info = None + pbdata = channel_pb2.ChannelData() + for name, value in postped_data.items(): + if not isinstance(name, (str, unicode)): + ecode = ChannelDataEcode.TYPE_ERROR.value + error_info = log("the key of postped_data must " \ + "be str, but get {}".format(type(name))) + break + if not isinstance(value, np.ndarray): + ecode = ChannelDataEcode.TYPE_ERROR.value + error_info = log("the value of postped_data must " \ + "be np.ndarray, but get {}".format(type(value))) + break + inst = channel_pb2.Inst() + inst.data = value.tobytes() + inst.name = name + inst.shape = np.array(value.shape, dtype="int32").tobytes() + inst.type = str(value.dtype) + pbdata.insts.append(inst) + if ecode != ChannelDataEcode.OK.value: + logging.error(error_info) + self._push_to_output_channels( + ChannelData( + ecode=ecode, error_info=error_info, + data_id=data_id)) + continue + pbdata.ecode = ecode + pbdata.id = data_id + output_data = ChannelData(pbdata=pbdata) + _profiler.record("{}-postp_1".format(op_info_prefix)) + + # push data to channel (if run succ) + _profiler.record("{}-push_0".format(op_info_prefix)) + self._push_to_output_channels(output_data) + _profiler.record("{}-push_1".format(op_info_prefix)) + + def _log(self, info): + return "{} {}".format(self.name, info) + + def _get_log_func(self, op_info_prefix): + def log_func(info_str): + return "{} {}".format(op_info_prefix, info_str) + + return log_func def get_concurrency(self): return self._concurrency +class VirtualOp(Op): + ''' For connecting two channels. 
''' + + def __init__(self, name, concurrency=1): + super(VirtualOp, self).__init__( + name=name, inputs=None, concurrency=concurrency) + self._virtual_pred_ops = [] + + def add_virtual_pred_op(self, op): + self._virtual_pred_ops.append(op) + + def add_output_channel(self, channel): + if not isinstance(channel, Channel): + raise TypeError( + self._log('output channel must be Channel type, not {}'.format( + type(channel)))) + for op in self._virtual_pred_ops: + channel.add_producer(op.name) + self._outputs.append(channel) + + def start(self, concurrency_idx): + op_info_prefix = "[{}|{}]".format(self.name, concurrency_idx) + log = self._get_log_func(op_info_prefix) + self._run = True + while self._run: + _profiler.record("{}-get_0".format(op_info_prefix)) + channeldata = self._input.front(self.name) + _profiler.record("{}-get_1".format(op_info_prefix)) + + _profiler.record("{}-push_0".format(op_info_prefix)) + if isinstance(channeldata, dict): + for name, data in channeldata.items(): + self._push_to_output_channels(data, name=name) + else: + self._push_to_output_channels(channeldata, + self._virtual_pred_ops[0].name) + _profiler.record("{}-push_1".format(op_info_prefix)) + + class GeneralPythonService( general_python_service_pb2_grpc.GeneralPythonService): def __init__(self, in_channel, out_channel, retry=2): super(GeneralPythonService, self).__init__() - self._name = "#G" + self.name = "#G" self.set_in_channel(in_channel) self.set_out_channel(out_channel) logging.debug(self._log(in_channel.debug())) @@ -478,14 +701,14 @@ class GeneralPythonService( self._recive_func.start() def _log(self, info_str): - return "[{}] {}".format(self._name, info_str) + return "[{}] {}".format(self.name, info_str) def set_in_channel(self, in_channel): if not isinstance(in_channel, Channel): raise TypeError( self._log('in_channel must be Channel type, but get {}'.format( type(in_channel)))) - in_channel.add_producer(self._name) + in_channel.add_producer(self.name) self._in_channel = in_channel def set_out_channel(self, out_channel): @@ -493,18 +716,19 @@ class GeneralPythonService( raise TypeError( self._log('out_channel must be Channel type, but get {}'.format( type(out_channel)))) - out_channel.add_consumer(self._name) + out_channel.add_consumer(self.name) self._out_channel = out_channel def _recive_out_channel_func(self): while True: - data = self._out_channel.front(self._name) - if not isinstance(data, python_service_channel_pb2.ChannelData): + channeldata = self._out_channel.front(self.name) + if not isinstance(channeldata, ChannelData): raise TypeError( self._log('data must be ChannelData type, but get {}'. 
- format(type(data)))) + format(type(channeldata)))) with self._cv: - self._globel_resp_dict[data.id] = data + data_id = channeldata.pbdata.id + self._globel_resp_dict[data_id] = channeldata self._cv.notify_all() def _get_next_id(self): @@ -523,60 +747,78 @@ class GeneralPythonService( def _pack_data_for_infer(self, request): logging.debug(self._log('start inferce')) - data = python_service_channel_pb2.ChannelData() + pbdata = channel_pb2.ChannelData() data_id = self._get_next_id() - data.id = data_id - data.is_error = 0 + pbdata.id = data_id for idx, name in enumerate(request.feed_var_names): logging.debug( self._log('name: {}'.format(request.feed_var_names[idx]))) logging.debug(self._log('data: {}'.format(request.feed_insts[idx]))) - inst = python_service_channel_pb2.Inst() + inst = channel_pb2.Inst() inst.data = request.feed_insts[idx] + inst.shape = request.shape[idx] inst.name = name - data.insts.append(inst) - return data, data_id - - def _pack_data_for_resp(self, data): - logging.debug(self._log('get data')) - resp = general_python_service_pb2.Response() - logging.debug(self._log('gen resp')) - logging.debug(data) - resp.is_error = data.is_error - if resp.is_error == 0: - for inst in data.insts: - logging.debug(self._log('append data')) - resp.fetch_insts.append(inst.data) - logging.debug(self._log('append name')) - resp.fetch_var_names.append(inst.name) + inst.type = request.type[idx] + pbdata.insts.append(inst) + pbdata.ecode = ChannelDataEcode.OK.value #TODO: parse request error + return ChannelData(pbdata=pbdata), data_id + + def _pack_data_for_resp(self, channeldata): + logging.debug(self._log('get channeldata')) + resp = pyservice_pb2.Response() + resp.ecode = channeldata.pbdata.ecode + if resp.ecode == ChannelDataEcode.OK.value: + if channeldata.pbdata.type == ChannelDataType.CHANNEL_PBDATA.value: + for inst in channeldata.pbdata.insts: + resp.fetch_insts.append(inst.data) + resp.fetch_var_names.append(inst.name) + resp.shape.append(inst.shape) + resp.type.append(inst.type) + elif channeldata.pbdata.type == ChannelDataType.CHANNEL_FUTURE.value: + feed = channeldata.futures.result() + if channeldata.callback_func is not None: + feed = channeldata.callback_func(feed) + for name, var in feed: + resp.fetch_insts.append(var.tobytes()) + resp.fetch_var_names.append(name) + resp.shape.append( + np.array( + var.shape, dtype="int32").tobytes()) + resp.type.append(str(var.dtype)) + else: + raise TypeError( + self._log("Error type({}) in pbdata.type.".format( + self.pbdata.type))) else: - resp.error_info = data.error_info + resp.error_info = channeldata.pbdata.error_info return resp def inference(self, request, context): - _profiler.record("{}-prepack_0".format(self._name)) + _profiler.record("{}-prepack_0".format(self.name)) data, data_id = self._pack_data_for_infer(request) - _profiler.record("{}-prepack_1".format(self._name)) + _profiler.record("{}-prepack_1".format(self.name)) + resp_channeldata = None for i in range(self._retry): logging.debug(self._log('push data')) - _profiler.record("{}-push_0".format(self._name)) - self._in_channel.push(data, self._name) - _profiler.record("{}-push_1".format(self._name)) + _profiler.record("{}-push_0".format(self.name)) + self._in_channel.push(data, self.name) + _profiler.record("{}-push_1".format(self.name)) logging.debug(self._log('wait for infer')) - resp_data = None - _profiler.record("{}-fetch_0".format(self._name)) - resp_data = self._get_data_in_globel_resp_dict(data_id) - _profiler.record("{}-fetch_1".format(self._name)) + 
_profiler.record("{}-fetch_0".format(self.name)) + resp_channeldata = self._get_data_in_globel_resp_dict(data_id) + _profiler.record("{}-fetch_1".format(self.name)) - if resp_data.is_error == 0: + if resp_channeldata.pbdata.ecode == ChannelDataEcode.OK.value: break - logging.warn("retry({}): {}".format(i + 1, resp_data.error_info)) + if i + 1 < self._retry: + logging.warn("retry({}): {}".format( + i + 1, resp_channeldata.pbdata.error_info)) - _profiler.record("{}-postpack_0".format(self._name)) - resp = self._pack_data_for_resp(resp_data) - _profiler.record("{}-postpack_1".format(self._name)) + _profiler.record("{}-postpack_0".format(self.name)) + resp = self._pack_data_for_resp(resp_channeldata) + _profiler.record("{}-postpack_1".format(self.name)) _profiler.print_profile() return resp @@ -584,7 +826,8 @@ class GeneralPythonService( class PyServer(object): def __init__(self, retry=2, profile=False): self._channels = [] - self._ops = [] + self._user_ops = [] + self._actual_ops = [] self._op_threads = [] self._port = None self._worker_num = None @@ -597,47 +840,185 @@ class PyServer(object): self._channels.append(channel) def add_op(self, op): - self._ops.append(op) + self._user_ops.append(op) + + def add_ops(self, ops): + self._user_ops.extend(ops) def gen_desc(self): - logging.info('here will generate desc for paas') + logging.info('here will generate desc for PAAS') pass + def _topo_sort(self): + indeg_num = {} + que_idx = 0 # scroll queue + ques = [Queue.Queue() for _ in range(2)] + for op in self._user_ops: + if len(op.get_input_ops()) == 0: + op.name = "#G" # update read_op.name + break + outdegs = {op.name: [] for op in self._user_ops} + for idx, op in enumerate(self._user_ops): + # check the name of op is globally unique + if op.name in indeg_num: + raise Exception("the name of Op must be unique") + indeg_num[op.name] = len(op.get_input_ops()) + if indeg_num[op.name] == 0: + ques[que_idx].put(op) + for pred_op in op.get_input_ops(): + outdegs[pred_op.name].append(op) + + # topo sort to get dag_views + dag_views = [] + sorted_op_num = 0 + while True: + que = ques[que_idx] + next_que = ques[(que_idx + 1) % 2] + dag_view = [] + while que.qsize() != 0: + op = que.get() + dag_view.append(op) + sorted_op_num += 1 + for succ_op in outdegs[op.name]: + indeg_num[succ_op.name] -= 1 + if indeg_num[succ_op.name] == 0: + next_que.put(succ_op) + dag_views.append(dag_view) + if next_que.qsize() == 0: + break + que_idx = (que_idx + 1) % 2 + if sorted_op_num < len(self._user_ops): + raise Exception("not legal DAG") + if len(dag_views[0]) != 1: + raise Exception("DAG contains multiple input Ops") + if len(dag_views[-1]) != 1: + raise Exception("DAG contains multiple output Ops") + + # create channels and virtual ops + def name_generator(prefix): + def number_generator(): + idx = 0 + while True: + yield "{}{}".format(prefix, idx) + idx += 1 + + return number_generator() + + virtual_op_name_gen = name_generator("vir") + channel_name_gen = name_generator("chl") + virtual_ops = [] + channels = [] + input_channel = None + actual_view = None + for v_idx, view in enumerate(dag_views): + if v_idx + 1 >= len(dag_views): + break + next_view = dag_views[v_idx + 1] + if actual_view is None: + actual_view = view + actual_next_view = [] + pred_op_of_next_view_op = {} + for op in actual_view: + # find actual succ op in next view and create virtual op + for succ_op in outdegs[op.name]: + if succ_op in next_view: + if succ_op not in actual_next_view: + actual_next_view.append(succ_op) + if succ_op.name not in 
pred_op_of_next_view_op: + pred_op_of_next_view_op[succ_op.name] = [] + pred_op_of_next_view_op[succ_op.name].append(op) + else: + # create virtual op + virtual_op = None + virtual_op = VirtualOp(name=virtual_op_name_gen.next()) + virtual_ops.append(virtual_op) + outdegs[virtual_op.name] = [succ_op] + actual_next_view.append(virtual_op) + pred_op_of_next_view_op[virtual_op.name] = [op] + virtual_op.add_virtual_pred_op(op) + actual_view = actual_next_view + # create channel + processed_op = set() + for o_idx, op in enumerate(actual_next_view): + if op.name in processed_op: + continue + channel = Channel(name=channel_name_gen.next()) + channels.append(channel) + logging.debug("{} => {}".format(channel.name, op.name)) + op.add_input_channel(channel) + pred_ops = pred_op_of_next_view_op[op.name] + if v_idx == 0: + input_channel = channel + else: + # if pred_op is virtual op, it will use ancestors as producers to channel + for pred_op in pred_ops: + logging.debug("{} => {}".format(pred_op.name, + channel.name)) + pred_op.add_output_channel(channel) + processed_op.add(op.name) + # find same input op to combine channel + for other_op in actual_next_view[o_idx + 1:]: + if other_op.name in processed_op: + continue + other_pred_ops = pred_op_of_next_view_op[other_op.name] + if len(other_pred_ops) != len(pred_ops): + continue + same_flag = True + for pred_op in pred_ops: + if pred_op not in other_pred_ops: + same_flag = False + break + if same_flag: + logging.debug("{} => {}".format(channel.name, + other_op.name)) + other_op.add_input_channel(channel) + processed_op.add(other_op.name) + output_channel = Channel(name=channel_name_gen.next()) + channels.append(output_channel) + last_op = dag_views[-1][0] + last_op.add_output_channel(output_channel) + + self._actual_ops = virtual_ops + for op in self._user_ops: + if len(op.get_input_ops()) == 0: + # pass read op + continue + self._actual_ops.append(op) + self._channels = channels + for c in channels: + logging.debug(c.debug()) + return input_channel, output_channel + def prepare_server(self, port, worker_num): self._port = port self._worker_num = worker_num - inputs = set() - outputs = set() - for op in self._ops: - inputs |= set([op.get_input()]) - outputs |= set(op.get_outputs()) + + input_channel, output_channel = self._topo_sort() + self._in_channel = input_channel + self._out_channel = output_channel + for op in self._actual_ops: if op.with_serving(): self.prepare_serving(op) - in_channel = inputs - outputs - out_channel = outputs - inputs - if len(in_channel) != 1 or len(out_channel) != 1: - raise Exception( - "in_channel(out_channel) more than 1 or no in_channel(out_channel)" - ) - self._in_channel = in_channel.pop() - self._out_channel = out_channel.pop() self.gen_desc() def _op_start_wrapper(self, op, concurrency_idx): return op.start(concurrency_idx) def _run_ops(self): - for op in self._ops: + for op in self._actual_ops: op_concurrency = op.get_concurrency() logging.debug("run op: {}, op_concurrency: {}".format( - op._name, op_concurrency)) + op.name, op_concurrency)) for c in range(op_concurrency): - # th = multiprocessing.Process( th = threading.Thread( target=self._op_start_wrapper, args=(op, c)) th.start() self._op_threads.append(th) + def _stop_ops(self): + for op in self._actual_ops: + op.stop() + def run_server(self): self._run_ops() server = grpc.server( @@ -647,12 +1028,10 @@ class PyServer(object): self._retry), server) server.add_insecure_port('[::]:{}'.format(self._port)) server.start() - try: - for th in self._op_threads: - 
th.join() - server.join() - except KeyboardInterrupt: - server.stop(0) + server.wait_for_termination() + self._stop_ops() # TODO + for th in self._op_threads: + th.join() def prepare_serving(self, op): model_path = op._server_model @@ -660,12 +1039,10 @@ class PyServer(object): device = op._device if device == "cpu": - cmd = "python -m paddle_serving_server.serve --model {} --thread 4 --port {} &>/dev/null &".format( - model_path, port) + cmd = "(Use MultiLangServer) python -m paddle_serving_server.serve" \ + " --model {} --thread 4 --port {} --use_multilang &>/dev/null &".format(model_path, port) else: - cmd = "python -m paddle_serving_server_gpu.serve --model {} --thread 4 --port {} &>/dev/null &".format( - model_path, port) + cmd = "(Use MultiLangServer) python -m paddle_serving_server_gpu.serve" \ + " --model {} --thread 4 --port {} --use_multilang &>/dev/null &".format(model_path, port) # run a server (not in PyServing) logging.info("run a server (not in PyServing): {}".format(cmd)) - return - # os.system(cmd) diff --git a/python/paddle_serving_server/serve.py b/python/paddle_serving_server/serve.py index 894b0c5b132845cbde589982e1fb471f028e820b..e67cba7cd2bb89a8126c0a74393bdcec648eee17 100644 --- a/python/paddle_serving_server/serve.py +++ b/python/paddle_serving_server/serve.py @@ -40,15 +40,23 @@ def parse_args(): # pylint: disable=doc-string-missing parser.add_argument( "--device", type=str, default="cpu", help="Type of device") parser.add_argument( - "--mem_optim", type=bool, default=False, help="Memory optimize") + "--mem_optim", + default=False, + action="store_true", + help="Memory optimize") parser.add_argument( - "--ir_optim", type=bool, default=False, help="Graph optimize") - parser.add_argument("--use_mkl", type=bool, default=False, help="Use MKL") + "--ir_optim", default=False, action="store_true", help="Graph optimize") + parser.add_argument( + "--use_mkl", default=False, action="store_true", help="Use MKL") parser.add_argument( "--max_body_size", type=int, default=512 * 1024 * 1024, help="Limit sizes of messages") + parser.add_argument( + "--use_multilang", + action='store_true', + help="Use Multi-language-service") return parser.parse_args() @@ -63,6 +71,7 @@ def start_standard_model(): # pylint: disable=doc-string-missing ir_optim = args.ir_optim max_body_size = args.max_body_size use_mkl = args.use_mkl + use_multilang = args.use_multilang if model == "": print("You must specify your serving model") @@ -79,14 +88,19 @@ def start_standard_model(): # pylint: disable=doc-string-missing op_seq_maker.add_op(general_infer_op) op_seq_maker.add_op(general_response_op) - server = serving.Server() - server.set_op_sequence(op_seq_maker.get_op_sequence()) - server.set_num_threads(thread_num) - server.set_memory_optimize(mem_optim) - server.set_ir_optimize(ir_optim) - server.use_mkl(use_mkl) - server.set_max_body_size(max_body_size) - server.set_port(port) + server = None + if use_multilang: + server = serving.MultiLangServer() + server.set_op_sequence(op_seq_maker.get_op_sequence()) + else: + server = serving.Server() + server.set_op_sequence(op_seq_maker.get_op_sequence()) + server.set_num_threads(thread_num) + server.set_memory_optimize(mem_optim) + server.set_ir_optimize(ir_optim) + server.use_mkl(use_mkl) + server.set_max_body_size(max_body_size) + server.set_port(port) server.load_model_config(model) server.prepare_server(workdir=workdir, port=port, device=device) diff --git a/python/paddle_serving_server/web_service.py b/python/paddle_serving_server/web_service.py index 
7f37b10be05e84e29cf6cda3cd3cc3d939910027..b3fcc1b880fcbffa1da884e4b68350c1870997c1 100755 --- a/python/paddle_serving_server/web_service.py +++ b/python/paddle_serving_server/web_service.py @@ -86,7 +86,7 @@ class WebService(object): for key in fetch_map: fetch_map[key] = fetch_map[key].tolist() fetch_map = self.postprocess( - feed=feed, fetch=fetch, fetch_map=fetch_map) + feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) result = {"result": fetch_map} except ValueError: result = {"result": "Request Value Error"} diff --git a/python/paddle_serving_server_gpu/__init__.py b/python/paddle_serving_server_gpu/__init__.py index 7acc926c7f7fc465da20a7609bc767a5289d2e61..44733b154096255c3ce06e1be29d50d3e662269a 100644 --- a/python/paddle_serving_server_gpu/__init__.py +++ b/python/paddle_serving_server_gpu/__init__.py @@ -25,6 +25,17 @@ from .version import serving_server_version from contextlib import closing import argparse import collections +import fcntl + +import numpy as np +import grpc +from .proto import multi_lang_general_model_service_pb2 +import sys +sys.path.append( + os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto')) +from .proto import multi_lang_general_model_service_pb2_grpc +from multiprocessing import Pool, Process +from concurrent import futures def serve_args(): @@ -46,9 +57,12 @@ def serve_args(): parser.add_argument( "--name", type=str, default="None", help="Default service name") parser.add_argument( - "--mem_optim", type=bool, default=False, help="Memory optimize") + "--mem_optim", + default=False, + action="store_true", + help="Memory optimize") parser.add_argument( - "--ir_optim", type=bool, default=False, help="Graph optimize") + "--ir_optim", default=False, action="store_true", help="Graph optimize") parser.add_argument( "--max_body_size", type=int, @@ -347,6 +361,11 @@ class Server(object): download_flag = "{}/{}.is_download".format(self.module_path, folder_name) + + #acquire lock + version_file = open("{}/version.py".format(self.module_path), "r") + fcntl.flock(version_file, fcntl.LOCK_EX) + if os.path.exists(download_flag): os.chdir(self.cur_path) self.bin_path = self.server_path + "/serving" @@ -377,6 +396,8 @@ class Server(object): format(self.module_path)) finally: os.remove(tar_name) + #release lock + version_file.close() os.chdir(self.cur_path) self.bin_path = self.server_path + "/serving" @@ -461,3 +482,158 @@ class Server(object): print(command) os.system(command) + + +class MultiLangServerService( + multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelService): + def __init__(self, model_config_path, endpoints): + from paddle_serving_client import Client + self._parse_model_config(model_config_path) + self.bclient_ = Client() + self.bclient_.load_client_config( + "{}/serving_server_conf.prototxt".format(model_config_path)) + self.bclient_.connect(endpoints) + + def _parse_model_config(self, model_config_path): + model_conf = m_config.GeneralModelConfig() + f = open("{}/serving_server_conf.prototxt".format(model_config_path), + 'r') + model_conf = google.protobuf.text_format.Merge( + str(f.read()), model_conf) + self.feed_names_ = [var.alias_name for var in model_conf.feed_var] + self.feed_types_ = {} + self.feed_shapes_ = {} + self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var] + self.fetch_types_ = {} + self.lod_tensor_set_ = set() + for i, var in enumerate(model_conf.feed_var): + self.feed_types_[var.alias_name] = var.feed_type + self.feed_shapes_[var.alias_name] = var.shape + if var.is_lod_tensor: + 
self.lod_tensor_set_.add(var.alias_name) + for i, var in enumerate(model_conf.fetch_var): + self.fetch_types_[var.alias_name] = var.fetch_type + if var.is_lod_tensor: + self.lod_tensor_set_.add(var.alias_name) + + def _flatten_list(self, nested_list): + for item in nested_list: + if isinstance(item, (list, tuple)): + for sub_item in self._flatten_list(item): + yield sub_item + else: + yield item + + def _unpack_request(self, request): + feed_names = list(request.feed_var_names) + fetch_names = list(request.fetch_var_names) + is_python = request.is_python + feed_batch = [] + for feed_inst in request.insts: + feed_dict = {} + for idx, name in enumerate(feed_names): + var = feed_inst.tensor_array[idx] + v_type = self.feed_types_[name] + data = None + if is_python: + if v_type == 0: + data = np.frombuffer(var.data, dtype="int64") + elif v_type == 1: + data = np.frombuffer(var.data, dtype="float32") + else: + raise Exception("error type.") + else: + if v_type == 0: # int64 + data = np.array(list(var.int64_data), dtype="int64") + elif v_type == 1: # float32 + data = np.array(list(var.float_data), dtype="float32") + else: + raise Exception("error type.") + data.shape = list(feed_inst.tensor_array[idx].shape) + feed_dict[name] = data + feed_batch.append(feed_dict) + return feed_batch, fetch_names, is_python + + def _pack_resp_package(self, result, fetch_names, is_python, tag): + resp = multi_lang_general_model_service_pb2.Response() + # Only one model is supported temporarily + model_output = multi_lang_general_model_service_pb2.ModelOutput() + inst = multi_lang_general_model_service_pb2.FetchInst() + for idx, name in enumerate(fetch_names): + tensor = multi_lang_general_model_service_pb2.Tensor() + v_type = self.fetch_types_[name] + if is_python: + tensor.data = result[name].tobytes() + else: + if v_type == 0: # int64 + tensor.int64_data.extend(result[name].reshape(-1).tolist()) + elif v_type == 1: # float32 + tensor.float_data.extend(result[name].reshape(-1).tolist()) + else: + raise Exception("error type.") + tensor.shape.extend(list(result[name].shape)) + if name in self.lod_tensor_set_: + tensor.lod.extend(result["{}.lod".format(name)].tolist()) + inst.tensor_array.append(tensor) + model_output.insts.append(inst) + resp.outputs.append(model_output) + resp.tag = tag + return resp + + def inference(self, request, context): + feed_dict, fetch_names, is_python = self._unpack_request(request) + data, tag = self.bclient_.predict( + feed=feed_dict, fetch=fetch_names, need_variant_tag=True) + return self._pack_resp_package(data, fetch_names, is_python, tag) + + +class MultiLangServer(object): + def __init__(self, worker_num=2): + self.bserver_ = Server() + self.worker_num_ = worker_num + + def set_op_sequence(self, op_seq): + self.bserver_.set_op_sequence(op_seq) + + def load_model_config(self, model_config_path): + if not isinstance(model_config_path, str): + raise Exception( + "MultiLangServer only supports multi-model temporarily") + self.bserver_.load_model_config(model_config_path) + self.model_config_path_ = model_config_path + + def prepare_server(self, workdir=None, port=9292, device="cpu"): + default_port = 12000 + self.port_list_ = [] + for i in range(1000): + if default_port + i != port and self._port_is_available(default_port + + i): + self.port_list_.append(default_port + i) + break + self.bserver_.prepare_server( + workdir=workdir, port=self.port_list_[0], device=device) + self.gport_ = port + + def _launch_brpc_service(self, bserver): + bserver.run_server() + + def 
_port_is_available(self, port): + with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock: + sock.settimeout(2) + result = sock.connect_ex(('0.0.0.0', port)) + return result != 0 + + def run_server(self): + p_bserver = Process( + target=self._launch_brpc_service, args=(self.bserver_, )) + p_bserver.start() + server = grpc.server( + futures.ThreadPoolExecutor(max_workers=self.worker_num_)) + multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server( + MultiLangServerService(self.model_config_path_, + ["0.0.0.0:{}".format(self.port_list_[0])]), + server) + server.add_insecure_port('[::]:{}'.format(self.gport_)) + server.start() + p_bserver.join() + server.wait_for_termination() diff --git a/python/paddle_serving_server_gpu/monitor.py b/python/paddle_serving_server_gpu/monitor.py index 3f1ff6436917b8ae7ff4ea06fcae1f55bd65e887..84146039c40794436030a8c5c6ba9d18ccbfda06 100644 --- a/python/paddle_serving_server_gpu/monitor.py +++ b/python/paddle_serving_server_gpu/monitor.py @@ -20,7 +20,7 @@ Usage: import os import time import argparse -import commands +import subprocess import datetime import shutil import tarfile @@ -209,7 +209,7 @@ class HadoopMonitor(Monitor): remote_filepath = os.path.join(path, filename) cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath) _LOGGER.debug('check cmd: {}'.format(cmd)) - [status, output] = commands.getstatusoutput(cmd) + [status, output] = subprocess.getstatusoutput(cmd) _LOGGER.debug('resp: {}'.format(output)) if status == 0: [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split() diff --git a/python/paddle_serving_server_gpu/web_service.py b/python/paddle_serving_server_gpu/web_service.py index 2328453268f6cefa9c5bddb818677cc3962ea7ea..76721de8a005dfb23fbe2427671446889aa72af1 100644 --- a/python/paddle_serving_server_gpu/web_service.py +++ b/python/paddle_serving_server_gpu/web_service.py @@ -131,7 +131,7 @@ class WebService(object): for key in fetch_map: fetch_map[key] = fetch_map[key].tolist() result = self.postprocess( - feed=feed, fetch=fetch, fetch_map=fetch_map) + feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) result = {"result": result} except ValueError: result = {"result": "Request Value Error"} diff --git a/python/requirements.txt b/python/requirements.txt index d445216b3112ea3d5791045b43a6a3147865522f..4b61fa6a4f89d88338cd868134f510d179bc45b6 100644 --- a/python/requirements.txt +++ b/python/requirements.txt @@ -1 +1,3 @@ numpy>=1.12, <=1.16.4 ; python_version<"3.5" +grpcio-tools>=1.28.1 +grpcio>=1.28.1 diff --git a/python/setup.py.app.in b/python/setup.py.app.in index 77099e667e880f3f62ab4cde9d5ae3b6295d1b90..1ee1cabb5a572536e6869852e3ab638cda6adcb8 100644 --- a/python/setup.py.app.in +++ b/python/setup.py.app.in @@ -42,7 +42,8 @@ if '${PACK}' == 'ON': REQUIRED_PACKAGES = [ - 'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow' + 'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow', + 'shapely', 'pyclipper' ] packages=['paddle_serving_app', diff --git a/python/setup.py.client.in b/python/setup.py.client.in index c46a58733a2c6ac6785e0047ab19080e92dd5695..601cfc81f0971cf1fa480b1daaed70eb6c696494 100644 --- a/python/setup.py.client.in +++ b/python/setup.py.client.in @@ -58,7 +58,8 @@ if '${PACK}' == 'ON': REQUIRED_PACKAGES = [ - 'six >= 1.10.0', 'protobuf >= 3.1.0', 'numpy >= 1.12' + 'six >= 1.10.0', 'protobuf >= 3.1.0', 'numpy >= 1.12', 'grpcio >= 1.28.1', + 'grpcio-tools >= 1.28.1' ] if not find_package("paddlepaddle") and not 
find_package("paddlepaddle-gpu"): diff --git a/python/setup.py.server.in b/python/setup.py.server.in index 97f02078806b20f41e917e0c385983a767a4df8c..efa9a50bb8a31fc81b97dec0243316cdc9cd8af6 100644 --- a/python/setup.py.server.in +++ b/python/setup.py.server.in @@ -37,13 +37,10 @@ def python_version(): max_version, mid_version, min_version = python_version() REQUIRED_PACKAGES = [ - 'six >= 1.10.0', 'protobuf >= 3.1.0', - 'paddle_serving_client', 'flask >= 1.1.1' + 'six >= 1.10.0', 'protobuf >= 3.1.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1', + 'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app' ] -if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"): - REQUIRED_PACKAGES.append("paddlepaddle") - packages=['paddle_serving_server', 'paddle_serving_server.proto'] diff --git a/python/setup.py.server_gpu.in b/python/setup.py.server_gpu.in index 6a651053391b30afb71996c5073d21a5620d3320..06b51c1c404590ed1db141f273bdc35f26c13176 100644 --- a/python/setup.py.server_gpu.in +++ b/python/setup.py.server_gpu.in @@ -37,12 +37,10 @@ def python_version(): max_version, mid_version, min_version = python_version() REQUIRED_PACKAGES = [ - 'six >= 1.10.0', 'protobuf >= 3.1.0', - 'paddle_serving_client', 'flask >= 1.1.1' + 'six >= 1.10.0', 'protobuf >= 3.1.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1', + 'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app' ] -if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"): - REQUIRED_PACKAGES.append("paddlepaddle") packages=['paddle_serving_server_gpu', 'paddle_serving_server_gpu.proto'] diff --git a/tools/Dockerfile b/tools/Dockerfile index dc39adf01288f092143803557b322a0c8fbcb2b4..3c701725400350247153f828410d06cec69856f5 100644 --- a/tools/Dockerfile +++ b/tools/Dockerfile @@ -9,4 +9,6 @@ RUN yum -y install wget && \ yum -y install python3 python3-devel && \ yum clean all && \ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \ - python get-pip.py && rm get-pip.py + python get-pip.py && rm get-pip.py && \ + localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \ + echo "export LANG=en_US.utf8" >> /root/.bashrc diff --git a/tools/Dockerfile.centos6.devel b/tools/Dockerfile.centos6.devel index 5223693d846bdbc90bdefe58c26db29d6a81359d..83981dcc4731252dfc75270b5ce6fc623a0266a8 100644 --- a/tools/Dockerfile.centos6.devel +++ b/tools/Dockerfile.centos6.devel @@ -44,4 +44,6 @@ RUN yum -y install wget && \ cd .. && rm -rf Python-3.6.8* && \ pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \ yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \ - yum clean all + yum clean all && \ + localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \ + echo "export LANG=en_US.utf8" >> /root/.bashrc diff --git a/tools/Dockerfile.centos6.gpu.devel b/tools/Dockerfile.centos6.gpu.devel index 1432d49abe9a4aec3b558d855c9cfcf30efef461..9ee3591b9a1e2ea5881106cf7e67ca28b24c1890 100644 --- a/tools/Dockerfile.centos6.gpu.devel +++ b/tools/Dockerfile.centos6.gpu.devel @@ -44,4 +44,5 @@ RUN yum -y install wget && \ cd .. 
&& rm -rf Python-3.6.8* && \ pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \ yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \ - yum clean all + yum clean all && \ + echo "export LANG=en_US.utf8" >> /root/.bashrc diff --git a/tools/Dockerfile.devel b/tools/Dockerfile.devel index 385e568273eab54f7dfa51a20bb7dcd89cfa98a8..e4bcd33534cb9e887f49fcba5029619aaa1dea4c 100644 --- a/tools/Dockerfile.devel +++ b/tools/Dockerfile.devel @@ -21,4 +21,6 @@ RUN yum -y install wget >/dev/null \ && yum install -y python3 python3-devel \ && pip3 install google protobuf setuptools wheel flask \ && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\ - && yum clean all + && yum clean all \ + && localedef -c -i en_US -f UTF-8 en_US.UTF-8 \ + && echo "export LANG=en_US.utf8" >> /root/.bashrc diff --git a/tools/Dockerfile.gpu b/tools/Dockerfile.gpu index a08bdf3daef103b5944df192fef967ebd9772b6c..2f38a3a3cd1c8987d34a81259ec9ad6ba67156a7 100644 --- a/tools/Dockerfile.gpu +++ b/tools/Dockerfile.gpu @@ -1,5 +1,6 @@ -FROM nvidia/cuda:9.0-cudnn7-runtime-centos7 +FROM nvidia/cuda:9.0-cudnn7-devel-centos7 as builder +FROM nvidia/cuda:9.0-cudnn7-runtime-centos7 RUN yum -y install wget && \ yum -y install epel-release && yum -y install patchelf && \ yum -y install gcc make python-devel && \ @@ -13,4 +14,8 @@ RUN yum -y install wget && \ ln -s /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/cuda-9.0/lib64/libcublas.so && \ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \ ln -s /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so && \ - echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc + echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \ + echo "export LANG=en_US.utf8" >> /root/.bashrc && \ + mkdir -p /usr/local/cuda/extras + +COPY --from=builder /usr/local/cuda/extras/CUPTI /usr/local/cuda/extras/CUPTI diff --git a/tools/Dockerfile.gpu.devel b/tools/Dockerfile.gpu.devel index 2ffbe4601e1f7e9b05c87f9562b3e0ffc4b967ff..057201cefa1f8de7a105ea9b7f93e7ca9e342777 100644 --- a/tools/Dockerfile.gpu.devel +++ b/tools/Dockerfile.gpu.devel @@ -22,4 +22,5 @@ RUN yum -y install wget >/dev/null \ && yum install -y python3 python3-devel \ && pip3 install google protobuf setuptools wheel flask \ && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\ - && yum clean all + && yum clean all \ + && echo "export LANG=en_US.utf8" >> /root/.bashrc diff --git a/tools/serving_build.sh b/tools/serving_build.sh index 8e78e13ef8e86b55af6a90df1b9235611508c0ba..989e48ead9864e717e573f7f0800a1afba2e934a 100644 --- a/tools/serving_build.sh +++ b/tools/serving_build.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash - +set -x function unsetproxy() { HTTP_PROXY_TEMP=$http_proxy HTTPS_PROXY_TEMP=$https_proxy @@ -375,16 +375,17 @@ function python_test_multi_process(){ sh get_data.sh case $TYPE in CPU) - check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9292 &" - check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9293 &" + check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9292 --workdir test9292 &" + check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9293 --workdir test9293 &" sleep 5 check_cmd "python 
test_multi_process_client.py" kill_server_process echo "bert mutli rpc RPC inference pass" ;; GPU) - check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0 &" - check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9293 --gpu_ids 0 &" + rm -rf ./image #TODO: The following code tried to create this folder, but no corresponding code was found + check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --workdir test9292 --gpu_ids 0 &" + check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9293 --workdir test9293 --gpu_ids 0 &" sleep 5 check_cmd "python test_multi_process_client.py" kill_server_process @@ -454,15 +455,16 @@ function python_test_lac() { cd lac # pwd: /Serving/python/examples/lac case $TYPE in CPU) - sh get_data.sh - check_cmd "python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292 &" + python -m paddle_serving_app.package --get_model lac + tar -xzvf lac.tar.gz + check_cmd "python -m paddle_serving_server.serve --model lac_model/ --port 9292 &" sleep 5 - check_cmd "echo \"我爱北京天安门\" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/" + check_cmd "echo \"我爱北京天安门\" | python lac_client.py lac_client/serving_client_conf.prototxt " echo "lac CPU RPC inference pass" kill_server_process unsetproxy # maybe the proxy is used on iPipe, which makes web-test failed. - check_cmd "python lac_web_service.py jieba_server_model/ lac_workdir 9292 &" + check_cmd "python lac_web_service.py lac_model/ lac_workdir 9292 &" sleep 5 check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"words\": \"我爱北京天安门\"}], \"fetch\":[\"word_seg\"]}' http://127.0.0.1:9292/lac/prediction" # check http code