Commit 12dbbcfe authored by wangjiawei04

merge

([简体中文](./README_CN.md)|English)
<p align="center"> <p align="center">
<br> <br>
<img src='doc/serving_logo.png' width = "600" height = "130"> <img src='doc/serving_logo.png' width = "600" height = "130">
<br> <br>
<p> <p>
<p align="center"> <p align="center">
<br> <br>
<a href="https://travis-ci.com/PaddlePaddle/Serving"> <a href="https://travis-ci.com/PaddlePaddle/Serving">
...@@ -23,14 +26,6 @@ We consider deploying deep learning inference service online to be a user-facing ...@@ -23,14 +26,6 @@ We consider deploying deep learning inference service online to be a user-facing
<img src="doc/demo.gif" width="700"> <img src="doc/demo.gif" width="700">
</p> </p>
<h2 align="center">Some Key Features</h2>
- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on client side, such as Golang, C++ and python.
- **Extensible framework design** which can support model serving beyond Paddle.
<h2 align="center">Installation</h2> <h2 align="center">Installation</h2>
...@@ -58,10 +53,42 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua ...@@ -58,10 +53,42 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command. If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client. Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use HTTP service without install client.
<h2 align="center"> Pre-built services with Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h4>
``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```
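
The same request can be sent from Python instead of `curl`; below is a minimal sketch using the [requests](https://requests.readthedocs.io/en/master/) library, assuming the word segmentation service above is running locally on port 9393:

``` python
import json
import requests

# Same payload as the curl example above: feed the raw sentence, fetch the segmentation.
payload = {"feed": [{"words": "我爱北京天安门"}], "fetch": ["word_seg"]}
r = requests.post("http://127.0.0.1:9393/lac/prediction",
                  data=json.dumps(payload),
                  headers={"Content-Type": "application/json"})
print(r.json())  # expected: {"result": [{"word_seg": "我|爱|北京|天安门"}]}
```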
<h3 align="center">Image Classification</h4>
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">Quick Start Example</h2> <h2 align="center">Quick Start Example</h2>
This quick start example is only for users who already have a model to deploy and we prepare a ready-to-deploy model here. If you want to know how to use paddle serving from offline training to online serving, please reference to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
### Boston House Price Prediction model ### Boston House Price Prediction model
``` shell ``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
@@ -84,9 +111,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `port` | int | `9292` | Exposed port of current service to users |
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | - | - | Enable memory / graphic memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
Here, we use `curl` to send an HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g., [requests](https://requests.readthedocs.io/en/master/).
</center>
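
For instance, a hedged `requests`-based equivalent of the `curl` call, assuming the HTTP service was started with `--name uci` on port 9292; the 13-dimensional `x` vector below is illustrative sample input for the `uci_housing` model, not data taken from this document:

``` python
import json
import requests

# One feed instance carrying the 13 normalized features expected by uci_housing.
payload = {"feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501,
                           -0.2015, -0.0832, 0.0262, 0.0795, -0.1300, 0.0189,
                           0.1867]}],
           "fetch": ["price"]}
r = requests.post("http://127.0.0.1:9292/uci/prediction",
                  data=json.dumps(payload),
                  headers={"Content-Type": "application/json"})
print(r.json())  # e.g. {"result": {"price": [[...]]}}
```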
@@ -117,138 +144,13 @@ print(fetch_map)
```
Here, the `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias names and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the names `"x"` and `"price"` were assigned when the servable model was saved during training.
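
Putting the pieces together, a condensed sketch of the RPC client flow just described; the config path assumes the quick-start `uci_housing` client package, the port assumes the RPC server example above, and the input values are dummies:

``` python
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

# "x" and "price" are the alias names assigned when the servable model was saved;
# a real call would feed 13 normalized housing features instead of zeros.
fetch_map = client.predict(feed={"x": [0.0] * 13}, fetch=["price"])
print(fetch_map)
```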
<h2 align="center"> Pre-built services with Paddle Serving</h2> <h2 align="center">Some Key Features of Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h4>
- **Description**:
``` shell
Chinese word segmentation HTTP service that can be deployed with one line command.
```
- **Download Servable Package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- **Host web service**:
``` shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **Request sample**:
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **Request result**:
``` shell
{"word_seg":"我|爱|北京|天安门"}
```
<h3 align="center">Image Classification</h4>
- **Description**:
``` shell
Image classification trained with Imagenet dataset. A label and corresponding probability will be returned.
Note: This demo needs paddle-serving-server-gpu.
```
- **Download Servable Package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- **Host web service**:
``` shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- **Request sample**:
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **Request result**:
``` shell
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">More Demos</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Bert-Base-Baike |
| URL | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| Description | Get semantic representation from a Chinese Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet50-Imagenet |
| URL | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get image semantic representation from an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet101-Imagenet |
| URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get image semantic representation from an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | CNN-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | LSTM-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | BOW-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Jieba-LAC |
| URL | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| Description | Get word segmentation from a Chinese Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR |
| URL | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| Description | Get click probability from a feature vector of item |
- Seamless integration with the Paddle training pipeline; most Paddle models can be deployed **with a single command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
<h2 align="center">Document</h2> <h2 align="center">Document</h2>
...@@ -268,13 +170,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa ...@@ -268,13 +170,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
### About Efficiency ### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util) - [How to profile Paddle Serving latency?](python/examples/util)
- [How to optimize performance?(Chinese)](doc/PERFORMANCE_OPTIM_CN.md) - [How to optimize performance?](doc/PERFORMANCE_OPTIM.md)
- [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md) - [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md) - [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md) - [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
### FAQ ### FAQ
- [FAQ(Chinese)](doc/deprecated/FAQ.md) - [FAQ(Chinese)](doc/FAQ.md)
### Design ### Design
......
(简体中文|[English](./README.md))
<p align="center"> <p align="center">
<br> <br>
<img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130"> <img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130">
<br> <br>
<p> <p>
<p align="center"> <p align="center">
<br> <br>
<a href="https://travis-ci.com/PaddlePaddle/Serving"> <a href="https://travis-ci.com/PaddlePaddle/Serving">
...@@ -24,14 +27,7 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务 ...@@ -24,14 +27,7 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务
<img src="doc/demo.gif" width="700"> <img src="doc/demo.gif" width="700">
</p> </p>
<h2 align="center">核心功能</h2>
- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**.
- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等.
- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
- 支持客户端和服务端之间 **高并发和高效通信**.
- 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python.
- **可伸缩框架设计** 可支持不限于Paddle的模型服务.
<h2 align="center">安装</h2> <h2 align="center">安装</h2>
...@@ -59,9 +55,40 @@ pip install paddle-serving-server-gpu # GPU ...@@ -59,9 +55,40 @@ pip install paddle-serving-server-gpu # GPU
如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。 如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。
客户端安装包支持Centos 7和Ubuntu 18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。 Paddle Serving安装包支持Centos 6/7和Ubuntu 16/18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。
<h2 align="center"> Paddle Serving预装的服务 </h2>
<h3 align="center">中文分词</h4>
``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```
<h3 align="center">图像分类</h4>
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">快速启动示例</h2> <h2 align="center">快速开始示例</h2>
这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)
<h3 align="center">波士顿房价预测</h3> <h3 align="center">波士顿房价预测</h3>
@@ -88,9 +115,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `port` | int | `9292` | Exposed port of current service to users |
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | - | - | Enable memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求,请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。
</center>
@@ -122,139 +149,13 @@ print(fetch_map)
```
在这里,`client.predict`函数具有两个参数。`feed`是带有模型输入变量别名和值的`python dict`。`fetch`指定要从服务器返回的预测变量。在该示例中,在训练过程中保存可服务模型时,被赋值的tensor名为`"x"`和`"price"`。
<h2 align="center">Paddle Serving预装的服务</h2> <h2 align="center">Paddle Serving的核心功能</h2>
<h3 align="center">中文分词模型</h4>
- **介绍**:
``` shell
本示例为中文分词HTTP服务一键部署
```
- **下载服务包**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- **启动web服务**:
``` shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **客户端请求示例**:
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **返回结果示例**:
``` shell
{"word_seg":"我|爱|北京|天安门"}
```
<h3 align="center">图像分类模型</h4>
- **介绍**:
``` shell
图像分类模型由Imagenet数据集训练而成,该服务会返回一个标签及其概率
注意:本示例需要安装paddle-serving-server-gpu
```
- **下载服务包**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- **启动web服务**:
``` shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- **客户端请求示例**:
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **返回结果示例**:
``` shell
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">更多示例</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Bert-Base-Baike |
| 下载链接 | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| 介绍 | 获得一个中文语句的语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Resnet50-Imagenet |
| 下载链接 | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| 介绍 | 获得一张图片的图像语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Resnet101-Imagenet |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| 介绍 | 获得一张图片的图像语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | CNN-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个中文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | LSTM-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个英文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | BOW-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个英文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Jieba-LAC |
| 下载链接 | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| 介绍 | 获取中文语句的分词 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | DNN-CTR |
| 下载链接 | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| 介绍 | 从项目的特征向量中获得点击概率 |
- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**。
- 支持 **工业级的服务能力**,例如模型管理、在线加载、在线A/B测试等。
- 支持 **分布式键值对索引**,助力于大规模稀疏特征作为模型输入。
- 支持客户端和服务端之间 **高并发和高效通信**。
- 支持 **多种编程语言** 开发客户端,例如Golang、C++和Python。
<h2 align="center">文档</h2> <h2 align="center">文档</h2>
...@@ -280,7 +181,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa ...@@ -280,7 +181,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
- [GPU版Benchmarks](doc/GPU_BENCHMARKING.md) - [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)
### FAQ ### FAQ
- [常见问答](doc/deprecated/FAQ.md) - [常见问答](doc/FAQ.md)
### 设计文档 ### 设计文档
- [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md) - [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md)
......
@@ -86,6 +86,63 @@ function(protobuf_generate_python SRCS)
  set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction()
function(grpc_protobuf_generate_python SRCS)
  # shameless copy from https://github.com/Kitware/CMake/blob/master/Modules/FindProtobuf.cmake
  if(NOT ARGN)
    message(SEND_ERROR "Error: GRPC_PROTOBUF_GENERATE_PYTHON() called without any proto files")
    return()
  endif()

  if(PROTOBUF_GENERATE_CPP_APPEND_PATH)
    # Create an include path for each file specified
    foreach(FIL ${ARGN})
      get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
      get_filename_component(ABS_PATH ${ABS_FIL} PATH)
      list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
      if(${_contains_already} EQUAL -1)
        list(APPEND _protobuf_include_path -I ${ABS_PATH})
      endif()
    endforeach()
  else()
    set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR})
  endif()

  if(DEFINED PROTOBUF_IMPORT_DIRS AND NOT DEFINED Protobuf_IMPORT_DIRS)
    set(Protobuf_IMPORT_DIRS "${PROTOBUF_IMPORT_DIRS}")
  endif()

  if(DEFINED Protobuf_IMPORT_DIRS)
    foreach(DIR ${Protobuf_IMPORT_DIRS})
      get_filename_component(ABS_PATH ${DIR} ABSOLUTE)
      list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
      if(${_contains_already} EQUAL -1)
        list(APPEND _protobuf_include_path -I ${ABS_PATH})
      endif()
    endforeach()
  endif()

  set(${SRCS})
  foreach(FIL ${ARGN})
    get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
    get_filename_component(FIL_WE ${FIL} NAME_WE)
    if(NOT PROTOBUF_GENERATE_CPP_APPEND_PATH)
      get_filename_component(FIL_DIR ${FIL} DIRECTORY)
      if(FIL_DIR)
        set(FIL_WE "${FIL_DIR}/${FIL_WE}")
      endif()
    endif()

    list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py")
    add_custom_command(
      OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py"
      COMMAND ${PYTHON_EXECUTABLE} -m grpc_tools.protoc --python_out ${CMAKE_CURRENT_BINARY_DIR} --grpc_python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
      DEPENDS ${ABS_FIL}
      COMMENT "Running Python grpc protocol buffer compiler on ${FIL}"
      VERBATIM)
  endforeach()

  set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction()
# Print and set the protobuf library information,
# finish this cmake process and exit from this file.
macro(PROMPT_PROTOBUF_LIB)
......
@@ -704,6 +704,15 @@ function(py_proto_compile TARGET_NAME)
  add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs})
endfunction()
function(py_grpc_proto_compile TARGET_NAME)
  set(oneValueArgs "")
  set(multiValueArgs SRCS)
  cmake_parse_arguments(py_grpc_proto_compile "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
  set(py_srcs)
  grpc_protobuf_generate_python(py_srcs ${py_grpc_proto_compile_SRCS})
  add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs})
endfunction()
function(py_test TARGET_NAME)
  if(WITH_TESTING)
    set(options "")
......
@@ -35,6 +35,13 @@ py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.p
add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(general_model_config_py_proto general_model_config_py_proto_init)
py_grpc_proto_compile(multi_lang_general_model_service_py_proto SRCS proto/multi_lang_general_model_service.proto)
add_custom_target(multi_lang_general_model_service_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(multi_lang_general_model_service_py_proto multi_lang_general_model_service_py_proto_init)
py_grpc_proto_compile(general_python_service_py_proto SRCS proto/general_python_service.proto)
add_custom_target(general_python_service_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(general_python_service_py_proto general_python_service_py_proto_init)
if (CLIENT)
py_proto_compile(sdk_configure_py_proto SRCS proto/sdk_configure.proto)
add_custom_target(sdk_configure_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
@@ -51,6 +58,17 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
    COMMENT "Copy generated general_model_config proto file into directory paddle_serving_client/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
    COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_client/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_custom_command(TARGET general_python_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
    COMMENT "Copy generated general_python_service proto file into directory paddle_serving_client/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif()

if (APP)
@@ -65,6 +83,11 @@ if (SERVER)
py_proto_compile(server_config_py_proto SRCS proto/server_configure.proto)
add_custom_target(server_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(server_config_py_proto server_config_py_proto_init)
py_proto_compile(pyserving_channel_py_proto SRCS proto/pyserving_channel.proto)
add_custom_target(pyserving_channel_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(pyserving_channel_py_proto pyserving_channel_py_proto_init)
if (NOT WITH_GPU)
add_custom_command(TARGET server_config_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
@@ -77,6 +100,24 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET general_python_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMENT "Copy generated general_python_service proto file into directory paddle_serving_server/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_custom_command(TARGET pyserving_channel_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMENT "Copy generated pyserving_channel proto file into directory paddle_serving_server/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
    COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
else()
add_custom_command(TARGET server_config_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory
@@ -95,5 +136,23 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
    COMMENT "Copy generated general_model_config proto file into directory
             paddle_serving_server_gpu/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET general_python_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMENT "Copy generated general_python_service proto file into directory paddle_serving_server_gpu/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_custom_command(TARGET pyserving_channel_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMENT "Copy generated pyserving_channel proto file into directory paddle_serving_server_gpu/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
    COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server_gpu/proto."
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif()
endif()
@@ -13,6 +13,7 @@
// limitations under the License.
syntax = "proto2";
package baidu.paddle_serving.pyserving;
service GeneralPythonService {
  rpc inference(Request) returns (Response) {}
@@ -21,11 +22,15 @@ service GeneralPythonService {
message Request {
  repeated bytes feed_insts = 1;
  repeated string feed_var_names = 2;
  repeated bytes shape = 3;
  repeated string type = 4;
}

message Response {
  repeated bytes fetch_insts = 1;
  repeated string fetch_var_names = 2;
  required int32 ecode = 3;
  optional string error_info = 4;
  repeated bytes shape = 5;
  repeated string type = 6;
}
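
For illustration, a hedged sketch of how a client might fill the extended `Request` from Python; the module name `general_python_service_pb2` assumes the standard `protoc` output naming used by the build rules above:

```python
import numpy as np
# Generated by grpc_tools.protoc from general_python_service.proto (assumed name).
import general_python_service_pb2 as pb

x = np.random.rand(1, 13).astype("float32")
req = pb.Request()
req.feed_var_names.append("x")
req.feed_insts.append(x.tobytes())  # raw tensor bytes, one entry per feed var
req.shape.append(np.array(x.shape, dtype="int32").tobytes())  # new field 3
req.type.append("float32")                                    # new field 4
```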
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -14,18 +14,37 @@
syntax = "proto2";
message Tensor {
  optional bytes data = 1;
  repeated int32 int_data = 2;
  repeated int64 int64_data = 3;
  repeated float float_data = 4;
  optional int32 elem_type = 5;
  repeated int32 shape = 6;
  repeated int32 lod = 7; // only for fetch tensor currently
};

message FeedInst { repeated Tensor tensor_array = 1; };

message FetchInst { repeated Tensor tensor_array = 1; };

message Request {
  repeated FeedInst insts = 1;
  repeated string feed_var_names = 2;
  repeated string fetch_var_names = 3;
  required bool is_python = 4 [ default = false ];
};

message Response {
  repeated ModelOutput outputs = 1;
  optional string tag = 2;
};

message ModelOutput {
  repeated FetchInst insts = 1;
  optional string engine_name = 2;
}

service MultiLangGeneralModelService {
  rpc inference(Request) returns (Response) {}
};
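
A minimal sketch of calling the new service over gRPC; the `multi_lang_general_model_service_pb2*` module names assume the standard `grpc_tools.protoc` outputs produced by the CMake rules above, and the endpoint, variable names, and `elem_type` code are illustrative:

```python
import grpc
import multi_lang_general_model_service_pb2 as pb2
import multi_lang_general_model_service_pb2_grpc as pb2_grpc

channel = grpc.insecure_channel("127.0.0.1:9393")
stub = pb2_grpc.MultiLangGeneralModelServiceStub(channel)

req = pb2.Request()
req.feed_var_names.append("x")
req.fetch_var_names.append("price")
req.is_python = False
inst = req.insts.add()            # one FeedInst
tensor = inst.tensor_array.add()  # one Tensor holding a 1x13 float input
tensor.float_data.extend([0.0] * 13)
tensor.shape.extend([1, 13])
resp = stub.inference(req)
print(resp.outputs[0].insts[0].tensor_array[0])
```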
@@ -13,17 +13,19 @@
// limitations under the License.
syntax = "proto2";
package baidu.paddle_serving.pyserving;

message ChannelData {
  repeated Inst insts = 1;
  required int32 id = 2;
  required int32 type = 3 [ default = 0 ];
  required int32 ecode = 4;
  optional string error_info = 5;
}

message Inst {
  required bytes data = 1;
  required string name = 2;
  required bytes shape = 3;
  required string type = 4;
}
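
For illustration, a hedged sketch of populating the revised messages from Python; the module name `pyserving_channel_pb2` is assumed from the protoc naming convention:

```python
import numpy as np
import pyserving_channel_pb2 as channel_pb  # assumed generated module name

data = channel_pb.ChannelData(id=0, type=0, ecode=0)  # ecode 0: no error
inst = data.insts.add()
arr = np.array([1.0, 2.0], dtype="float32")
inst.name = "x"
inst.data = arr.tobytes()
inst.shape = np.array(arr.shape, dtype="int32").tobytes()  # new required field
inst.type = "float32"                                      # new required field
print(len(data.SerializeToString()))
```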
@@ -21,7 +21,7 @@ The following Python code will process the data `test_data/part-0` and write to
[//file]:#process.py
``` python
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
@@ -78,7 +78,7 @@ with open('processed.data') as f:
    feed = {"words": word_ids}
    fetch = ["acc", "cost", "prediction"]
    [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
    if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
        cnt[tag]['acc'] += 1
    cnt[tag]['total'] += 1
@@ -88,7 +88,7 @@ with open('processed.data') as f:
In the code, the function `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label `bow` and flow weight `10`, and an LSTM variant with label `lstm` and flow weight `90` are added. The flow on the client side will be distributed to the two variants according to the ratio of `10:90`.
When making predictions on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow.
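
For reference, a condensed sketch of the client-side A/B setup described above; the endpoints and the dummy `words` input are illustrative:

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
# 10% of traffic goes to the BOW variant, 90% to the LSTM variant.
client.add_variant("bow", ["127.0.0.1:8000"], 10)
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()

[fetch_map, tag] = client.predict(feed={"words": [1, 2, 3]},
                                  fetch=["prediction"],
                                  need_variant_tag=True)
print(tag)  # "bow" or "lstm", telling which variant served this request
```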
### Expected Results
......
@@ -20,7 +20,7 @@ sh get_data.sh
下面Python代码将处理`test_data/part-0`的数据,写入`processed.data`文件中。
```python
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
@@ -76,7 +76,7 @@ with open('processed.data') as f:
    feed = {"words": word_ids}
    fetch = ["acc", "cost", "prediction"]
    [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
    if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
        cnt[tag]['acc'] += 1
    cnt[tag]['total'] += 1
......
@@ -59,7 +59,7 @@ the script of client side bert_client.py is as follows:
import os
import sys
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader
reader = ChineseBertReader()
fetch = ["pooled_output"]
......
@@ -52,7 +52,7 @@ pip install paddle_serving_app
``` python
import sys
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader
reader = ChineseBertReader()
fetch = ["pooled_output"]
......
@@ -20,7 +20,7 @@ This document will take Python2 as an example to show how to compile Paddle Serv
- Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
- Set `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`
## Get Code
@@ -36,6 +36,8 @@ cd Serving && git submodule update --init --recursive
export PYTHONROOT=/usr/
```
In the default CentOS 7 image we provide, the Python path is `/usr/bin/python`. If you want to use our CentOS 6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
## Compile Server
### Integrated CPU version paddle inference library
......
@@ -20,7 +20,7 @@
- `DPYTHON_INCLUDE_DIR`设置为`$PYTHONROOT/include/python3.6m/`
- `DPYTHON_LIBRARIES`设置为`$PYTHONROOT/lib64/libpython3.6.so`
- `DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3.6`
## 获取代码
@@ -36,6 +36,8 @@ cd Serving && git submodule update --init --recursive
export PYTHONROOT=/usr/
```
我们提供的默认CentOS 7镜像中,Python路径为`/usr/bin/python`。如果您要使用我们的CentOS 6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`。
## 编译Server部分
### 集成CPU版本Paddle Inference Library
......
# FAQ
- Q:如何调整RPC服务的等待时间,避免超时?
A:使用set_rpc_timeout_ms设置更长的等待时间,单位为毫秒,默认时间为20秒。
示例:
```python
import sys
from paddle_serving_client import Client
client = Client()
client.load_client_config(sys.argv[1])
client.set_rpc_timeout_ms(100000)
client.connect(["127.0.0.1:9393"])
```
@@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold
### Product model
Run the following Python code in the `product_path` folder to produce models (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the HDFS path `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the HDFS path `/`.
```python
import os
@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***'  # User needs to change
    uci = '***,***'  # User needs to change
    hadoop_bin = '/path/to/hadoop/bin'  # User needs to change
    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
    os.system('{} -rmr {}/{}'.format(
        prefix, remote_path, local_file_path))
    os.system('{} -put {} {}'.format(
        prefix, local_file_path, remote_path))
name = "uci_housing"
for pass_id in range(30):
......
@@ -46,7 +46,7 @@ Paddle Serving提供了一个自动监控脚本,远端地址更新模型后会
### 生产模型
在`product_path`下运行下面的Python代码生产模型(运行前需要修改hadoop相关的参数),每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。
```python
import os
@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***'  # User needs to change
    uci = '***,***'  # User needs to change
    hadoop_bin = '/path/to/hadoop/bin'  # User needs to change
    prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
    os.system('{} -rmr {}/{}'.format(
        prefix, remote_path, local_file_path))
    os.system('{} -put {} {}'.format(
        prefix, local_file_path, remote_path))
name = "uci_housing"
for pass_id in range(30):
......
@@ -99,7 +99,7 @@ func main() {
### 基于IMDB测试集的预测
```python
go run imdb_client.go serving_client_conf/serving_client_conf.stream.prototxt test.data > result
```
### 计算精度
......
@@ -2,9 +2,9 @@
([简体中文](./PERFORMANCE_OPTIM_CN.md)|English)
Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require fewer computing resources have a higher proportion of communication time cost; these are called communication-intensive services. Models that require more computing resources have a higher time cost for inference calculations; these are called computation-intensive services.
For a prediction service, the easiest way to determine the type of service is to look at the time ratio. Paddle Serving provides a [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service.
For communication-intensive prediction services, requests can be aggregated: within a limit that can tolerate delay, multiple prediction requests can be combined into a batch for prediction.
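
For example, a hedged sketch of such client-side aggregation, where several pending inputs are sent as one batched `predict` call; the config path, port, and variable names follow the quick-start housing example and the input values are dummies:

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

# Combine several pending requests into a single batched predict call.
pending = [{"x": [0.0] * 13}, {"x": [0.1] * 13}, {"x": [0.2] * 13}]
fetch_map = client.predict(feed=pending, fetch=["price"])  # one RPC, batch of 3
print(fetch_map["price"])
```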
@@ -16,5 +16,5 @@ Parameters for performance optimization:
| Parameters | Type | Default | Description |
| ---------- | ---- | ------- | ------------------------------------------------------------ |
| mem_optim | - | - | Enable memory / graphic memory optimization |
| ir_optim | - | - | Enable analysis and optimization of calculation graph, including OP fusion, etc. |
@@ -16,5 +16,5 @@
| 参数 | 类型 | 默认值 | 含义 |
| --------- | ---- | ------ | -------------------------------- |
| mem_optim | - | - | 开启内存/显存优化 |
| ir_optim | - | - | 开启计算图分析优化,包括OP融合等 |
@@ -34,7 +34,7 @@ for line in sys.stdin:
## Export from saved model files
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used for Paddle Serving.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
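
For instance, a hedged usage sketch, assuming a model previously saved with `save_inference_model` into `./inference_model` as combined `model`/`params` files (the paths are illustrative):

```python
import paddle_serving_client.io as serving_io

# Produces serving_server/ (for the server side) and serving_client/ (client config).
serving_io.inference_model_to_serving(
    "./inference_model",
    serving_server="serving_server",
    serving_client="serving_client",
    model_filename="model",    # set these two only if the model was saved
    params_filename="params")  # as combined files; otherwise leave them None
```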
......
@@ -35,7 +35,7 @@ for line in sys.stdin:
## 从已保存的模型文件中导出
如果已使用Paddle的`save_inference_model`接口保存出预测要使用的模型,则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
......
@@ -18,7 +18,7 @@ http://10.127.3.150:9393/uci/prediction
Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment.
The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. In the actual deployment process, WSGI (Web Server Gateway Interface) is used.
Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments.
```python
@@ -29,7 +29,7 @@ from paddle_serving_server.web_service import WebService
uci_service = WebService(name = "uci")
uci_service.load_model_config("./uci_housing_model")
uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
uci_service.run_rpc_service()
#Get flask application
app_instance = uci_service.get_app_instance()
```
......
@@ -29,7 +29,7 @@ from paddle_serving_server.web_service import WebService
uci_service = WebService(name = "uci")
uci_service.load_model_config("./uci_housing_model")
uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
uci_service.run_rpc_service()
#获取flask服务
app_instance = uci_service.get_app_instance()
```
......
@@ -19,13 +19,11 @@ from __future__ import unicode_literals, absolute_import
import os
import sys
import time
import json
import requests
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import ChineseBertReader

args = benchmark_args()
@@ -36,42 +34,105 @@ def single_func(idx, resource):
    dataset = []
    for line in fin:
        dataset.append(line.strip())
    profile_flags = False
    latency_flags = False
    if os.getenv("FLAGS_profile_client"):
        profile_flags = True
    if os.getenv("FLAGS_serving_latency"):
        latency_flags = True
        latency_list = []
    if args.request == "rpc":
        reader = ChineseBertReader({"max_seq_len": 128})
        fetch = ["pooled_output"]
        client = Client()
        client.load_client_config(args.model)
        client.connect([resource["endpoint"][idx % len(resource["endpoint"])]])
        start = time.time()
        for i in range(turns):
            if args.batch_size >= 1:
                l_start = time.time()
                feed_batch = []
                b_start = time.time()
                for bi in range(args.batch_size):
                    feed_batch.append(reader.process(dataset[bi]))
                b_end = time.time()
                if profile_flags:
                    sys.stderr.write(
                        "PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format(
                            os.getpid(),
                            int(round(b_start * 1000000)),
                            int(round(b_end * 1000000))))
                result = client.predict(feed=feed_batch, fetch=fetch)
                l_end = time.time()
                if latency_flags:
                    latency_list.append(l_end * 1000 - l_start * 1000)
            else:
                print("unsupport batch size {}".format(args.batch_size))
    elif args.request == "http":
        reader = ChineseBertReader({"max_seq_len": 128})
        fetch = ["pooled_output"]
        server = "http://" + resource["endpoint"][idx % len(resource[
            "endpoint"])] + "/bert/prediction"
        start = time.time()
        for i in range(turns):
            if args.batch_size >= 1:
                l_start = time.time()
                feed_batch = []
                b_start = time.time()
                for bi in range(args.batch_size):
                    feed_batch.append({"words": dataset[bi]})
                req = json.dumps({"feed": feed_batch, "fetch": fetch})
                b_end = time.time()
                if profile_flags:
                    sys.stderr.write(
                        "PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format(
                            os.getpid(),
                            int(round(b_start * 1000000)),
                            int(round(b_end * 1000000))))
                result = requests.post(
                    server,
                    data=req,
                    headers={"Content-Type": "application/json"})
                l_end = time.time()
                if latency_flags:
                    latency_list.append(l_end * 1000 - l_start * 1000)
            else:
                print("unsupport batch size {}".format(args.batch_size))
    else:
        raise ValueError("not implemented {} request".format(args.request))
    end = time.time()
    if latency_flags:
        return [[end - start], latency_list]
    else:
        return [[end - start]]
if __name__ == '__main__':
    multi_thread_runner = MultiThreadRunner()
    endpoint_list = ["127.0.0.1:9292"]
    turns = 10
    start = time.time()
    result = multi_thread_runner.run(
        single_func, args.thread, {"endpoint": endpoint_list,
                                   "turns": turns})
    end = time.time()
    total_cost = end - start
    avg_cost = 0
    for i in range(args.thread):
        avg_cost += result[0][i]
    avg_cost = avg_cost / args.thread

    print("total cost :{} s".format(total_cost))
    print("each thread cost :{} s. ".format(avg_cost))
    print("qps :{} samples/s".format(args.batch_size * args.thread * turns /
                                     total_cost))
    if os.getenv("FLAGS_serving_latency"):
        show_latency(result[1])
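The reported qps is simply total samples over wall-clock time: for example, with `--thread 8 --batch_size 16` and `turns = 10` the run issues 8 * 16 * 10 = 1280 samples, so a `total_cost` of 4 s would report 320 samples/s.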
rm profile_log
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1
python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim False --ir_optim True 2> elog > stdlog &
sleep 5
#warm up
python3 benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
for thread_num in 4 8 16
do
    for batch_size in 1 4 16 64 256
    do
        python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
        echo "model name :" $1
        echo "thread num :" $thread_num
        echo "batch size :" $batch_size
        echo "=================Done===================="
        echo "model name :$1" >> profile_log_$1
        echo "batch size :$batch_size" >> profile_log_$1
        python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
        tail -n 8 profile >> profile_log_$1
        echo "" >> profile_log_$1
    done
done
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
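For reference, the script takes the server model directory as `$1` and the client config directory as `$2`, so a typical invocation (hypothetical paths) is `sh benchmark.sh bert_seq128_model bert_seq128_client`.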
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from __future__ import unicode_literals, absolute_import
import os
import sys
import time
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
from batching import pad_batch_data
import tokenization
import requests
import json
from bert_reader import BertReader
args = benchmark_args()
def single_func(idx, resource):
fin = open("data-c.txt")
dataset = []
for line in fin:
dataset.append(line.strip())
profile_flags = False
    if os.environ.get("FLAGS_profile_client"):  # .get() avoids a KeyError when the flag is unset
profile_flags = True
if args.request == "rpc":
reader = BertReader(vocab_file="vocab.txt", max_seq_len=20)
fetch = ["pooled_output"]
client = Client()
client.load_client_config(args.model)
client.connect([resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time()
for i in range(1000):
if args.batch_size >= 1:
feed_batch = []
b_start = time.time()
for bi in range(args.batch_size):
feed_batch.append(reader.process(dataset[bi]))
b_end = time.time()
if profile_flags:
print("PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}".format(
os.getpid(),
int(round(b_start * 1000000)),
int(round(b_end * 1000000))))
result = client.predict(feed=feed_batch, fetch=fetch)
else:
print("unsupport batch size {}".format(args.batch_size))
elif args.request == "http":
        raise ValueError("no batch predict for http")
end = time.time()
return [[end - start]]
if __name__ == '__main__':
multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:9292"]
result = multi_thread_runner.run(single_func, args.thread,
{"endpoint": endpoint_list})
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("average total cost {} s.".format(avg_cost))
rm profile_log
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9295 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog &
sleep 5
for thread_num in 1 2 4 8 16
do
for batch_size in 1 2 4 8 16 32 64 128 256 512
do
$PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "========================================"
echo "thread num: ", $thread_num
echo "batch size: ", $batch_size
echo "batch size : $batch_size" >> profile_log
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
tail -n 1 profile >> profile_log
done
done
@@ -14,15 +14,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from paddle_serving_client import Client
from paddle_serving_client.utils import benchmark_args
from paddle_serving_app.reader import ChineseBertReader
...
@@ -21,7 +21,10 @@ import os
class BertService(WebService):
    def load(self):
        self.reader = ChineseBertReader({
            "vocab_file": "vocab.txt",
            "max_seq_len": 128
        })

    def preprocess(self, feed=[], fetch=[]):
        feed_res = [
...
@@ -17,6 +17,6 @@
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube
@@ -17,6 +17,6 @@
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube
# Image Segmentation
## Get Model
```
python -m paddle_serving_app.package --get_model deeplabv3
tar -xzvf deeplabv3.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
```
### Client Prediction
```
python deeplabv3_client.py
```
# Image Segmentation
## Get Model
```
python -m paddle_serving_app.package --get_model deeplabv3
tar -xzvf deeplabv3.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
```
### Client Prediction
```
python deeplabv3_client.py
```
@@ -18,7 +18,7 @@ import sys
import cv2

client = Client()
client.load_client_config("deeplabv3_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9494"])
preprocess = Sequential(
...
@@ -12,8 +12,8 @@ If you want to have more detection models, please refer to [Paddle Detection Mod
### Start the service
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet* .
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
```
### Perform prediction
...
@@ -13,7 +13,7 @@ wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet* ./
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
```
### Perform prediction
...
@@ -22,15 +22,19 @@ def single_func(idx, resource):
    client.load_client_config(
        "./uci_housing_client/serving_client_conf.prototxt")
    client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
    x = [
        0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
        0.6283, 0.4919, 0.1856, 0.0795, -0.0332
    ]
    for i in range(1000):
        fetch_map = client.predict(feed={"x": x}, fetch=["price"])
        if fetch_map is None:
            return [[None]]
    return [[0]]


multi_thread_runner = MultiThreadRunner()
thread_num = 4
result = multi_thread_runner.run(single_func, thread_num, {})
if None in result[0]:
    exit(1)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient
import sys
client = MultiLangClient()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9393"])
import paddle
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500),
batch_size=1)
for data in test_reader():
future = client.predict(feed={"x": data[0][0]}, fetch=["price"], asyn=True)
fetch_map = future.result()
print("{} {}".format(fetch_map["price"][0], data[0][1][0]))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
from paddle_serving_server import MultiLangServer
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(response_op)
server = MultiLangServer()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config(sys.argv[1])
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
@@ -19,10 +19,10 @@ pip install paddle_serving_app
Start the server:
```
python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu inference service
```
```
python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu inference service
```
...
@@ -73,7 +73,7 @@ def single_func(idx, resource):
        print("unsupported batch size {}".format(args.batch_size))
    elif args.request == "http":
        py_version = sys.version_info[0]
        server = "http://" + resource["endpoint"][idx % len(resource[
            "endpoint"])] + "/image/prediction"
        start = time.time()
@@ -93,7 +93,7 @@ def single_func(idx, resource):
if __name__ == '__main__':
    multi_thread_runner = MultiThreadRunner()
    endpoint_list = ["127.0.0.1:9393"]
    #endpoint_list = endpoint_list + endpoint_list + endpoint_list
    result = multi_thread_runner.run(single_func, args.thread,
                                     {"endpoint": endpoint_list})
...
rm profile_log
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_profile_server=1
export FLAGS_profile_client=1
python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog &
sleep 5
#warm up
$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
for thread_num in 4 8 16
do
    for batch_size in 1 4 16 64 256
    do
        $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
        echo "model name :" $1
        echo "thread num :" $thread_num
        echo "batch size :" $batch_size
        echo "=================Done===================="
        echo "model name :$1" >> profile_log
        echo "batch size :$batch_size" >> profile_log
        $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
        tail -n 8 profile >> profile_log
    done
done
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
@@ -13,27 +13,23 @@
# limitations under the License.
from paddle_serving_client.pyclient import PyClient
import numpy as np
from paddle_serving_app.reader import IMDBDataset
from line_profiler import LineProfiler

client = PyClient()
client.connect('localhost:8080')

lp = LineProfiler()
lp_wrapper = lp(client.predict)

words = 'i am very sad | 0'
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')

for i in range(1):
    word_ids, label = imdb_dataset.get_words_and_label(words)
    fetch_map = lp_wrapper(
        feed={"words": word_ids}, fetch=["combined_prediction"])
    print(fetch_map)

#lp.print_stats()
@@ -16,101 +16,54 @@
from paddle_serving_server.pyserver import Op
from paddle_serving_server.pyserver import Channel
from paddle_serving_server.pyserver import PyServer
import numpy as np
import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s',
    datefmt='%Y-%m-%d %H:%M',
    level=logging.INFO)


class CombineOp(Op):
    def preprocess(self, input_data):
        combined_prediction = 0
        for op_name, channeldata in input_data.items():
            data = channeldata.parse()
            logging.info("{}: {}".format(op_name, data["prediction"]))
            combined_prediction += data["prediction"]
        data = {"combined_prediction": combined_prediction / 2}
        return data


read_op = Op(name="read", inputs=None)
bow_op = Op(name="bow",
            inputs=[read_op],
            server_model="imdb_bow_model",
            server_port="9393",
            device="cpu",
            client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
            server_name="127.0.0.1:9393",
            fetch_names=["prediction"],
            concurrency=1,
            timeout=0.1,
            retry=2)
cnn_op = Op(name="cnn",
            inputs=[read_op],
            server_model="imdb_cnn_model",
            server_port="9292",
            device="cpu",
            client_config="imdb_cnn_client_conf/serving_client_conf.prototxt",
            server_name="127.0.0.1:9292",
            fetch_names=["prediction"],
            concurrency=1,
            timeout=-1,
            retry=1)
combine_op = CombineOp(
    name="combine", inputs=[bow_op, cnn_op], concurrency=1, timeout=-1, retry=1)

pyserver = PyServer(profile=False, retry=1)
pyserver.add_ops([read_op, bow_op, cnn_op, combine_op])
pyserver.prepare_server(port=8080, worker_num=2)
pyserver.run_server()
@@ -2,28 +2,27 @@
([简体中文](./README_CN.md)|English)

### Get Model
```
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
```

#### Start RPC inference service
```
python -m paddle_serving_server.serve --model lac_model/ --port 9292
```
### RPC Infer
```
echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
```
It will get the segmentation result.

### Start HTTP inference service
```
python lac_web_service.py lac_model/ lac_workdir 9292
```
### HTTP Infer
...
@@ -2,28 +2,27 @@
(简体中文|[English](./README.md))

### Get Model
```
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
```

#### Start RPC inference service
```
python -m paddle_serving_server.serve --model lac_model/ --port 9292
```
### Perform RPC inference
```
echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
```
This returns the word segmentation result.

### Start HTTP inference service
```
python lac_web_service.py lac_model/ lac_workdir 9292
```
### Perform HTTP inference
...
@@ -16,7 +16,7 @@
import sys
import time
import requests
from paddle_serving_app.reader import LACReader
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
@@ -25,7 +25,7 @@ args = benchmark_args()

def single_func(idx, resource):
    reader = LACReader()
    start = time.time()
    if args.request == "rpc":
        client = Client()
...
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
tar -zxvf lac_model_jieba_web.tar.gz
@@ -15,7 +15,7 @@
# pylint: disable=doc-string-missing

from paddle_serving_client import Client
from paddle_serving_app.reader import LACReader
import sys
import os
import io
@@ -24,7 +24,7 @@ client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])

reader = LACReader()
for line in sys.stdin:
    if len(line) <= 0:
        continue
@@ -32,4 +32,7 @@ for line in sys.stdin:
    if len(feed_data) <= 0:
        continue
    fetch_map = client.predict(feed={"words": feed_data}, fetch=["crf_decode"])
    begin = fetch_map['crf_decode.lod'][0]
    end = fetch_map['crf_decode.lod'][1]
    segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end])
    print("word_seg: " + "|".join(str(words) for words in segs))
@@ -14,12 +14,12 @@
from paddle_serving_server.web_service import WebService
import sys
from paddle_serving_app.reader import LACReader


class LACService(WebService):
    def load_reader(self):
        self.reader = LACReader()

    def preprocess(self, feed={}, fetch=[]):
        feed_batch = []
...
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
tar -xzvf mobilenet_v2_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python mobilenet_tutorial.py
```
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
tar -xzvf mobilenet_v2_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python mobilenet_tutorial.py
```
# OCR
## Get Model
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server.serve --model ocr_rec_model --port 9292
```
### Client Prediction
```
python test_ocr_rec_client.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
client = Client()
client.load_client_config("ocr_rec_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
image_file_list = ["./test_rec.jpg"]
img = cv2.imread(image_file_list[0])
ocr_reader = OCRReader()
feed = {"image": ocr_reader.preprocess([img])}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
fetch_map = client.predict(feed=feed, fetch=fetch)
rec_res = ocr_reader.postprocess(fetch_map)
print(image_file_list[0])
print(rec_res[0][0])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
client = Client()
client.load_client_config("ocr_det_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9494"])
read_image_file = File2Image()
preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
filter_func = FilterBoxes(10, 10)
name = "test_det.jpg"  # hypothetical sample image path; the original left `name` undefined
img = read_image_file(name)
ori_h, ori_w, _ = img.shape
img = preprocess(img)
new_h, new_w, _ = img.shape
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
outputs = client.predict(feed={"image": img}, fetch=["concat_1.tmp_0"])
dt_boxes_list = post_func(outputs["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python resnet50_v2_tutorial.py
```
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python resnet50_v2_tutorial.py
```
@@ -14,7 +14,7 @@
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize

client = Client()
client.load_client_config(
@@ -28,5 +28,5 @@ seq = Sequential([
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
# Chinese Sentence Sentiment Classification
([简体中文](./README_CN.md)|English)

## Get Model
```
python -m paddle_serving_app.package --get_model senta_bilstm
python -m paddle_serving_app.package --get_model lac
tar -xzvf senta_bilstm.tar.gz
tar -xzvf lac.tar.gz
```
## Start HTTP Service
```
python -m paddle_serving_server.serve --model lac_model --port 9300
python senta_web_service.py
```
In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through the [LAC task](../lac).
In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task.
## Client prediction
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
```
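A Python equivalent of the curl request above (a sketch, assuming the service is running as started in this README):

```
import json
import requests

# same payload as the curl example above
data = {"feed": [{"words": "天气不错"}], "fetch": ["class_probs"]}
r = requests.post(
    "http://127.0.0.1:9393/senta/prediction",
    data=json.dumps(data),
    headers={"Content-Type": "application/json"})
print(r.json())
```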
# Chinese Sentence Sentiment Classification
(简体中文|[English](./README.md))

## Get Model
```
python -m paddle_serving_app.package --get_model senta_bilstm
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
tar -xzvf senta_bilstm.tar.gz
```
## Start HTTP Service
```
python -m paddle_serving_server.serve --model lac_model --port 9300
python senta_web_service.py
```
For Chinese sentiment classification, the input text first needs to be segmented through the [LAC task](../lac).
In this demo, the LAC task is placed in the preprocessing part of the sentiment classification HTTP prediction service.
## Client prediction
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
```
#encoding=utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -12,56 +13,28 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from paddle_serving_server.web_service import WebService
from paddle_serving_client import Client
from paddle_serving_app.reader import LACReader, SentaReader
import os
import sys


class SentaService(WebService):
    #Initialize the client of the LAC inference service
    def init_lac_client(self, lac_port, lac_client_config):
        self.lac_reader = LACReader()
        self.senta_reader = SentaReader()
        self.lac_client = Client()
        self.lac_client.load_client_config(lac_client_config)
        self.lac_client.connect(["127.0.0.1:{}".format(lac_port)])

    #Preprocessing for the senta service. Call order: lac reader -> lac model inference -> postprocess result -> senta reader
    def preprocess(self, feed=[], fetch=[]):
        feed_data = [{
            "words": self.lac_reader.process(x["words"])
@@ -80,15 +53,9 @@ class SentaService(WebService):

senta_service = SentaService(name="senta")
senta_service.load_model_config("senta_bilstm_model")
senta_service.prepare_server(workdir="workdir")
senta_service.init_lac_client(
    lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt")
senta_service.run_rpc_service()
senta_service.run_web_service()
# Image Segmentation
## Get Model
```
python -m paddle_serving_app.package --get_model unet
tar -xzvf unet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model unet_model --gpu_ids 0 --port 9494
```
### Client Prediction
```
python seg_client.py
```
# Image Segmentation
## Get Model
```
python -m paddle_serving_app.package --get_model unet
tar -xzvf unet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model unet_model --gpu_ids 0 --port 9494
```
### Client Prediction
```
python seg_client.py
```
@@ -27,7 +27,8 @@ preprocess = Sequential(
postprocess = SegPostprocess(2)

filename = "N0060.jpg"
im = preprocess(filename)
fetch_map = client.predict(feed={"image": im}, fetch=["output"])
fetch_map["filename"] = filename
postprocess(fetch_map)
@@ -31,7 +31,7 @@ with open(profile_file) as f:
        if line[0] == "PROFILE":
            prase(line[2])

print("thread num :{}".format(thread_num))
for name in time_dict:
    print("{} cost :{} s in each thread ".format(name, time_dict[name] / (
        1000000.0 * float(thread_num))))
@@ -12,7 +12,7 @@ pip install paddle_serving_app
## Get model list
```shell
python -m paddle_serving_app.package --list_model
```

## Download pre-trained models
@@ -21,16 +21,16 @@ python -m paddle_serving_app.package --model_list
python -m paddle_serving_app.package --get_model senta_bilstm
```

11 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks.
The model files can be directly used for deployment, and the `--tutorial` argument can be added to obtain the deployment method.

| Prediction task | Model name |
| ------------ | ------------------------------------------------ |
| SentimentAnalysis | 'senta_bilstm', 'senta_bow', 'senta_cnn' |
| SemanticRepresentation | 'ernie' |
| ChineseWordSegmentation | 'lac' |
| ObjectDetection | 'faster_rcnn' |
| ImageSegmentation | 'unet', 'deeplabv3', 'deeplabv3+cityscapes' |
| ImageClassification | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' |

## Data preprocess API
@@ -38,7 +38,8 @@ The model files can be directly used for deployment, and the `--tutorial` argume
paddle_serving_app provides a variety of data preprocessing methods for prediction tasks in the field of CV and NLP.

- class ChineseBertReader

Preprocessing for the Chinese semantic representation task.
- `__init__(vocab_file, max_seq_len=20)`
@@ -54,7 +55,8 @@ Preprocessing for Chinese semantic representation task.
[example](../examples/bert/bert_client.py)
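A short usage sketch (the dict-style config mirrors how `bert_web_service.py` and the bert benchmark in this commit call the reader):

```
from paddle_serving_app.reader import ChineseBertReader

# dict-style config, as used by bert_web_service.py in this commit
reader = ChineseBertReader({"vocab_file": "vocab.txt", "max_seq_len": 128})
feed_dict = reader.process("hello")  # returns the feed dict for one sentence
```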
- class LACReader

Preprocessing for the Chinese word segmentation task.
- `__init__(dict_folder)`
@@ -65,7 +67,7 @@ Preprocessing for Chinese word segmentation task.
- words(str): Original text input.
- crf_decode(np.array): CRF code predicted by model.

[example](../examples/lac/lac_web_service.py)
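A condensed sketch of the flow used by `lac_client.py` in this commit (`process()` builds the feed; `parse_result()` decodes the CRF output using the returned lod offsets):

```
from paddle_serving_app.reader import LACReader

reader = LACReader()
line = "我爱北京天安门"
feed_data = reader.process(line)
# after fetch_map = client.predict(feed={"words": feed_data}, fetch=["crf_decode"]):
# begin, end = fetch_map['crf_decode.lod'][0], fetch_map['crf_decode.lod'][1]
# segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end])
```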
- class SentaReader
@@ -76,7 +78,7 @@ Preprocessing for Chinese word segmentation task.
[example](../examples/senta/senta_web_service.py)

- The image preprocessing method is more flexible than the above methods, and can be composed from the following classes, [example](../examples/imagenet/resnet50_rpc_client.py); see the sketch below.
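A minimal sketch of such a composition (the op order here is an assumption modeled on the resnet tutorial in this commit; the linked example is authoritative):

```
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize

seq = Sequential([
    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img = seq("daisy.jpg")  # ndarray ready to feed into client.predict
```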
- class Sequential
...
@@ -11,7 +11,7 @@ pip install paddle_serving_app
## Get model list
```shell
python -m paddle_serving_app.package --list_model
```

## Download pre-trained models
@@ -20,15 +20,15 @@ python -m paddle_serving_app.package --model_list
python -m paddle_serving_app.package --get_model senta_bilstm
```

11 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks. The downloaded model files can be used directly for deployment; add the `--tutorial` argument to get the corresponding deployment instructions.

| Prediction task | Model name |
| ------------ | ------------------------------------------------ |
| Chinese sentiment analysis | 'senta_bilstm', 'senta_bow', 'senta_cnn' |
| Semantic understanding | 'ernie' |
| Chinese word segmentation | 'lac' |
| Object detection | 'faster_rcnn' |
| Image segmentation | 'unet', 'deeplabv3', 'deeplabv3+cityscapes' |
| Image classification | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' |

## Data preprocess API
@@ -36,7 +36,7 @@ 11 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks
paddle_serving_app provides common data preprocessing methods for CV and NLP model tasks.

- class ChineseBertReader

Preprocessing for the Chinese semantic understanding model.
- `__init__(vocab_file, max_seq_len=20)`
@@ -71,7 +71,7 @@ paddle_serving_app provides common data preprocessing methods for CV and NLP model tasks
[example](../examples/senta/senta_web_service.py)

- Image preprocessing is more flexible than the methods above and can be composed from the following classes, [example](../examples/imagenet/resnet50_rpc_client.py)
- class Sequential
...
@@ -22,22 +22,26 @@ class ServingModels(object):
        self.model_dict = OrderedDict()
        self.model_dict[
            "SentimentAnalysis"] = ["senta_bilstm", "senta_bow", "senta_cnn"]
        self.model_dict["SemanticRepresentation"] = ["ernie"]
        self.model_dict["ChineseWordSegmentation"] = ["lac"]
        self.model_dict["ObjectDetection"] = ["faster_rcnn"]
        self.model_dict["ImageSegmentation"] = [
            "unet", "deeplabv3", "deeplabv3+cityscapes"
        ]
        self.model_dict["ImageClassification"] = [
            "resnet_v2_50_imagenet", "mobilenet_v2_imagenet"
        ]
        self.model_dict["TextDetection"] = ["ocr_detection"]
        self.model_dict["OCR"] = ["ocr_rec"]

        image_class_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/"
        image_seg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageSegmentation/"
        object_detection_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ObjectDetection/"
        ocr_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/OCR/"
        senta_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/"
        semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/"
        wordseg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/"
        ocr_det_url = "https://paddle-serving.bj.bcebos.com/ocr/"

        self.url_dict = {}
@@ -52,6 +56,8 @@ class ServingModels(object):
        pack_url(self.model_dict, "ObjectDetection", object_detection_url)
        pack_url(self.model_dict, "ImageSegmentation", image_seg_url)
        pack_url(self.model_dict, "ImageClassification", image_class_url)
        pack_url(self.model_dict, "OCR", ocr_url)
        pack_url(self.model_dict, "TextDetection", ocr_det_url)

    def get_model_list(self):
        return self.model_dict
...
@@ -12,7 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .chinese_bert_reader import ChineseBertReader
from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize
from .image_reader import CenterCrop, Resize, Transpose, Div, RGB2BGR, BGR2RGB, ResizeByFactor
from .image_reader import RCNNPostprocess, SegPostprocess, PadStride
from .image_reader import DBPostProcess, FilterBoxes
from .lac_reader import LACReader
from .senta_reader import SentaReader
from .imdb_reader import IMDBDataset
from .ocr_reader import OCRReader
@@ -11,6 +11,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import os
import numpy as np
@@ -18,6 +21,8 @@ import base64
import sys
from . import functional as F
from PIL import Image, ImageDraw
from shapely.geometry import Polygon
import pyclipper
import json

_cv2_interpolation_to_str = {cv2.INTER_LINEAR: "cv2.INTER_LINEAR", None: "None"}
@@ -43,6 +48,196 @@ def generate_colormap(num_classes):
    return color_map
class DBPostProcess(object):
"""
The post process for Differentiable Binarization (DB).
"""
def __init__(self, params):
self.thresh = params['thresh']
self.box_thresh = params['box_thresh']
self.max_candidates = params['max_candidates']
self.unclip_ratio = params['unclip_ratio']
self.min_size = 3
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
_bitmap: single map with shape (1, H, W),
whose values are binarized as {0, 1}
'''
bitmap = _bitmap
height, width = bitmap.shape
outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE)
if len(outs) == 3:
img, contours, _ = outs[0], outs[1], outs[2]
elif len(outs) == 2:
contours, _ = outs[0], outs[1]
num_contours = min(len(contours), self.max_candidates)
boxes = np.zeros((num_contours, 4, 2), dtype=np.int16)
scores = np.zeros((num_contours, ), dtype=np.float32)
for index in range(num_contours):
contour = contours[index]
points, sside = self.get_mini_boxes(contour)
if sside < self.min_size:
continue
points = np.array(points)
score = self.box_score_fast(pred, points.reshape(-1, 2))
if self.box_thresh > score:
continue
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
continue
box = np.array(box)
if not isinstance(dest_width, int):
dest_width = dest_width.item()
dest_height = dest_height.item()
box[:, 0] = np.clip(
np.round(box[:, 0] / width * dest_width), 0, dest_width)
box[:, 1] = np.clip(
np.round(box[:, 1] / height * dest_height), 0, dest_height)
boxes[index, :, :] = box.astype(np.int16)
scores[index] = score
return boxes, scores
def unclip(self, box):
unclip_ratio = self.unclip_ratio
poly = Polygon(box)
distance = poly.area * unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.array(offset.Execute(distance))
return expanded
def get_mini_boxes(self, contour):
bounding_box = cv2.minAreaRect(contour)
points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
index_1, index_2, index_3, index_4 = 0, 1, 2, 3
if points[1][1] > points[0][1]:
index_1 = 0
index_4 = 1
else:
index_1 = 1
index_4 = 0
if points[3][1] > points[2][1]:
index_2 = 2
index_3 = 3
else:
index_2 = 3
index_3 = 2
box = [
points[index_1], points[index_2], points[index_3], points[index_4]
]
return box, min(bounding_box[1])
def box_score_fast(self, bitmap, _box):
h, w = bitmap.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def __call__(self, pred, ratio_list):
pred = pred[:, 0, :, :]
segmentation = pred > self.thresh
boxes_batch = []
for batch_index in range(pred.shape[0]):
height, width = pred.shape[-2:]
tmp_boxes, tmp_scores = self.boxes_from_bitmap(
pred[batch_index], segmentation[batch_index], width, height)
boxes = []
for k in range(len(tmp_boxes)):
if tmp_scores[k] > self.box_thresh:
boxes.append(tmp_boxes[k])
if len(boxes) > 0:
boxes = np.array(boxes)
ratio_h, ratio_w = ratio_list[batch_index]
boxes[:, :, 0] = boxes[:, :, 0] / ratio_w
boxes[:, :, 1] = boxes[:, :, 1] / ratio_h
boxes_batch.append(boxes)
return boxes_batch
def __repr__(self):
return self.__class__.__name__ + \
" thresh: {0}, box_thresh: {1}, max_candidates: {2}, unclip_ratio: {3}, min_size: {4}".format(
self.thresh, self.box_thresh, self.max_candidates, self.unclip_ratio, self.min_size)
class FilterBoxes(object):
def __init__(self, width, height):
self.filter_width = width
self.filter_height = height
def order_points_clockwise(self, pts):
"""
reference from: https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py
# sort the points based on their x-coordinates
"""
xSorted = pts[np.argsort(pts[:, 0]), :]
# grab the left-most and right-most points from the sorted
# x-roodinate points
leftMost = xSorted[:2, :]
rightMost = xSorted[2:, :]
# now, sort the left-most coordinates according to their
# y-coordinates so we can grab the top-left and bottom-left
# points, respectively
leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
(tl, bl) = leftMost
rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
(tr, br) = rightMost
rect = np.array([tl, tr, br, bl], dtype="float32")
return rect
def clip_det_res(self, points, img_height, img_width):
for pno in range(4):
points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1))
return points
def __call__(self, dt_boxes, image_shape):
img_height, img_width = image_shape[0:2]
dt_boxes_new = []
for box in dt_boxes:
box = self.order_points_clockwise(box)
box = self.clip_det_res(box, img_height, img_width)
rect_width = int(np.linalg.norm(box[0] - box[1]))
rect_height = int(np.linalg.norm(box[0] - box[3]))
if rect_width <= self.filter_width or \
rect_height <= self.filter_height:
continue
dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new)
return dt_boxes
def __repr__(self):
return self.__class__.__name__ + " filter_width: {0}, filter_height: {1}".format(
self.filter_width, self.filter_height)
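Taken together, DBPostProcess and FilterBoxes make up the text-detection postprocess stage: the first turns the network's probability map into quadrilateral boxes, the second drops boxes that end up too small once clipped to the source image. The sketch below shows how they might be wired together; the threshold values and the synthetic probability map are illustrative assumptions, and the import path assumes both classes are exported from paddle_serving_app.reader as in the OCR examples.

``` python
import numpy as np
from paddle_serving_app.reader import DBPostProcess, FilterBoxes  # assumed export path

post_func = DBPostProcess({
    "thresh": 0.3,          # binarization threshold on the probability map
    "box_thresh": 0.5,      # minimum mean score inside a candidate box
    "max_candidates": 1000,
    "unclip_ratio": 1.5,
})
filter_func = FilterBoxes(width=10, height=10)

# synthetic (N, C, H, W) probability map with one high-confidence region
pred = np.zeros((1, 1, 640, 640), dtype="float32")
pred[0, 0, 100:200, 100:300] = 0.9

boxes_batch = post_func(pred, [(1.0, 1.0)])            # one (ratio_h, ratio_w) pair per image
dt_boxes = filter_func(boxes_batch[0], (640, 640, 3))  # filter against the original image shape
print(dt_boxes.shape)                                  # (num_boxes, 4, 2)
```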
class SegPostprocess(object): class SegPostprocess(object):
def __init__(self, class_num): def __init__(self, class_num):
self.class_num = class_num self.class_num = class_num
...@@ -77,8 +272,7 @@ class SegPostprocess(object): ...@@ -77,8 +272,7 @@ class SegPostprocess(object):
result_png = score_png result_png = score_png
result_png = cv2.resize( result_png = cv2.resize(
result_png, result_png, (ori_shape[1], ori_shape[0]),
ori_shape[:2],
fx=0, fx=0,
fy=0, fy=0,
interpolation=cv2.INTER_CUBIC) interpolation=cv2.INTER_CUBIC)
...@@ -296,7 +490,10 @@ class File2Image(object): ...@@ -296,7 +490,10 @@ class File2Image(object):
pass pass
def __call__(self, img_path): def __call__(self, img_path):
fin = open(img_path) if py_version == 2:
fin = open(img_path)
else:
fin = open(img_path, "rb")
sample = fin.read() sample = fin.read()
data = np.fromstring(sample, np.uint8) data = np.fromstring(sample, np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR) img = cv2.imdecode(data, cv2.IMREAD_COLOR)
...@@ -470,6 +667,57 @@ class Resize(object): ...@@ -470,6 +667,57 @@ class Resize(object):
_cv2_interpolation_to_str[self.interpolation]) _cv2_interpolation_to_str[self.interpolation])
class ResizeByFactor(object):
"""Resize the input numpy-array image so that both sides become multiples of `factor`, as many detection networks require.
Args:
factor (int): resize factor; width and height are rounded to multiples of this value. Default is 32.
max_side_len (int): maximum allowed side length; if width or height exceeds it, the image is scaled down first so that the longer side equals max_side_len. Default is 2400.
"""
def __init__(self, factor=32, max_side_len=2400):
self.factor = factor
self.max_side_len = max_side_len
def __call__(self, img):
h, w, _ = img.shape
resize_w = w
resize_h = h
if max(resize_h, resize_w) > self.max_side_len:
if resize_h > resize_w:
ratio = float(self.max_side_len) / resize_h
else:
ratio = float(self.max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
if resize_h % self.factor == 0:
resize_h = resize_h
elif resize_h // self.factor <= 1:
resize_h = self.factor
else:
resize_h = (resize_h // self.factor - 1) * self.factor
if resize_w % self.factor == 0:
resize_w = resize_w
elif resize_w // self.factor <= 1:
resize_w = self.factor
else:
resize_w = (resize_w // self.factor - 1) * self.factor
try:
if int(resize_w) <= 0 or int(resize_h) <= 0:
return None
im = cv2.resize(img, (int(resize_w), int(resize_h)))
except:
print(resize_w, resize_h)
sys.exit(0)
return im
def __repr__(self):
return self.__class__.__name__ + '(factor={0}, max_side_len={1})'.format(
self.factor, self.max_side_len)
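ResizeByFactor normally sits at the front of the detection preprocessing chain, since the DB detector expects input sides that are multiples of 32. Below is a hedged sketch of that chain; the 960-pixel cap and the normalization constants are the values commonly used in the OCR detection examples, assumed here rather than dictated by this diff.

``` python
import numpy as np
from paddle_serving_app.reader import Sequential, ResizeByFactor, Div, Normalize, Transpose  # assumed exports

det_preprocess = Sequential([
    ResizeByFactor(32, 960),   # round both sides down to multiples of 32, cap the longer side at 960
    Div(255),                  # scale pixel values to [0, 1]
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
    Transpose((2, 0, 1)),      # HWC -> CHW for the detector
])

img = np.random.randint(0, 255, (720, 1280, 3), dtype="uint8")
det_img = det_preprocess(img)
print(det_img.shape)           # (3, 480, 960) for this input size
```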
class PadStride(object): class PadStride(object):
def __init__(self, stride): def __init__(self, stride):
self.coarsest_stride = stride self.coarsest_stride = stride
......
...@@ -111,6 +111,10 @@ class LACReader(object): ...@@ -111,6 +111,10 @@ class LACReader(object):
return word_ids return word_ids
def parse_result(self, words, crf_decode): def parse_result(self, words, crf_decode):
try:
words = unicode(words, "utf-8")
except:
pass
tags = [self.id2label_dict[str(x[0])] for x in crf_decode] tags = [self.id2label_dict[str(x[0])] for x in crf_decode]
sent_out = [] sent_out = []
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import copy
import numpy as np
import math
import re
import string  # used by the "en_sensitive" character type below
import sys
import argparse
from paddle_serving_app.reader import Sequential, Resize, Transpose, Div, Normalize
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type == "ch":
character_dict_path = config['character_dict_path']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
assert self.character_str is not None, \
"Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[text_index[idx]])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupported type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
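A quick sketch of the label round trip CharacterOps provides, using the built-in English alphabet so that no dictionary file is needed; the config values below are illustrative only, and the class is assumed to be used inside (or imported from) this reader module.

``` python
char_ops = CharacterOps({"character_type": "en", "loss_type": "ctc"})

ids = char_ops.encode("Hello")                          # lower-cased, mapped to indices
print(char_ops.decode(ids))                             # -> "hello"
print(char_ops.decode(ids, is_remove_duplicate=True))   # -> "helo": consecutive repeats collapse
print(char_ops.get_char_num())                          # -> 36 (digits + a-z)
```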
class OCRReader(object):
def __init__(self):
args = self.parse_args()
image_shape = [int(v) for v in args.rec_image_shape.split(",")]
self.rec_image_shape = image_shape
self.character_type = args.rec_char_type
self.rec_batch_num = args.rec_batch_num
char_ops_params = {}
char_ops_params["character_type"] = args.rec_char_type
char_ops_params["character_dict_path"] = args.rec_char_dict_path
char_ops_params['loss_type'] = 'ctc'
self.char_ops = CharacterOps(char_ops_params)
def parse_args(self):
parser = argparse.ArgumentParser()
parser.add_argument("--rec_algorithm", type=str, default='CRNN')
parser.add_argument("--rec_model_dir", type=str)
parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
parser.add_argument("--rec_char_type", type=str, default='ch')
parser.add_argument("--rec_batch_num", type=int, default=1)
parser.add_argument(
"--rec_char_dict_path", type=str, default="./ppocr_keys_v1.txt")
return parser.parse_args()
def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
if self.character_type == "ch":
imgW = int(32 * max_wh_ratio)
h = img.shape[0]
w = img.shape[1]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
seq = Sequential([
Resize(imgH, resized_w), Transpose((2, 0, 1)), Div(255),
Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5], True)
])
resized_image = seq(img)
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
def preprocess(self, img_list):
img_num = len(img_list)
norm_img_batch = []
max_wh_ratio = 0
for ino in range(img_num):
h, w = img_list[ino].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(img_num):
norm_img = self.resize_norm_img(img_list[ino], max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
norm_img_batch = norm_img_batch.copy()
return norm_img_batch[0]
def postprocess(self, outputs):
rec_res = []
rec_idx_lod = outputs["ctc_greedy_decoder_0.tmp_0.lod"]
predict_lod = outputs["softmax_0.tmp_0.lod"]
rec_idx_batch = outputs["ctc_greedy_decoder_0.tmp_0"]
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = outputs["softmax_0.tmp_0"][beg:end, :]
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res.append([preds_text, score])
return rec_res
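OCRReader bundles the recognition-side preprocessing with the CTC decoding of the fetched tensors. A hedged end-to-end sketch of how it could wrap a Serving client call follows; the client config path, endpoint and image file are placeholders, and ppocr_keys_v1.txt (the default --rec_char_dict_path) is assumed to be present in the working directory.

``` python
import cv2
from paddle_serving_client import Client

client = Client()
client.load_client_config("ocr_rec_client/serving_client_conf.prototxt")  # placeholder path
client.connect(["127.0.0.1:9292"])                                        # placeholder endpoint

ocr_reader = OCRReader()
img = cv2.imread("text_crop.jpg")                    # a cropped text-line image
feed = {"image": ocr_reader.preprocess([img])}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
fetch_map = client.predict(feed=feed, fetch=fetch)
print(ocr_reader.postprocess(fetch_map))             # [[recognized_text, score]]
```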
...@@ -21,7 +21,12 @@ import google.protobuf.text_format ...@@ -21,7 +21,12 @@ import google.protobuf.text_format
import numpy as np import numpy as np
import time import time
import sys import sys
from .serving_client import PredictorRes
import grpc
from .proto import multi_lang_general_model_service_pb2
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
int_type = 0 int_type = 0
float_type = 1 float_type = 1
...@@ -61,13 +66,18 @@ class SDKConfig(object): ...@@ -61,13 +66,18 @@ class SDKConfig(object):
self.tag_list = [] self.tag_list = []
self.cluster_list = [] self.cluster_list = []
self.variant_weight_list = [] self.variant_weight_list = []
self.rpc_timeout_ms = 20000
self.load_balance_strategy = "la"
def add_server_variant(self, tag, cluster, variant_weight): def add_server_variant(self, tag, cluster, variant_weight):
self.tag_list.append(tag) self.tag_list.append(tag)
self.cluster_list.append(cluster) self.cluster_list.append(cluster)
self.variant_weight_list.append(variant_weight) self.variant_weight_list.append(variant_weight)
def gen_desc(self): def set_load_banlance_strategy(self, strategy):
self.load_balance_strategy = strategy
def gen_desc(self, rpc_timeout_ms):
predictor_desc = sdk.Predictor() predictor_desc = sdk.Predictor()
predictor_desc.name = "general_model" predictor_desc.name = "general_model"
predictor_desc.service_name = \ predictor_desc.service_name = \
...@@ -86,7 +96,7 @@ class SDKConfig(object): ...@@ -86,7 +96,7 @@ class SDKConfig(object):
self.sdk_desc.predictors.extend([predictor_desc]) self.sdk_desc.predictors.extend([predictor_desc])
self.sdk_desc.default_variant_conf.tag = "default" self.sdk_desc.default_variant_conf.tag = "default"
self.sdk_desc.default_variant_conf.connection_conf.connect_timeout_ms = 2000 self.sdk_desc.default_variant_conf.connection_conf.connect_timeout_ms = 2000
self.sdk_desc.default_variant_conf.connection_conf.rpc_timeout_ms = 20000 self.sdk_desc.default_variant_conf.connection_conf.rpc_timeout_ms = rpc_timeout_ms
self.sdk_desc.default_variant_conf.connection_conf.connect_retry_count = 2 self.sdk_desc.default_variant_conf.connection_conf.connect_retry_count = 2
self.sdk_desc.default_variant_conf.connection_conf.max_connection_per_host = 100 self.sdk_desc.default_variant_conf.connection_conf.max_connection_per_host = 100
self.sdk_desc.default_variant_conf.connection_conf.hedge_request_timeout_ms = -1 self.sdk_desc.default_variant_conf.connection_conf.hedge_request_timeout_ms = -1
...@@ -119,6 +129,9 @@ class Client(object): ...@@ -119,6 +129,9 @@ class Client(object):
self.profile_ = _Profiler() self.profile_ = _Profiler()
self.all_numpy_input = True self.all_numpy_input = True
self.has_numpy_input = False self.has_numpy_input = False
self.rpc_timeout_ms = 20000
from .serving_client import PredictorRes
self.predictorres_constructor = PredictorRes
def load_client_config(self, path): def load_client_config(self, path):
from .serving_client import PredictorClient from .serving_client import PredictorClient
...@@ -171,13 +184,19 @@ class Client(object): ...@@ -171,13 +184,19 @@ class Client(object):
self.predictor_sdk_.add_server_variant(tag, cluster, self.predictor_sdk_.add_server_variant(tag, cluster,
str(variant_weight)) str(variant_weight))
def set_rpc_timeout_ms(self, rpc_timeout):
if not isinstance(rpc_timeout, int):
raise ValueError("rpc_timeout must be int type.")
else:
self.rpc_timeout_ms = rpc_timeout
def connect(self, endpoints=None): def connect(self, endpoints=None):
# check whether current endpoint is available # check whether current endpoint is available
# init from client config # init from client config
# create predictor here # create predictor here
if endpoints is None: if endpoints is None:
if self.predictor_sdk_ is None: if self.predictor_sdk_ is None:
raise SystemExit( raise ValueError(
"You must set the endpoints parameter or use add_variant function to create a variant." "You must set the endpoints parameter or use add_variant function to create a variant."
) )
else: else:
...@@ -188,7 +207,7 @@ class Client(object): ...@@ -188,7 +207,7 @@ class Client(object):
print( print(
"parameter endpoints({}) will not take effect, because you use the add_variant function.". "parameter endpoints({}) will not take effect, because you use the add_variant function.".
format(endpoints)) format(endpoints))
sdk_desc = self.predictor_sdk_.gen_desc() sdk_desc = self.predictor_sdk_.gen_desc(self.rpc_timeout_ms)
self.client_handle_.create_predictor_by_desc(sdk_desc.SerializeToString( self.client_handle_.create_predictor_by_desc(sdk_desc.SerializeToString(
)) ))
...@@ -203,7 +222,7 @@ class Client(object): ...@@ -203,7 +222,7 @@ class Client(object):
return return
if isinstance(feed[key], if isinstance(feed[key],
list) and len(feed[key]) != self.feed_tensor_len[key]: list) and len(feed[key]) != self.feed_tensor_len[key]:
raise SystemExit("The shape of feed tensor {} not match.".format( raise ValueError("The shape of feed tensor {} not match.".format(
key)) key))
if type(feed[key]).__module__ == np.__name__ and np.size(feed[ if type(feed[key]).__module__ == np.__name__ and np.size(feed[
key]) != self.feed_tensor_len[key]: key]) != self.feed_tensor_len[key]:
...@@ -292,7 +311,7 @@ class Client(object): ...@@ -292,7 +311,7 @@ class Client(object):
self.profile_.record('py_prepro_1') self.profile_.record('py_prepro_1')
self.profile_.record('py_client_infer_0') self.profile_.record('py_client_infer_0')
result_batch_handle = PredictorRes() result_batch_handle = self.predictorres_constructor()
if self.all_numpy_input: if self.all_numpy_input:
res = self.client_handle_.numpy_predict( res = self.client_handle_.numpy_predict(
float_slot_batch, float_feed_names, float_shape, int_slot_batch, float_slot_batch, float_feed_names, float_shape, int_slot_batch,
...@@ -304,7 +323,7 @@ class Client(object): ...@@ -304,7 +323,7 @@ class Client(object):
int_feed_names, int_shape, fetch_names, result_batch_handle, int_feed_names, int_shape, fetch_names, result_batch_handle,
self.pid) self.pid)
else: else:
raise SystemExit( raise ValueError(
"Please make sure the inputs are all in list type or all in numpy.array type" "Please make sure the inputs are all in list type or all in numpy.array type"
) )
...@@ -360,3 +379,172 @@ class Client(object): ...@@ -360,3 +379,172 @@ class Client(object):
def release(self): def release(self):
self.client_handle_.destroy_predictor() self.client_handle_.destroy_predictor()
self.client_handle_ = None self.client_handle_ = None
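The new rpc_timeout_ms setting only takes effect when it is applied before connect(), because gen_desc() now bakes the value into the SDK descriptor at connection time. A brief sketch; the config path and endpoint are placeholders.

``` python
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_conf.prototxt")
client.set_rpc_timeout_ms(100000)     # must be called before connect()
client.connect(["127.0.0.1:9393"])
```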
class MultiLangClient(object):
def __init__(self):
self.channel_ = None
def load_client_config(self, path):
if not isinstance(path, str):
raise Exception("MultiLangClient only supports a single model config (str path) temporarily")
self._parse_model_config(path)
def connect(self, endpoint):
self.channel_ = grpc.insecure_channel(endpoint[0]) #TODO
self.stub_ = multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelServiceStub(
self.channel_)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _parse_model_config(self, model_config_path):
model_conf = m_config.GeneralModelConfig()
f = open(model_config_path, 'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
else:
counter = 1
for dim in self.feed_shapes_[var.alias_name]:
counter *= dim
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _pack_feed_data(self, feed, fetch, is_python):
req = multi_lang_general_model_service_pb2.Request()
req.fetch_var_names.extend(fetch)
req.feed_var_names.extend(feed.keys())
req.is_python = is_python
feed_batch = None
if isinstance(feed, dict):
feed_batch = [feed]
elif isinstance(feed, list):
feed_batch = feed
else:
raise Exception("{} not support".format(type(feed)))
init_feed_names = False
for feed_data in feed_batch:
inst = multi_lang_general_model_service_pb2.FeedInst()
for name in req.feed_var_names:
tensor = multi_lang_general_model_service_pb2.Tensor()
var = feed_data[name]
v_type = self.feed_types_[name]
if is_python:
data = None
if isinstance(var, list):
if v_type == 0: # int64
data = np.array(var, dtype="int64")
elif v_type == 1: # float32
data = np.array(var, dtype="float32")
else:
raise Exception("error type.")
else:
data = var
if var.dtype == "float64":
data = data.astype("float32")
tensor.data = data.tobytes()
else:
if v_type == 0: # int64
if isinstance(var, np.ndarray):
tensor.int64_data.extend(var.reshape(-1).tolist())
else:
tensor.int64_data.extend(self._flatten_list(var))
elif v_type == 1: # float32
if isinstance(var, np.ndarray):
tensor.float_data.extend(var.reshape(-1).tolist())
else:
tensor.float_data.extend(self._flatten_list(var))
else:
raise Exception("error type.")
if isinstance(var, np.ndarray):
tensor.shape.extend(list(var.shape))
else:
tensor.shape.extend(self.feed_shapes_[name])
inst.tensor_array.append(tensor)
req.insts.append(inst)
return req
def _unpack_resp(self, resp, fetch, is_python, need_variant_tag):
result_map = {}
inst = resp.outputs[0].insts[0]
tag = resp.tag
for i, name in enumerate(fetch):
var = inst.tensor_array[i]
v_type = self.fetch_types_[name]
if is_python:
if v_type == 0: # int64
result_map[name] = np.frombuffer(var.data, dtype="int64")
elif v_type == 1: # float32
result_map[name] = np.frombuffer(var.data, dtype="float32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
result_map[name] = np.array(
list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
result_map[name] = np.array(
list(var.float_data), dtype="float32")
else:
raise Exception("error type.")
result_map[name].shape = list(var.shape)
if name in self.lod_tensor_set_:
result_map["{}.lod".format(name)] = np.array(list(var.lod))
return result_map if not need_variant_tag else [result_map, tag]
def _done_callback_func(self, fetch, is_python, need_variant_tag):
def unpack_resp(resp):
return self._unpack_resp(resp, fetch, is_python, need_variant_tag)
return unpack_resp
def predict(self,
feed,
fetch,
need_variant_tag=False,
asyn=False,
is_python=True):
req = self._pack_feed_data(feed, fetch, is_python=is_python)
if not asyn:
resp = self.stub_.inference(req)
return self._unpack_resp(
resp,
fetch,
is_python=is_python,
need_variant_tag=need_variant_tag)
else:
call_future = self.stub_.inference.future(req)
return MultiLangPredictFuture(
call_future,
self._done_callback_func(
fetch,
is_python=is_python,
need_variant_tag=need_variant_tag))
class MultiLangPredictFuture(object):
def __init__(self, call_future, callback_func):
self.call_future_ = call_future
self.callback_func_ = callback_func
def result(self):
resp = self.call_future_.result()
return self.callback_func_(resp)
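MultiLangClient talks to the new gRPC multi-language endpoint instead of the brpc one, and supports both blocking and future-based calls. A minimal sketch, assuming a server started with --use_multilang is listening on the port; the config path and the fit_a_line-style feed/fetch names are placeholders.

``` python
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]

# blocking call
fetch_map = client.predict(feed={"x": x}, fetch=["price"])

# asynchronous call: returns a future whose result() unpacks the response
future = client.predict(feed={"x": x}, fetch=["price"], asyn=True)
fetch_map = future.result()
```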
...@@ -13,8 +13,8 @@ ...@@ -13,8 +13,8 @@
# limitations under the License. # limitations under the License.
# pylint: disable=doc-string-missing # pylint: disable=doc-string-missing
import grpc import grpc
import general_python_service_pb2 from .proto import general_python_service_pb2
import general_python_service_pb2_grpc from .proto import general_python_service_pb2_grpc
import numpy as np import numpy as np
...@@ -30,27 +30,33 @@ class PyClient(object): ...@@ -30,27 +30,33 @@ class PyClient(object):
def _pack_data_for_infer(self, feed_data): def _pack_data_for_infer(self, feed_data):
req = general_python_service_pb2.Request() req = general_python_service_pb2.Request()
for name, data in feed_data.items(): for name, data in feed_data.items():
if not isinstance(data, np.ndarray): if isinstance(data, list):
raise TypeError( data = np.array(data)
"only numpy array type is supported temporarily.") elif not isinstance(data, np.ndarray):
data2bytes = np.ndarray.tobytes(data) raise TypeError("only list and numpy array type is supported.")
req.feed_var_names.append(name) req.feed_var_names.append(name)
req.feed_insts.append(data2bytes) req.feed_insts.append(data.tobytes())
req.shape.append(np.array(data.shape, dtype="int32").tobytes())
req.type.append(str(data.dtype))
return req return req
def predict(self, feed, fetch_with_type): def predict(self, feed, fetch):
if not isinstance(feed, dict): if not isinstance(feed, dict):
raise TypeError( raise TypeError(
"feed must be dict type with format: {name: value}.") "feed must be dict type with format: {name: value}.")
if not isinstance(fetch_with_type, dict): if not isinstance(fetch, list):
raise TypeError( raise TypeError(
"fetch_with_type must be dict type with format: {name : type}.") "fetch must be list type with format: [name].")
req = self._pack_data_for_infer(feed) req = self._pack_data_for_infer(feed)
resp = self._stub.inference(req) resp = self._stub.inference(req)
fetch_map = {} if resp.ecode != 0:
return {"ecode": resp.ecode, "error_info": resp.error_info}
fetch_map = {"ecode": resp.ecode}
for idx, name in enumerate(resp.fetch_var_names): for idx, name in enumerate(resp.fetch_var_names):
if name not in fetch_with_type: if name not in fetch:
continue continue
fetch_map[name] = np.frombuffer( fetch_map[name] = np.frombuffer(
resp.fetch_insts[idx], dtype=fetch_with_type[name]) resp.fetch_insts[idx], dtype=resp.type[idx])
fetch_map[name].shape = np.frombuffer(
resp.shape[idx], dtype="int32")
return fetch_map return fetch_map
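For completeness, a rough sketch of how the reworked PyClient.predict is meant to be called; the constructor and connection setup are outside this hunk, so the connect call and the variable names below are assumptions for illustration only.

``` python
import numpy as np

client = PyClient()
client.connect("127.0.0.1:9292")      # assumed API: endpoint of the python pipeline service

feed = {"x": np.array([[0.1, 0.2, 0.3]], dtype="float32")}
ret = client.predict(feed=feed, fetch=["prediction"])
if ret["ecode"] != 0:                 # a non-zero ecode carries error_info instead of tensors
    print("inference failed:", ret["error_info"])
else:
    print(ret["prediction"], ret["prediction"].shape)
```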
...@@ -17,6 +17,7 @@ import sys ...@@ -17,6 +17,7 @@ import sys
import subprocess import subprocess
import argparse import argparse
from multiprocessing import Pool from multiprocessing import Pool
import numpy as np
def benchmark_args(): def benchmark_args():
...@@ -35,6 +36,17 @@ def benchmark_args(): ...@@ -35,6 +36,17 @@ def benchmark_args():
return parser.parse_args() return parser.parse_args()
def show_latency(latency_list):
latency_array = np.array(latency_list)
info = "latency:\n"
info += "mean :{} ms\n".format(np.mean(latency_array))
info += "median :{} ms\n".format(np.median(latency_array))
info += "80 percent :{} ms\n".format(np.percentile(latency_array, 80))
info += "90 percent :{} ms\n".format(np.percentile(latency_array, 90))
info += "99 percent :{} ms\n".format(np.percentile(latency_array, 99))
sys.stderr.write(info)
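show_latency simply prints summary statistics over a list of per-request latencies in milliseconds; a small sketch of the benchmark loop that would feed it (do_one_request is a stand-in for a real client.predict call).

``` python
import time

def do_one_request():
    time.sleep(0.01)          # placeholder for a real client.predict(...) call

latency_list = []
for _ in range(100):
    start = time.time()
    do_one_request()
    latency_list.append((time.time() - start) * 1000.0)   # seconds -> milliseconds

show_latency(latency_list)
```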
class MultiThreadRunner(object): class MultiThreadRunner(object):
def __init__(self): def __init__(self):
pass pass
......
...@@ -23,6 +23,17 @@ import paddle_serving_server as paddle_serving_server ...@@ -23,6 +23,17 @@ import paddle_serving_server as paddle_serving_server
from .version import serving_server_version from .version import serving_server_version
from contextlib import closing from contextlib import closing
import collections import collections
import fcntl
import numpy as np
import grpc
from .proto import multi_lang_general_model_service_pb2
import sys
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
from multiprocessing import Pool, Process
from concurrent import futures
class OpMaker(object): class OpMaker(object):
...@@ -322,6 +333,10 @@ class Server(object): ...@@ -322,6 +333,10 @@ class Server(object):
bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name
self.server_path = os.path.join(self.module_path, floder_name) self.server_path = os.path.join(self.module_path, floder_name)
#acquire lock
version_file = open("{}/version.py".format(self.module_path), "r")
fcntl.flock(version_file, fcntl.LOCK_EX)
if not os.path.exists(self.server_path): if not os.path.exists(self.server_path):
print('First time run, downloading PaddleServing components ...') print('First time run, downloading PaddleServing components ...')
r = os.system('wget ' + bin_url + ' --no-check-certificate') r = os.system('wget ' + bin_url + ' --no-check-certificate')
...@@ -345,6 +360,8 @@ class Server(object): ...@@ -345,6 +360,8 @@ class Server(object):
format(self.module_path)) format(self.module_path))
finally: finally:
os.remove(tar_name) os.remove(tar_name)
#release lock
version_file.close()
os.chdir(self.cur_path) os.chdir(self.cur_path)
self.bin_path = self.server_path + "/serving" self.bin_path = self.server_path + "/serving"
...@@ -421,3 +438,158 @@ class Server(object): ...@@ -421,3 +438,158 @@ class Server(object):
print("Going to Run Command") print("Going to Run Command")
print(command) print(command)
os.system(command) os.system(command)
class MultiLangServerService(
multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelService):
def __init__(self, model_config_path, endpoints):
from paddle_serving_client import Client
self._parse_model_config(model_config_path)
self.bclient_ = Client()
self.bclient_.load_client_config(
"{}/serving_server_conf.prototxt".format(model_config_path))
self.bclient_.connect(endpoints)
def _parse_model_config(self, model_config_path):
model_conf = m_config.GeneralModelConfig()
f = open("{}/serving_server_conf.prototxt".format(model_config_path),
'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _unpack_request(self, request):
feed_names = list(request.feed_var_names)
fetch_names = list(request.fetch_var_names)
is_python = request.is_python
feed_batch = []
for feed_inst in request.insts:
feed_dict = {}
for idx, name in enumerate(feed_names):
var = feed_inst.tensor_array[idx]
v_type = self.feed_types_[name]
data = None
if is_python:
if v_type == 0:
data = np.frombuffer(var.data, dtype="int64")
elif v_type == 1:
data = np.frombuffer(var.data, dtype="float32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
data = np.array(list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
data = np.array(list(var.float_data), dtype="float32")
else:
raise Exception("error type.")
data.shape = list(feed_inst.tensor_array[idx].shape)
feed_dict[name] = data
feed_batch.append(feed_dict)
return feed_batch, fetch_names, is_python
def _pack_resp_package(self, result, fetch_names, is_python, tag):
resp = multi_lang_general_model_service_pb2.Response()
# Only one model is supported temporarily
model_output = multi_lang_general_model_service_pb2.ModelOutput()
inst = multi_lang_general_model_service_pb2.FetchInst()
for idx, name in enumerate(fetch_names):
tensor = multi_lang_general_model_service_pb2.Tensor()
v_type = self.fetch_types_[name]
if is_python:
tensor.data = result[name].tobytes()
else:
if v_type == 0: # int64
tensor.int64_data.extend(result[name].reshape(-1).tolist())
elif v_type == 1: # float32
tensor.float_data.extend(result[name].reshape(-1).tolist())
else:
raise Exception("error type.")
tensor.shape.extend(list(result[name].shape))
if name in self.lod_tensor_set_:
tensor.lod.extend(result["{}.lod".format(name)].tolist())
inst.tensor_array.append(tensor)
model_output.insts.append(inst)
resp.outputs.append(model_output)
resp.tag = tag
return resp
def inference(self, request, context):
feed_dict, fetch_names, is_python = self._unpack_request(request)
data, tag = self.bclient_.predict(
feed=feed_dict, fetch=fetch_names, need_variant_tag=True)
return self._pack_resp_package(data, fetch_names, is_python, tag)
class MultiLangServer(object):
def __init__(self, worker_num=2):
self.bserver_ = Server()
self.worker_num_ = worker_num
def set_op_sequence(self, op_seq):
self.bserver_.set_op_sequence(op_seq)
def load_model_config(self, model_config_path):
if not isinstance(model_config_path, str):
raise Exception(
"MultiLangServer only supports a single model config temporarily")
self.bserver_.load_model_config(model_config_path)
self.model_config_path_ = model_config_path
def prepare_server(self, workdir=None, port=9292, device="cpu"):
default_port = 12000
self.port_list_ = []
for i in range(1000):
if default_port + i != port and self._port_is_available(default_port
+ i):
self.port_list_.append(default_port + i)
break
self.bserver_.prepare_server(
workdir=workdir, port=self.port_list_[0], device=device)
self.gport_ = port
def _launch_brpc_service(self, bserver):
bserver.run_server()
def _port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
return result != 0
def run_server(self):
p_bserver = Process(
target=self._launch_brpc_service, args=(self.bserver_, ))
p_bserver.start()
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=self.worker_num_))
multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server(
MultiLangServerService(self.model_config_path_,
["0.0.0.0:{}".format(self.port_list_[0])]),
server)
server.add_insecure_port('[::]:{}'.format(self.gport_))
server.start()
p_bserver.join()
server.wait_for_termination()
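MultiLangServer wraps an ordinary brpc Server: the brpc service is launched in a child process on an automatically chosen port, while the gRPC front end listens on the port passed to prepare_server. A hedged sketch of standing one up programmatically; the model path is a placeholder and the op names follow the standard serve script. This mirrors what python -m paddle_serving_server.serve --use_multilang now does internally.

``` python
from paddle_serving_server import OpMaker, OpSeqMaker, MultiLangServer

op_maker = OpMaker()
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(op_maker.create('general_reader'))
op_seq_maker.add_op(op_maker.create('general_infer'))
op_seq_maker.add_op(op_maker.create('general_response'))

server = MultiLangServer()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config("uci_housing_model")                 # placeholder model dir
server.prepare_server(workdir="workdir", port=9393, device="cpu")
server.run_server()   # blocks: starts the brpc worker process and the gRPC service
```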
...@@ -20,7 +20,7 @@ Usage: ...@@ -20,7 +20,7 @@ Usage:
import os import os
import time import time
import argparse import argparse
import commands import subprocess
import datetime import datetime
import shutil import shutil
import tarfile import tarfile
...@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor): ...@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor):
remote_filepath = os.path.join(path, filename) remote_filepath = os.path.join(path, filename)
cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath) cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath)
_LOGGER.debug('check cmd: {}'.format(cmd)) _LOGGER.debug('check cmd: {}'.format(cmd))
[status, output] = commands.getstatusoutput(cmd) [status, output] = subprocess.getstatusoutput(cmd)
_LOGGER.debug('resp: {}'.format(output)) _LOGGER.debug('resp: {}'.format(output))
if status == 0: if status == 0:
[_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split() [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split()
......
...@@ -40,15 +40,23 @@ def parse_args(): # pylint: disable=doc-string-missing ...@@ -40,15 +40,23 @@ def parse_args(): # pylint: disable=doc-string-missing
parser.add_argument( parser.add_argument(
"--device", type=str, default="cpu", help="Type of device") "--device", type=str, default="cpu", help="Type of device")
parser.add_argument( parser.add_argument(
"--mem_optim", type=bool, default=False, help="Memory optimize") "--mem_optim",
default=False,
action="store_true",
help="Memory optimize")
parser.add_argument( parser.add_argument(
"--ir_optim", type=bool, default=False, help="Graph optimize") "--ir_optim", default=False, action="store_true", help="Graph optimize")
parser.add_argument("--use_mkl", type=bool, default=False, help="Use MKL") parser.add_argument(
"--use_mkl", default=False, action="store_true", help="Use MKL")
parser.add_argument( parser.add_argument(
"--max_body_size", "--max_body_size",
type=int, type=int,
default=512 * 1024 * 1024, default=512 * 1024 * 1024,
help="Limit sizes of messages") help="Limit sizes of messages")
parser.add_argument(
"--use_multilang",
action='store_true',
help="Use Multi-language-service")
return parser.parse_args() return parser.parse_args()
...@@ -63,6 +71,7 @@ def start_standard_model(): # pylint: disable=doc-string-missing ...@@ -63,6 +71,7 @@ def start_standard_model(): # pylint: disable=doc-string-missing
ir_optim = args.ir_optim ir_optim = args.ir_optim
max_body_size = args.max_body_size max_body_size = args.max_body_size
use_mkl = args.use_mkl use_mkl = args.use_mkl
use_multilang = args.use_multilang
if model == "": if model == "":
print("You must specify your serving model") print("You must specify your serving model")
...@@ -79,14 +88,19 @@ def start_standard_model(): # pylint: disable=doc-string-missing ...@@ -79,14 +88,19 @@ def start_standard_model(): # pylint: disable=doc-string-missing
op_seq_maker.add_op(general_infer_op) op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
server = serving.Server() server = None
server.set_op_sequence(op_seq_maker.get_op_sequence()) if use_multilang:
server.set_num_threads(thread_num) server = serving.MultiLangServer()
server.set_memory_optimize(mem_optim) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_ir_optimize(ir_optim) else:
server.use_mkl(use_mkl) server = serving.Server()
server.set_max_body_size(max_body_size) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_port(port) server.set_num_threads(thread_num)
server.set_memory_optimize(mem_optim)
server.set_ir_optimize(ir_optim)
server.use_mkl(use_mkl)
server.set_max_body_size(max_body_size)
server.set_port(port)
server.load_model_config(model) server.load_model_config(model)
server.prepare_server(workdir=workdir, port=port, device=device) server.prepare_server(workdir=workdir, port=port, device=device)
......
...@@ -86,7 +86,7 @@ class WebService(object): ...@@ -86,7 +86,7 @@ class WebService(object):
for key in fetch_map: for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist() fetch_map[key] = fetch_map[key].tolist()
fetch_map = self.postprocess( fetch_map = self.postprocess(
feed=feed, fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": fetch_map} result = {"result": fetch_map}
except ValueError: except ValueError:
result = {"result": "Request Value Error"} result = {"result": "Request Value Error"}
......
...@@ -25,6 +25,17 @@ from .version import serving_server_version ...@@ -25,6 +25,17 @@ from .version import serving_server_version
from contextlib import closing from contextlib import closing
import argparse import argparse
import collections import collections
import fcntl
import numpy as np
import grpc
from .proto import multi_lang_general_model_service_pb2
import sys
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
from multiprocessing import Pool, Process
from concurrent import futures
def serve_args(): def serve_args():
...@@ -46,9 +57,12 @@ def serve_args(): ...@@ -46,9 +57,12 @@ def serve_args():
parser.add_argument( parser.add_argument(
"--name", type=str, default="None", help="Default service name") "--name", type=str, default="None", help="Default service name")
parser.add_argument( parser.add_argument(
"--mem_optim", type=bool, default=False, help="Memory optimize") "--mem_optim",
default=False,
action="store_true",
help="Memory optimize")
parser.add_argument( parser.add_argument(
"--ir_optim", type=bool, default=False, help="Graph optimize") "--ir_optim", default=False, action="store_true", help="Graph optimize")
parser.add_argument( parser.add_argument(
"--max_body_size", "--max_body_size",
type=int, type=int,
...@@ -347,6 +361,11 @@ class Server(object): ...@@ -347,6 +361,11 @@ class Server(object):
download_flag = "{}/{}.is_download".format(self.module_path, download_flag = "{}/{}.is_download".format(self.module_path,
folder_name) folder_name)
#acquire lock
version_file = open("{}/version.py".format(self.module_path), "r")
fcntl.flock(version_file, fcntl.LOCK_EX)
if os.path.exists(download_flag): if os.path.exists(download_flag):
os.chdir(self.cur_path) os.chdir(self.cur_path)
self.bin_path = self.server_path + "/serving" self.bin_path = self.server_path + "/serving"
...@@ -377,6 +396,8 @@ class Server(object): ...@@ -377,6 +396,8 @@ class Server(object):
format(self.module_path)) format(self.module_path))
finally: finally:
os.remove(tar_name) os.remove(tar_name)
#release lock
version_file.close()
os.chdir(self.cur_path) os.chdir(self.cur_path)
self.bin_path = self.server_path + "/serving" self.bin_path = self.server_path + "/serving"
...@@ -461,3 +482,158 @@ class Server(object): ...@@ -461,3 +482,158 @@ class Server(object):
print(command) print(command)
os.system(command) os.system(command)
class MultiLangServerService(
multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelService):
def __init__(self, model_config_path, endpoints):
from paddle_serving_client import Client
self._parse_model_config(model_config_path)
self.bclient_ = Client()
self.bclient_.load_client_config(
"{}/serving_server_conf.prototxt".format(model_config_path))
self.bclient_.connect(endpoints)
def _parse_model_config(self, model_config_path):
model_conf = m_config.GeneralModelConfig()
f = open("{}/serving_server_conf.prototxt".format(model_config_path),
'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _unpack_request(self, request):
feed_names = list(request.feed_var_names)
fetch_names = list(request.fetch_var_names)
is_python = request.is_python
feed_batch = []
for feed_inst in request.insts:
feed_dict = {}
for idx, name in enumerate(feed_names):
var = feed_inst.tensor_array[idx]
v_type = self.feed_types_[name]
data = None
if is_python:
if v_type == 0:
data = np.frombuffer(var.data, dtype="int64")
elif v_type == 1:
data = np.frombuffer(var.data, dtype="float32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
data = np.array(list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
data = np.array(list(var.float_data), dtype="float32")
else:
raise Exception("error type.")
data.shape = list(feed_inst.tensor_array[idx].shape)
feed_dict[name] = data
feed_batch.append(feed_dict)
return feed_batch, fetch_names, is_python
def _pack_resp_package(self, result, fetch_names, is_python, tag):
resp = multi_lang_general_model_service_pb2.Response()
# Only one model is supported temporarily
model_output = multi_lang_general_model_service_pb2.ModelOutput()
inst = multi_lang_general_model_service_pb2.FetchInst()
for idx, name in enumerate(fetch_names):
tensor = multi_lang_general_model_service_pb2.Tensor()
v_type = self.fetch_types_[name]
if is_python:
tensor.data = result[name].tobytes()
else:
if v_type == 0: # int64
tensor.int64_data.extend(result[name].reshape(-1).tolist())
elif v_type == 1: # float32
tensor.float_data.extend(result[name].reshape(-1).tolist())
else:
raise Exception("error type.")
tensor.shape.extend(list(result[name].shape))
if name in self.lod_tensor_set_:
tensor.lod.extend(result["{}.lod".format(name)].tolist())
inst.tensor_array.append(tensor)
model_output.insts.append(inst)
resp.outputs.append(model_output)
resp.tag = tag
return resp
def inference(self, request, context):
feed_dict, fetch_names, is_python = self._unpack_request(request)
data, tag = self.bclient_.predict(
feed=feed_dict, fetch=fetch_names, need_variant_tag=True)
return self._pack_resp_package(data, fetch_names, is_python, tag)
class MultiLangServer(object):
def __init__(self, worker_num=2):
self.bserver_ = Server()
self.worker_num_ = worker_num
def set_op_sequence(self, op_seq):
self.bserver_.set_op_sequence(op_seq)
def load_model_config(self, model_config_path):
if not isinstance(model_config_path, str):
raise Exception(
"MultiLangServer only supports a single model config temporarily")
self.bserver_.load_model_config(model_config_path)
self.model_config_path_ = model_config_path
def prepare_server(self, workdir=None, port=9292, device="cpu"):
default_port = 12000
self.port_list_ = []
for i in range(1000):
if default_port + i != port and self._port_is_available(default_port
+ i):
self.port_list_.append(default_port + i)
break
self.bserver_.prepare_server(
workdir=workdir, port=self.port_list_[0], device=device)
self.gport_ = port
def _launch_brpc_service(self, bserver):
bserver.run_server()
def _port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
return result != 0
def run_server(self):
p_bserver = Process(
target=self._launch_brpc_service, args=(self.bserver_, ))
p_bserver.start()
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=self.worker_num_))
multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server(
MultiLangServerService(self.model_config_path_,
["0.0.0.0:{}".format(self.port_list_[0])]),
server)
server.add_insecure_port('[::]:{}'.format(self.gport_))
server.start()
p_bserver.join()
server.wait_for_termination()
...@@ -20,7 +20,7 @@ Usage: ...@@ -20,7 +20,7 @@ Usage:
import os import os
import time import time
import argparse import argparse
import commands import subprocess
import datetime import datetime
import shutil import shutil
import tarfile import tarfile
...@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor): ...@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor):
remote_filepath = os.path.join(path, filename) remote_filepath = os.path.join(path, filename)
cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath) cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath)
_LOGGER.debug('check cmd: {}'.format(cmd)) _LOGGER.debug('check cmd: {}'.format(cmd))
[status, output] = commands.getstatusoutput(cmd) [status, output] = subprocess.getstatusoutput(cmd)
_LOGGER.debug('resp: {}'.format(output)) _LOGGER.debug('resp: {}'.format(output))
if status == 0: if status == 0:
[_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split() [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split()
......
...@@ -131,7 +131,7 @@ class WebService(object): ...@@ -131,7 +131,7 @@ class WebService(object):
for key in fetch_map: for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist() fetch_map[key] = fetch_map[key].tolist()
result = self.postprocess( result = self.postprocess(
feed=feed, fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": result} result = {"result": result}
except ValueError: except ValueError:
result = {"result": "Request Value Error"} result = {"result": "Request Value Error"}
......
numpy>=1.12, <=1.16.4 ; python_version<"3.5" numpy>=1.12, <=1.16.4 ; python_version<"3.5"
grpcio-tools>=1.28.1
grpcio>=1.28.1
...@@ -42,7 +42,8 @@ if '${PACK}' == 'ON': ...@@ -42,7 +42,8 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [ REQUIRED_PACKAGES = [
'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow' 'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow',
'shapely', 'pyclipper'
] ]
packages=['paddle_serving_app', packages=['paddle_serving_app',
......
...@@ -58,7 +58,8 @@ if '${PACK}' == 'ON': ...@@ -58,7 +58,8 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [ REQUIRED_PACKAGES = [
'six >= 1.10.0', 'protobuf >= 3.1.0', 'numpy >= 1.12' 'six >= 1.10.0', 'protobuf >= 3.1.0', 'numpy >= 1.12', 'grpcio >= 1.28.1',
'grpcio-tools >= 1.28.1'
] ]
if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"): if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
......
...@@ -37,13 +37,10 @@ def python_version(): ...@@ -37,13 +37,10 @@ def python_version():
max_version, mid_version, min_version = python_version() max_version, mid_version, min_version = python_version()
REQUIRED_PACKAGES = [ REQUIRED_PACKAGES = [
'six >= 1.10.0', 'protobuf >= 3.1.0', 'six >= 1.10.0', 'protobuf >= 3.1.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
'paddle_serving_client', 'flask >= 1.1.1' 'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
] ]
if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
REQUIRED_PACKAGES.append("paddlepaddle")
packages=['paddle_serving_server', packages=['paddle_serving_server',
'paddle_serving_server.proto'] 'paddle_serving_server.proto']
......
...@@ -37,12 +37,10 @@ def python_version(): ...@@ -37,12 +37,10 @@ def python_version():
max_version, mid_version, min_version = python_version() max_version, mid_version, min_version = python_version()
REQUIRED_PACKAGES = [ REQUIRED_PACKAGES = [
'six >= 1.10.0', 'protobuf >= 3.1.0', 'six >= 1.10.0', 'protobuf >= 3.1.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
'paddle_serving_client', 'flask >= 1.1.1' 'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
] ]
if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
REQUIRED_PACKAGES.append("paddlepaddle")
packages=['paddle_serving_server_gpu', packages=['paddle_serving_server_gpu',
'paddle_serving_server_gpu.proto'] 'paddle_serving_server_gpu.proto']
......
...@@ -9,4 +9,6 @@ RUN yum -y install wget && \ ...@@ -9,4 +9,6 @@ RUN yum -y install wget && \
yum -y install python3 python3-devel && \ yum -y install python3 python3-devel && \
yum clean all && \ yum clean all && \
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python get-pip.py && rm get-pip.py python get-pip.py && rm get-pip.py && \
localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
echo "export LANG=en_US.utf8" >> /root/.bashrc
...@@ -44,4 +44,6 @@ RUN yum -y install wget && \ ...@@ -44,4 +44,6 @@ RUN yum -y install wget && \
cd .. && rm -rf Python-3.6.8* && \ cd .. && rm -rf Python-3.6.8* && \
pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \ pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \ yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \
yum clean all yum clean all && \
localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
echo "export LANG=en_US.utf8" >> /root/.bashrc
...@@ -44,4 +44,5 @@ RUN yum -y install wget && \ ...@@ -44,4 +44,5 @@ RUN yum -y install wget && \
cd .. && rm -rf Python-3.6.8* && \ cd .. && rm -rf Python-3.6.8* && \
pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \ pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \ yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \
yum clean all yum clean all && \
echo "export LANG=en_US.utf8" >> /root/.bashrc
...@@ -21,4 +21,6 @@ RUN yum -y install wget >/dev/null \ ...@@ -21,4 +21,6 @@ RUN yum -y install wget >/dev/null \
&& yum install -y python3 python3-devel \ && yum install -y python3 python3-devel \
&& pip3 install google protobuf setuptools wheel flask \ && pip3 install google protobuf setuptools wheel flask \
&& yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\ && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
&& yum clean all && yum clean all \
&& localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
&& echo "export LANG=en_US.utf8" >> /root/.bashrc
FROM nvidia/cuda:9.0-cudnn7-runtime-centos7 FROM nvidia/cuda:9.0-cudnn7-devel-centos7 as builder
FROM nvidia/cuda:9.0-cudnn7-runtime-centos7
RUN yum -y install wget && \ RUN yum -y install wget && \
yum -y install epel-release && yum -y install patchelf && \ yum -y install epel-release && yum -y install patchelf && \
yum -y install gcc make python-devel && \ yum -y install gcc make python-devel && \
...@@ -13,4 +14,8 @@ RUN yum -y install wget && \ ...@@ -13,4 +14,8 @@ RUN yum -y install wget && \
ln -s /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/cuda-9.0/lib64/libcublas.so && \ ln -s /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/cuda-9.0/lib64/libcublas.so && \
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \
ln -s /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so && \ ln -s /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so && \
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
echo "export LANG=en_US.utf8" >> /root/.bashrc && \
mkdir -p /usr/local/cuda/extras
COPY --from=builder /usr/local/cuda/extras/CUPTI /usr/local/cuda/extras/CUPTI
...@@ -22,4 +22,5 @@ RUN yum -y install wget >/dev/null \ ...@@ -22,4 +22,5 @@ RUN yum -y install wget >/dev/null \
&& yum install -y python3 python3-devel \ && yum install -y python3 python3-devel \
&& pip3 install google protobuf setuptools wheel flask \ && pip3 install google protobuf setuptools wheel flask \
&& yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\ && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
&& yum clean all && yum clean all \
&& echo "export LANG=en_US.utf8" >> /root/.bashrc
#!/usr/bin/env bash #!/usr/bin/env bash
set -x
function unsetproxy() { function unsetproxy() {
HTTP_PROXY_TEMP=$http_proxy HTTP_PROXY_TEMP=$http_proxy
HTTPS_PROXY_TEMP=$https_proxy HTTPS_PROXY_TEMP=$https_proxy
...@@ -375,16 +375,17 @@ function python_test_multi_process(){ ...@@ -375,16 +375,17 @@ function python_test_multi_process(){
sh get_data.sh sh get_data.sh
case $TYPE in case $TYPE in
CPU) CPU)
check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9292 &" check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9292 --workdir test9292 &"
check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9293 &" check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9293 --workdir test9293 &"
sleep 5 sleep 5
check_cmd "python test_multi_process_client.py" check_cmd "python test_multi_process_client.py"
kill_server_process kill_server_process
echo "uci_housing multi-process RPC inference pass" echo "uci_housing multi-process RPC inference pass"
;; ;;
GPU) GPU)
check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0 &" rm -rf ./image #TODO: The following code tried to create this folder, but no corresponding code was found
check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9293 --gpu_ids 0 &" check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --workdir test9292 --gpu_ids 0 &"
check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9293 --workdir test9293 --gpu_ids 0 &"
sleep 5 sleep 5
check_cmd "python test_multi_process_client.py" check_cmd "python test_multi_process_client.py"
kill_server_process kill_server_process
...@@ -454,15 +455,16 @@ function python_test_lac() { ...@@ -454,15 +455,16 @@ function python_test_lac() {
cd lac # pwd: /Serving/python/examples/lac cd lac # pwd: /Serving/python/examples/lac
case $TYPE in case $TYPE in
CPU) CPU)
sh get_data.sh python -m paddle_serving_app.package --get_model lac
check_cmd "python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292 &" tar -xzvf lac.tar.gz
check_cmd "python -m paddle_serving_server.serve --model lac_model/ --port 9292 &"
sleep 5 sleep 5
check_cmd "echo \"我爱北京天安门\" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/" check_cmd "echo \"我爱北京天安门\" | python lac_client.py lac_client/serving_client_conf.prototxt "
echo "lac CPU RPC inference pass" echo "lac CPU RPC inference pass"
kill_server_process kill_server_process
unsetproxy # maybe the proxy is used on iPipe, which makes web-test failed. unsetproxy # maybe the proxy is used on iPipe, which makes web-test failed.
check_cmd "python lac_web_service.py jieba_server_model/ lac_workdir 9292 &" check_cmd "python lac_web_service.py lac_model/ lac_workdir 9292 &"
sleep 5 sleep 5
check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"words\": \"我爱北京天安门\"}], \"fetch\":[\"word_seg\"]}' http://127.0.0.1:9292/lac/prediction" check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"words\": \"我爱北京天安门\"}], \"fetch\":[\"word_seg\"]}' http://127.0.0.1:9292/lac/prediction"
# check http code # check http code
......