diff --git a/README.md b/README.md
index 297a6cf7901d0f18418b573a622a55f33bbed2cc..de48b7a9baa457f4d062e43d5fb0c79757a2a68d 100644
--- a/README.md
+++ b/README.md
@@ -18,19 +18,19 @@

Motivation

-We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can put the model online without much effort. A demo of serving is as follows:
+We consider deploying deep learning inference services online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also deploy the model online easily. A demo of Paddle Serving is as follows:

Some Key Features

-- Integrate with Paddle training pipeline seemlessly, most paddle models can be deployed **with one line command**.
+- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with a one-line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
-- **Distributed Key-Value indexing** supported that is especially useful for large scale sparse features as model inputs.
-- **Highly concurrent and efficient communication** between clients and servers.
-- **Multiple programming languages** supported on client side, such as Golang, C++ and python
-- **Extensible framework design** that can support model serving beyond Paddle.
+- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs.
+- **Highly concurrent and efficient communication** supported between clients and servers.
+- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
+- **Extensible framework design** that can support model serving beyond Paddle.

Installation

@@ -53,7 +53,7 @@ Paddle Serving provides HTTP and RPC based service for users to access
### HTTP service
-Paddle Serving provides a built-in python module called `paddle_serving_server.serve` that can start a rpc service or a http service with one-line command. If we specify the argument `--name uci`, it means that we will have a HTTP service with a url of `$IP:$PORT/uci/prediction`
+Paddle Serving provides a built-in python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service with a URL of `$IP:$PORT/uci/prediction`
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
@@ -75,7 +75,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.25
### RPC service
-A user can also start a rpc service with `paddle_serving_server.serve`. RPC service is usually faster than HTTP service, although a user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here.
+A user can also start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although a user needs to do some coding based on Paddle Serving's Python client API. Note that we do not specify `--name` here.
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
@@ -154,34 +154,111 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
{"label":"daisy","prob":0.9341403245925903}
```
+
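For reference, the same prediction request can be issued from Python instead of `curl`. The following is a minimal sketch using the `requests` library, assuming the UCI housing HTTP service started above is listening on `127.0.0.1:9292` with `--name uci`:

``` python
# Minimal sketch: send the same JSON payload as the curl example above.
# Assumes the HTTP service was started with:
#   python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
import json
import requests

payload = {
    "x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
          -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332],
    "fetch": ["price"],
}
r = requests.post(
    "http://127.0.0.1:9292/uci/prediction",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"})
print(r.json())  # should contain the fetched "price" value
```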

More Demos

+| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | Bert-Base-Baike | +| URL | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert | +| Description | Get semantic representation from a Chinese Sentence | +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | Resnet50-Imagenet | +| URL | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | +| Description | Get image semantic representation from an image | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | Resnet101-Imagenet | +| URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet | +| Description | Get image semantic representation from an image | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | CNN-IMDB | +| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | +| Description | Get category probability from an English Sentence | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | LSTM-IMDB | +| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | +| Description | Get category probability from an English Sentence | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | BOW-IMDB | +| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb | +| Description | Get category probability from an English Sentence | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | Jieba-LAC | +| URL | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac | +| Description | Get word segmentation from a Chinese Sentence | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | DNN-CTR | +| URL | None(Get model by [local_train.py](./python/examples/criteo_ctr/local_train.py)) | +| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr | +| Description | Get click probability from a feature vector of item | + + + +| Key | Value | +| :----------------- | :----------------------------------------------------------- | +| Model Name | DNN-CTR(with cube) | +| URL | None(Get model by [local_train.py](python/examples/criteo_ctr_with_cube/local_train.py)) | +| Client/Server Code | 
https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr_with_cube | +| Description | Get click probability from a feature vector of item | +

Document

### New to Paddle Serving - [How to save a servable model?](doc/SAVE.md) -- [An end-to-end tutorial from training to serving(Chinese)](doc/TRAIN_TO_SERVICE.md) -- [Write Bert-as-Service in 10 minutes(Chinese)](doc/BERT_10_MINS.md) +- [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md) +- [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md) ### Developers - [How to config Serving native operators on server side?](doc/SERVER_DAG.md) -- [How to develop a new Serving operator](doc/NEW_OPERATOR.md) +- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md) - [Golang client](doc/IMDB_GO_CLIENT.md) -- [Compile from source code(Chinese)](doc/COMPILE.md) +- [Compile from source code](doc/COMPILE.md) ### About Efficiency -- [How profile serving efficiency?(Chinese)](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util) -- [Benchmarks](doc/BENCHMARK.md) +- [How to profile Paddle Serving latency?(Chinese)](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util) +- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md) +- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md) ### FAQ - [FAQ(Chinese)](doc/FAQ.md) ### Design -- [Design Doc(Chinese)](doc/DESIGN_DOC.md) -- [Design Doc(English)](doc/DESIGN_DOC_EN.md) +- [Design Doc](doc/DESIGN_DOC.md)

Community

diff --git a/README_CN.md b/README_CN.md index 8400038f840a9f26a1342d9fcf4bd9729adcb06c..edc7b9f19d9236cbc80f3add08181a6a49359e1a 100644 --- a/README_CN.md +++ b/README_CN.md @@ -1,18 +1,31 @@ - +

+ [居中的 Paddle Serving logo 与徽章链接:Build Status、Release、Issues、License、Slack]

-[![Build Status](https://img.shields.io/travis/com/PaddlePaddle/Serving/develop)](https://travis-ci.com/PaddlePaddle/Serving) -[![Release](https://img.shields.io/badge/Release-0.0.3-yellowgreen)](Release) -[![Issues](https://img.shields.io/github/issues/PaddlePaddle/Serving)](Issues) -[![License](https://img.shields.io/github/license/PaddlePaddle/Serving)](LICENSE) -[![Slack](https://img.shields.io/badge/Join-Slack-green)](https://paddleserving.slack.com/archives/CU0PB4K35) +

动机

+
+Paddle Serving 旨在帮助深度学习开发者轻松部署在线预测服务。**本项目目标**:当用户使用 [Paddle](https://github.com/PaddlePaddle/Paddle) 训练了一个深度神经网络,就同时拥有了该模型的预测服务。
-## 动机
-Paddle Serving 帮助深度学习开发者轻易部署在线预测服务。 **本项目目标**: 只要你使用 [Paddle](https://github.com/PaddlePaddle/Paddle) 训练了一个深度神经网络,你就同时拥有了该模型的预测服务。

-## 核心功能 +

核心功能

+ - 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**. - 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等. - 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入. @@ -20,7 +33,7 @@ Paddle Serving 帮助深度学习开发者轻易部署在线预测服务。 ** - 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python. - **可伸缩框架设计** 可支持不限于Paddle的模型服务. -## 安装 +

安装

强烈建议您在Docker内构建Paddle Serving,请查看[如何在Docker中运行PaddleServing](doc/RUN_IN_DOCKER_CN.md) @@ -29,17 +42,51 @@ pip install paddle-serving-client pip install paddle-serving-server ``` -## 快速启动示例 +

快速启动示例

+ +

波士顿房价预测

``` shell wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz tar -xzf uci_housing.tar.gz -python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 ``` -Python客户端请求 +Paddle Serving 为用户提供了基于 HTTP 和 RPC 的服务 + + +

HTTP服务

+
+Paddle Serving提供了一个名为`paddle_serving_server.serve`的内置python模块,可以使用单行命令启动RPC服务或HTTP服务。如果我们指定参数`--name uci`,则意味着我们将拥有一个HTTP服务,其URL为`$IP:$PORT/uci/prediction`。
+
+``` shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
+```
+
+ +| Argument | Type | Default | Description | +|--------------|------|-----------|--------------------------------| +| `thread` | int | `4` | Concurrency of current service | +| `port` | int | `9292` | Exposed port of current service to users| +| `name` | str | `""` | Service name, can be used to generate HTTP request url | +| `model` | str | `""` | Path of paddle model directory to be served | + +我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求,请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。 +
+ +``` shell +curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction +``` + +
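上文提到也可以用 Python 的 `requests` 库发送同样的 HTTP POST 请求,下面给出一个示意性写法(假设上面的 HTTP 服务已在本机 9292 端口以 `--name uci` 启动):

``` python
# 示意性示例:发送与上面 curl 命令等价的请求(假设服务已在 9292 端口启动)
import json
import requests

payload = {
    "x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
          -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332],
    "fetch": ["price"],
}
r = requests.post(
    "http://127.0.0.1:9292/uci/prediction",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"})
print(r.json())
```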

RPC服务

+
+用户还可以使用`paddle_serving_server.serve`启动RPC服务。尽管用户需要基于Paddle Serving的python客户端API进行一些开发,但是RPC服务通常比HTTP服务更快。需要指出的是,这里我们没有指定`--name`。
+
+``` shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
+```
``` python
+# A user can visit rpc service through paddle_serving_client API
from paddle_serving_client import Client

client = Client()
@@ -51,24 +98,105 @@ fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```
+在这里,`client.predict`函数具有两个参数:`feed`是带有模型输入变量别名和值的`python dict`;`fetch`是要从服务器端返回的预测变量(别名)列表。在该示例中,`"x"`和`"price"`是在训练过程中保存可部署服务模型时为相应tensor指定的别名。
+
+
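`feed` 参数除了可以传入单个 `dict`,也可以传入由多个 `dict` 组成的 list 来发起批量预测(本次改动中各 benchmark 脚本正是统一改为 `client.predict(feed=feed_batch, fetch=...)` 的写法)。下面是一个示意性示例,其中客户端配置文件路径与端口均为假设值:

``` python
# 示意性示例:feed 传入 dict 列表进行批量预测(配置文件路径与端口为假设值)
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
feed_batch = [{"x": x}, {"x": x}]  # 一次请求包含两条样本
fetch_map = client.predict(feed=feed_batch, fetch=["price"])
print(fetch_map)
```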

Paddle Serving预装的服务

+ +

中文分词模型

+ +- **介绍**: +``` shell +本示例为中文分词HTTP服务一键部署 +``` + +- **下载服务包**: +``` shell +wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz +``` +- **启动web服务**: +``` shell +tar -xzf lac_model_jieba_web.tar.gz +python lac_web_service.py jieba_server_model/ lac_workdir 9292 +``` +- **客户端请求示例**: +``` shell +curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction +``` +- **返回结果示例**: +``` shell +{"word_seg":"我|爱|北京|天安门"} +``` + +

图像分类模型

+ +- **介绍**: +``` shell +图像分类模型由Imagenet数据集训练而成,该服务会返回一个标签及其概率 +``` + +- **下载服务包**: +``` shell +wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz +``` +- **启动web服务**: +``` shell +tar -xzf imagenet_demo.tar.gz +python image_classification_service_demo.py resnet50_serving_model +``` +- **客户端请求示例**: + +


+ +``` shell +curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg", "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction +``` +- **返回结果示例**: +``` shell +{"label":"daisy","prob":0.9341403245925903} +``` + +
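如果想直接上传本地图片而不是传入图片 URL,可以参考本次改动中 `python/examples/imagenet/image_http_client.py` 的做法:将图片内容做 base64 编码后放入 `image` 字段发送。以下为示意性写法,是否支持 `image` 字段取决于所启动的 web 服务脚本,文件名 `daisy.jpg` 亦为假设:

``` python
# 示意性示例:参照 image_http_client.py,将本地图片 base64 编码后发送
# (是否支持 "image" 字段取决于所用服务脚本,daisy.jpg 为假设的本地文件)
import base64
import json
import requests

def predict(image_path, server):
    with open(image_path, "rb") as f:
        image = base64.b64encode(f.read()).decode("utf-8")
    req = json.dumps({"image": image, "fetch": ["score"]})
    r = requests.post(server, data=req,
                      headers={"Content-Type": "application/json"})
    return r.json()

print(predict("daisy.jpg", "http://127.0.0.1:9292/image/prediction"))
```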

文档

+ +### 新手教程 +- [怎样保存用于Paddle Serving的模型?](doc/SAVE_CN.md) +- [端到端完成从训练到部署全流程](doc/TRAIN_TO_SERVICE_CN.md) +- [十分钟构建Bert-As-Service](doc/BERT_10_MINS_CN.md) + +### 开发者教程 +- [如何配置Server端的计算图?](doc/SERVER_DAG_CN.md) +- [如何开发一个新的General Op?](doc/NEW_OPERATOR_CN.md) +- [如何在Paddle Serving使用Go Client?](doc/IMDB_GO_CLIENT_CN.md) +- [如何编译PaddleServing?](doc/COMPILE_CN.md) + +### 关于Paddle Serving性能 +- [如何测试Paddle Serving性能?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util) +- [CPU版Benchmarks](doc/BENCHMARKING.md) +- [GPU版Benchmarks](doc/GPU_BENCHMARKING.md) + +### FAQ +- [常见问答](doc/deprecated/FAQ.md) -## 文档 +### 设计文档 +- [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md) -[开发文档](doc/DESIGN.md) +

社区

-[如何在服务器端配置本地Op?](doc/SERVER_DAG.md) +### Slack -[如何开发一个新的Op?](doc/NEW_OPERATOR.md) +想要同开发者和其他用户沟通吗?欢迎加入我们的 [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ) -[Golang 客户端](doc/IMDB_GO_CLIENT.md) +### 贡献代码 -[从源码编译](doc/COMPILE.md) +如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines](doc/CONTRIBUTE.md) -[常见问答](doc/FAQ.md) +### 反馈 -## 加入社区 -如果您想要联系其他用户和开发者,欢迎加入我们的 [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ) +如有任何反馈或是bug,请在 [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues)提交 -## 如何贡献代码 +### License -如果您想要贡献代码给Paddle Serving,请参考[Contribution Guidelines](doc/CONTRIBUTE.md) +[Apache 2.0 License](https://github.com/PaddlePaddle/Serving/blob/develop/LICENSE) diff --git a/doc/ABTEST_IN_PADDLE_SERVING.md b/doc/ABTEST_IN_PADDLE_SERVING.md index 17497fecf680c7579319e4f148a6e0af764dcda5..e02acbd8a1a6cfdb296cedf32ad7b7afc63995d7 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING.md +++ b/doc/ABTEST_IN_PADDLE_SERVING.md @@ -1,5 +1,7 @@ # ABTEST in Paddle Serving +([简体中文](./ABTEST_IN_PADDLE_SERVING_CN.md)|English) + This document will use an example of text classification task based on IMDB dataset to show how to build a A/B Test framework using Paddle Serving. The structure relationship between the client and servers in the example is shown in the figure below. diff --git a/doc/ABTEST_IN_PADDLE_SERVING_CN.md b/doc/ABTEST_IN_PADDLE_SERVING_CN.md index d31ddba6f72dfc23fa15defeda23468ab1785e62..e32bf783fcde20bb5dff3d2addaf764838975a81 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md +++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md @@ -1,5 +1,7 @@ # 如何使用Paddle Serving做ABTEST +(简体中文|[English](./ABTEST_IN_PADDLE_SERVING.md)) + 该文档将会用一个基于IMDB数据集的文本分类任务的例子,介绍如何使用Paddle Serving搭建A/B Test框架,例中的Client端、Server端结构如下图所示。 diff --git a/doc/deprecated/BENCHMARKING.md b/doc/BENCHMARKING.md similarity index 100% rename from doc/deprecated/BENCHMARKING.md rename to doc/BENCHMARKING.md diff --git a/doc/DESIGN.md b/doc/DESIGN.md index 8686eb7fc585c9df89218bd25262678fb49468d1..5d00d02171dccf07bfdafb9cdd85222a92c20113 100644 --- a/doc/DESIGN.md +++ b/doc/DESIGN.md @@ -14,17 +14,17 @@ The result is a complete serving solution. ## 2. Terms explanation -- baidu-rpc: Baidu's official open source RPC framework, supports multiple common communication protocols, and provides a custom interface experience based on protobuf -- Variant: Paddle Serving architecture is an abstraction of a minimal prediction cluster, which is characterized by all internal instances (replicas) being completely homogeneous and logically corresponding to a fixed version of a model -- Endpoint: Multiple Variants form an Endpoint. Logically, Endpoint represents a model, and Variants within the Endpoint represent different versions. -- OP: PaddlePaddle is used to encapsulate a numerical calculation operator, Paddle Serving is used to represent a basic business operation operator, and the core interface is inference. 
OP configures its dependent upstream OP to connect multiple OPs into a workflow -- Channel: An abstraction of all request-level intermediate data of the OP; data exchange between OPs through Channels -- Bus: manages all channels in a thread, and schedules the access relationship between the two sets of OP and Channel according to the DAG dependency graph between DAGs -- Stage: Workflow according to the topology diagram described by DAG, a collection of OPs that belong to the same link and can be executed in parallel -- Node: An Op operator instance composed of an Op operator class combined with parameter configuration, which is also an execution unit in Workflow -- Workflow: executes the inference interface of each OP in order according to the topology described by DAG -- DAG/Workflow: consists of several interdependent Nodes. Each Node can obtain the Request object through a specific interface. The node Op obtains the output object of its pre-op through the dependency relationship. The output of the last Node is the Response object by default. -- Service: encapsulates a pv request, can configure several Workflows, reuse the current PV's Request object with each other, and then execute each in parallel/serial execution, and finally write the Response to the corresponding output slot; a Paddle-serving process Multiple sets of Service interfaces can be configured. The upstream determines the Service interface currently accessed based on the ServiceName. +- **baidu-rpc**: Baidu's official open source RPC framework, supports multiple common communication protocols, and provides a custom interface experience based on protobuf +- **Variant**: Paddle Serving architecture is an abstraction of a minimal prediction cluster, which is characterized by all internal instances (replicas) being completely homogeneous and logically corresponding to a fixed version of a model +- **Endpoint**: Multiple Variants form an Endpoint. Logically, Endpoint represents a model, and Variants within the Endpoint represent different versions. +- **OP**: PaddlePaddle is used to encapsulate a numerical calculation operator, Paddle Serving is used to represent a basic business operation operator, and the core interface is inference. OP configures its dependent upstream OP to connect multiple OPs into a workflow +- **Channel**: An abstraction of all request-level intermediate data of the OP; data exchange between OPs through Channels +- **Bus**: manages all channels in a thread, and schedules the access relationship between the two sets of OP and Channel according to the DAG dependency graph between DAGs +- **Stage**: Workflow according to the topology diagram described by DAG, a collection of OPs that belong to the same link and can be executed in parallel +- **Node**: An OP operator instance composed of an OP operator class combined with parameter configuration, which is also an execution unit in Workflow +- **Workflow**: executes the inference interface of each OP in order according to the topology described by DAG +- **DAG/Workflow**: consists of several interdependent Nodes. Each Node can obtain the Request object through a specific interface. The node Op obtains the output object of its pre-op through the dependency relationship. The output of the last Node is the Response object by default. 
+- **Service**: encapsulates a pv request, can configure several Workflows, reuse the current PV's Request object with each other, and then execute each in parallel/serial execution, and finally write the Response to the corresponding output slot; a Paddle-serving process Multiple sets of Service interfaces can be configured. The upstream determines the Service interface currently accessed based on the ServiceName. ## 3. Python Interface Design @@ -38,10 +38,10 @@ Models that can be predicted using the Paddle Inference Library, models saved du ### 3.3 Overall design: -The user starts the Client and Server through the Python Client. The Python API has a function to check whether the interconnection and the models to be accessed match. -The Python API calls the pybind corresponding to the client and server functions implemented by Paddle Serving, and the information transmitted through RPC is implemented through RPC. -The Client Python API currently has two simple functions, load_inference_conf and predict, which are used to perform loading of the model to be predicted and prediction, respectively. -The Server Python API is mainly responsible for loading the estimation model and generating various configurations required by Paddle Serving, including engines, workflow, resources, etc. +- The user starts the Client and Server through the Python Client. The Python API has a function to check whether the interconnection and the models to be accessed match. +- The Python API calls the pybind corresponding to the client and server functions implemented by Paddle Serving, and the information transmitted through RPC is implemented through RPC. +- The Client Python API currently has two simple functions, load_inference_conf and predict, which are used to perform loading of the model to be predicted and prediction, respectively. +- The Server Python API is mainly responsible for loading the inference model and generating various configurations required by Paddle Serving, including engines, workflow, resources, etc. ### 3.4 Server Inferface @@ -69,8 +69,8 @@ def save_model(server_model_folder, ![Paddle-Serging Overall Architecture](framework.png) **Model Management Framework**: Connects model files of multiple machine learning platforms and provides a unified inference interface -**Business Scheduling Framework**: Abstracts the calculation logic of various different prediction models, provides a general DAG scheduling framework, and connects different operators through DAG diagrams to complete a prediction service together. This abstract model allows users to conveniently implement their own calculation logic, and at the same time facilitates operator sharing. (Users build their own forecasting services. A large part of their work is to build DAGs and provide operators.) -**PredictService**: Encapsulation of the externally provided prediction service interface. Define communication fields with the client through protobuf. +**Business Scheduling Framework**: Abstracts the calculation logic of various different inference models, provides a general DAG scheduling framework, and connects different operators through DAG diagrams to complete a prediction service together. This abstract model allows users to conveniently implement their own calculation logic, and at the same time facilitates operator sharing. (Users build their own forecasting services. A large part of their work is to build DAGs and provide operators.) 
+**Predict Service**: Encapsulation of the externally provided prediction service interface. Define communication fields with the client through protobuf. ### 4.1 Model Management Framework diff --git a/doc/DESIGN_CN.md b/doc/DESIGN_CN.md index 2e10013fc46c4b121ffe5c9268e5b531fe7f9992..124e826c4591c89cb14d25153f4c9a3096ea8dfb 100644 --- a/doc/DESIGN_CN.md +++ b/doc/DESIGN_CN.md @@ -4,7 +4,7 @@ ## 1. 项目背景 -PaddlePaddle是公司开源的机器学习框架,广泛支持各种深度学习模型的定制化开发; Paddle serving是Paddle的在线预测部分,与Paddle模型训练环节无缝衔接,提供机器学习预测云服务。本文将从模型、服务、接入等层面,自底向上描述Paddle Serving设计方案。 +PaddlePaddle是百度开源的机器学习框架,广泛支持各种深度学习模型的定制化开发; Paddle Serving是Paddle的在线预测部分,与Paddle模型训练环节无缝衔接,提供机器学习预测云服务。本文将从模型、服务、接入等层面,自底向上描述Paddle Serving设计方案。 1. 模型是Paddle Serving预测的核心,包括模型数据和推理计算的管理; 2. 预测框架封装模型推理计算,对外提供RPC接口,对接不同上游; @@ -14,23 +14,23 @@ PaddlePaddle是公司开源的机器学习框架,广泛支持各种深度学 ## 2. 名词解释 -- baidu-rpc 百度官方开源RPC框架,支持多种常见通信协议,提供基于protobuf的自定义接口体验 -- Variant Paddle Serving架构对一个最小预测集群的抽象,其特点是内部所有实例(副本)完全同质,逻辑上对应一个model的一个固定版本 -- Endpoint 多个Variant组成一个Endpoint,逻辑上看,Endpoint代表一个model,Endpoint内部的Variant代表不同的版本 -- OP PaddlePaddle用来封装一种数值计算的算子,Paddle Serving用来表示一种基础的业务操作算子,核心接口是inference。OP通过配置其依赖的上游OP,将多个OP串联成一个workflow -- Channel 一个OP所有请求级中间数据的抽象;OP之间通过Channel进行数据交互 -- Bus 对一个线程中所有channel的管理,以及根据DAG之间的DAG依赖图对OP和Channel两个集合间的访问关系进行调度 -- Stage Workflow按照DAG描述的拓扑图中,属于同一个环节且可并行执行的OP集合 -- Node 由某个Op算子类结合参数配置组成的Op算子实例,也是Workflow中的一个执行单元 -- Workflow 按照DAG描述的拓扑,有序执行每个OP的inference接口 -- DAG/Workflow 由若干个相互依赖的Node组成,每个Node均可通过特定接口获得Request对象,节点Op通过依赖关系获得其前置Op的输出对象,最后一个Node的输出默认就是Response对象 -- Service 对一次pv的请求封装,可配置若干条Workflow,彼此之间复用当前PV的Request对象,然后各自并行/串行执行,最后将Response写入对应的输出slot中;一个Paddle-serving进程可配置多套Service接口,上游根据ServiceName决定当前访问的Service接口。 +- **baidu-rpc**: 百度官方开源RPC框架,支持多种常见通信协议,提供基于protobuf的自定义接口体验 +- **Variant**: Paddle Serving架构对一个最小预测集群的抽象,其特点是内部所有实例(副本)完全同质,逻辑上对应一个model的一个固定版本 +- **Endpoint**: 多个Variant组成一个Endpoint,逻辑上看,Endpoint代表一个model,Endpoint内部的Variant代表不同的版本 +- **OP**: PaddlePaddle用来封装一种数值计算的算子,Paddle Serving用来表示一种基础的业务操作算子,核心接口是inference。OP通过配置其依赖的上游OP,将多个OP串联成一个workflow +- **Channel**: 一个OP所有请求级中间数据的抽象;OP之间通过Channel进行数据交互 +- **Bus**: 对一个线程中所有channel的管理,以及根据DAG之间的DAG依赖图对OP和Channel两个集合间的访问关系进行调度 +- **Stage**: Workflow按照DAG描述的拓扑图中,属于同一个环节且可并行执行的OP集合 +- **Node**: 由某个OP算子类结合参数配置组成的OP算子实例,也是Workflow中的一个执行单元 +- **Workflow**: 按照DAG描述的拓扑,有序执行每个OP的inference接口 +- **DAG/Workflow**: 由若干个相互依赖的Node组成,每个Node均可通过特定接口获得Request对象,节点OP通过依赖关系获得其前置OP的输出对象,最后一个Node的输出默认就是Response对象 +- **Service**: 对一次PV的请求封装,可配置若干条Workflow,彼此之间复用当前PV的Request对象,然后各自并行/串行执行,最后将Response写入对应的输出slot中;一个Paddle-serving进程可配置多套Service接口,上游根据ServiceName决定当前访问的Service接口。 ## 3. 
Python Interface设计 ### 3.1 核心目标: -一套Paddle Serving的动态库,支持Paddle保存的通用模型的远程预估服务,通过Python Interface调用PaddleServing底层的各种功能。 +完成一整套Paddle Serving的动态库,支持Paddle保存的通用模型的远程预估服务,通过Python Interface调用PaddleServing底层的各种功能。 ### 3.2 通用模型: @@ -38,10 +38,10 @@ PaddlePaddle是公司开源的机器学习框架,广泛支持各种深度学 ### 3.3 整体设计: -用户通过Python Client启动Client和Server,Python API有检查互联和待访问模型是否匹配的功能 -Python API背后调用的是Paddle Serving实现的client和server对应功能的pybind,互传的信息通过RPC实现 -Client Python API当前有两个简单的功能,load_inference_conf和predict,分别用来执行加载待预测的模型和预测 -Server Python API主要负责加载预估模型,以及生成Paddle Serving需要的各种配置,包括engines,workflow,resource等 +- 用户通过Python Client启动Client和Server,Python API有检查互联和待访问模型是否匹配的功能 +- Python API背后调用的是Paddle Serving实现的client和server对应功能的pybind,互传的信息通过RPC实现 +- Client Python API当前有两个简单的功能,load_inference_conf和predict,分别用来执行加载待预测的模型和预测 +- Server Python API主要负责加载预估模型,以及生成Paddle Serving需要的各种配置,包括engines,workflow,resource等 ### 3.4 Server Inferface diff --git a/doc/DESIGN_DOC.md b/doc/DESIGN_DOC.md index 2f8a36ea6686b5add2a7e4e407eabfd14167490d..2e7baaeb885c732bb723979e90edae529e7cbc74 100644 --- a/doc/DESIGN_DOC.md +++ b/doc/DESIGN_DOC.md @@ -1,5 +1,7 @@ # Paddle Serving Design Doc +([简体中文](./DESIGN_DOC_CN.md)|English) + ## 1. Design Objectives - Long Term Vision: Online deployment of deep learning models will be a user-facing application in the future. Any AI developer will face the problem of deploying an online service for his or her trained model. diff --git a/doc/DESIGN_DOC_CN.md b/doc/DESIGN_DOC_CN.md index 312379cd7543e70095e5a6d8168aab06b79a0525..2a63d56593dc47a5ca69f9c5c324710ee6dc3fc6 100644 --- a/doc/DESIGN_DOC_CN.md +++ b/doc/DESIGN_DOC_CN.md @@ -1,5 +1,7 @@ # Paddle Serving设计文档 +(简体中文|[English](./DESIGN_DOC.md)) + ## 1. 整体设计目标 - 长期使命:Paddle Serving是一个PaddlePaddle开源的在线服务框架,长期目标就是围绕着人工智能落地的最后一公里提供越来越专业、可靠、易用的服务。 diff --git a/doc/README.md b/doc/README.md index 5d529175054fa97c495b2a7581fdcb2fe0e4c394..2d51eba9e2a2902685f9385c83542f32b98e5b4f 100644 --- a/doc/README.md +++ b/doc/README.md @@ -109,7 +109,7 @@ for data in test_reader(): [Design Doc](DESIGN.md) -[FAQ](FAQ.md) +[FAQ](./deprecated/FAQ.md) ### Senior Developer Guildlines diff --git a/doc/README_CN.md b/doc/README_CN.md index 82a82622faffe7b3d8ccffea6e2108caa9e5b57c..da5641cad333518ded9fbae4438f05ae20e30ddd 100644 --- a/doc/README_CN.md +++ b/doc/README_CN.md @@ -109,7 +109,7 @@ for data in test_reader(): [设计文档](DESIGN_CN.md) -[FAQ](FAQ.md) +[FAQ](./deprecated/FAQ.md) ### 资深开发者使用指南 diff --git a/doc/RUN_IN_DOCKER_CN.md b/doc/RUN_IN_DOCKER_CN.md index cb7baddf8e681f1b11369437fc66178ef0ed54f5..3aa10d25246833d969c160253e6cff68f8e7277f 100644 --- a/doc/RUN_IN_DOCKER_CN.md +++ b/doc/RUN_IN_DOCKER_CN.md @@ -1,6 +1,6 @@ # 如何在Docker中运行PaddleServing -(简体中文|[English](./RUN_IN_DOCKER.md)) +(简体中文|[English](RUN_IN_DOCKER.md)) ## 环境要求 diff --git a/doc/TRAIN_TO_SERVICE.md b/doc/TRAIN_TO_SERVICE.md index a5773accae5d135cdfad4c978656a667f442ff8e..4219e66948a9bc3b0ae43e5cda61aad8ae35b3a0 100644 --- a/doc/TRAIN_TO_SERVICE.md +++ b/doc/TRAIN_TO_SERVICE.md @@ -1,8 +1,8 @@ -# End-to-end process from training to deployment +# An End-to-end Tutorial from Training to Inference Service Deployment ([简体中文](./TRAIN_TO_SERVICE_CN.md)|English) -Paddle Serving is Paddle's high-performance online prediction service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of prediction service through 9 steps. 
+Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of inference service through 9 steps. ## Step1:Prepare for Running Environment Paddle Serving can be deployed on Linux environments such as Centos and Ubuntu. On other systems or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the http service. diff --git a/doc/timeline-example.png b/doc/timeline-example.png index 96b3f9240e0531d53d47e33680dca13b083b2519..e3a6767411f389845e06f1d6828959fce1b54c28 100644 Binary files a/doc/timeline-example.png and b/doc/timeline-example.png differ diff --git a/python/examples/bert/benchmark_batch.py b/python/examples/bert/benchmark_batch.py index 265521d484259b0e6ea2b182dbf61e2a5cf43b8d..9b8e301a62eb0eee161cd701555543d329c6ae83 100644 --- a/python/examples/bert/benchmark_batch.py +++ b/python/examples/bert/benchmark_batch.py @@ -57,7 +57,7 @@ def single_func(idx, resource): os.getpid(), int(round(b_start * 1000000)), int(round(b_end * 1000000)))) - result = client.predict(feed_batch=feed_batch, fetch=fetch) + result = client.predict(feed=feed_batch, fetch=fetch) else: print("unsupport batch size {}".format(args.batch_size)) diff --git a/python/examples/criteo_ctr/benchmark_batch.py b/python/examples/criteo_ctr/benchmark_batch.py index ab67507355d0eba187d47ec9577eb5a3eda5dc46..1e4348c99dc0d960b1818ea6f0eb1ae2f5bd2ccb 100644 --- a/python/examples/criteo_ctr/benchmark_batch.py +++ b/python/examples/criteo_ctr/benchmark_batch.py @@ -55,7 +55,7 @@ def single_func(idx, resource): for i in range(1, 27): feed_dict["sparse_{}".format(i - 1)] = data[0][i] feed_batch.append(feed_dict) - result = client.predict(feed_batch=feed_batch, fetch=fetch) + result = client.predict(feed=feed_batch, fetch=fetch) else: print("unsupport batch size {}".format(args.batch_size)) diff --git a/python/examples/criteo_ctr/test_client.py b/python/examples/criteo_ctr/test_client.py index 9b3681c4117d123abd490668f44e43ab9f1e855f..d53c5541c36f4eb52618e3498eda571dd2bcab53 100644 --- a/python/examples/criteo_ctr/test_client.py +++ b/python/examples/criteo_ctr/test_client.py @@ -21,6 +21,10 @@ import time import criteo_reader as criteo from paddle_serving_client.metric import auc +import sys + +py_version = sys.version_info[0] + client = Client() client.load_client_config(sys.argv[1]) client.connect(["127.0.0.1:9292"]) @@ -39,7 +43,10 @@ label_list = [] prob_list = [] start = time.time() for ei in range(1000): - data = reader().next() + if py_version == 2: + data = reader().next() + else: + data = reader().__next__() feed_dict = {} for i in range(1, 27): feed_dict["sparse_{}".format(i - 1)] = data[0][i] diff --git a/python/examples/criteo_ctr_with_cube/benchmark_batch.py b/python/examples/criteo_ctr_with_cube/benchmark_batch.py index b4b15892375e830486afa320151fac619aab2ba7..df5c6b90badb36fd7e349555973ccbd7ea0a8b70 100755 --- a/python/examples/criteo_ctr_with_cube/benchmark_batch.py +++ b/python/examples/criteo_ctr_with_cube/benchmark_batch.py @@ -56,8 +56,7 @@ def single_func(idx, resource): feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][ i] feed_batch.append(feed_dict) - result = client.batch_predict( - feed_batch=feed_batch, fetch=fetch) + result = client.predict(feed=feed_batch, fetch=fetch) else: print("unsupport batch size 
{}".format(args.batch_size)) diff --git a/python/examples/criteo_ctr_with_cube/test_server_gpu.py b/python/examples/criteo_ctr_with_cube/test_server_gpu.py new file mode 100755 index 0000000000000000000000000000000000000000..382be99bd37a52630d78bb84ef7e53047b018c95 --- /dev/null +++ b/python/examples/criteo_ctr_with_cube/test_server_gpu.py @@ -0,0 +1,37 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=doc-string-missing + +import os +import sys +from paddle_serving_server_gpu import OpMaker +from paddle_serving_server_gpu import OpSeqMaker +from paddle_serving_server_gpu import Server + +op_maker = OpMaker() +read_op = op_maker.create('general_reader') +general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer') +response_op = op_maker.create('general_response') + +op_seq_maker = OpSeqMaker() +op_seq_maker.add_op(read_op) +op_seq_maker.add_op(general_dist_kv_infer_op) +op_seq_maker.add_op(response_op) + +server = Server() +server.set_op_sequence(op_seq_maker.get_op_sequence()) +server.set_num_threads(4) +server.load_model_config(sys.argv[1]) +server.prepare_server(workdir="work_dir1", port=9292, device="cpu") +server.run_server() diff --git a/python/examples/imagenet/benchmark_batch.py b/python/examples/imagenet/benchmark_batch.py index 915544d4f6f9d4636b14f4be92ad75ba08013389..e531425770cbf9102b7ebd2f5b082c5c4aa14e71 100644 --- a/python/examples/imagenet/benchmark_batch.py +++ b/python/examples/imagenet/benchmark_batch.py @@ -50,7 +50,7 @@ def single_func(idx, resource): img = reader.process_image(img_list[i]) img = img.reshape(-1) feed_batch.append({"image": img}) - result = client.predict(feed_batch=feed_batch, fetch=fetch) + result = client.predict(feed=feed_batch, fetch=fetch) else: print("unsupport batch size {}".format(args.batch_size)) diff --git a/python/examples/imagenet/image_http_client.py b/python/examples/imagenet/image_http_client.py index c567b9003bfe87f9ddd20c3553b9e2d400bce4b9..2a2e9ea20d7e428cfe42393e2fee60035c33283d 100644 --- a/python/examples/imagenet/image_http_client.py +++ b/python/examples/imagenet/image_http_client.py @@ -17,10 +17,16 @@ import base64 import json import time import os +import sys + +py_version = sys.version_info[0] def predict(image_path, server): - image = base64.b64encode(open(image_path).read()) + if py_version == 2: + image = base64.b64encode(open(image_path).read()) + else: + image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8") req = json.dumps({"image": image, "fetch": ["score"]}) r = requests.post( server, data=req, headers={"Content-Type": "application/json"}) @@ -28,18 +34,8 @@ def predict(image_path, server): return r -def batch_predict(image_path, server): - image = base64.b64encode(open(image_path).read()) - req = json.dumps({"image": [image, image], "fetch": ["score"]}) - r = requests.post( - server, data=req, headers={"Content-Type": "application/json"}) - print(r.json()["result"][1]["score"][0]) - return r - - if __name__ == 
"__main__": server = "http://127.0.0.1:9393/image/prediction" - #image_path = "./data/n01440764_10026.JPEG" image_list = os.listdir("./image_data/n01440764/") start = time.time() for img in image_list: diff --git a/python/examples/imagenet/image_rpc_client.py b/python/examples/imagenet/image_rpc_client.py index 2367f509cece4d37d61d4a2ff2c2bfb831112e5a..76f3a043474bf75e1e96a44f18ac7dfe3da11f78 100644 --- a/python/examples/imagenet/image_rpc_client.py +++ b/python/examples/imagenet/image_rpc_client.py @@ -19,16 +19,15 @@ import time client = Client() client.load_client_config(sys.argv[1]) -client.connect(["127.0.0.1:9295"]) +client.connect(["127.0.0.1:9393"]) reader = ImageReader() start = time.time() for i in range(1000): - with open("./data/n01440764_10026.JPEG") as f: + with open("./data/n01440764_10026.JPEG", "rb") as f: img = f.read() img = reader.process_image(img).reshape(-1) fetch_map = client.predict(feed={"image": img}, fetch=["score"]) - print(i) end = time.time() print(end - start) diff --git a/python/examples/imdb/benchmark_batch.py b/python/examples/imdb/benchmark_batch.py index 3ac52ec5472ff97f5d273dc230494223f3a71907..d36704a7631e963fd51220aa3c3d9a350515ebfd 100644 --- a/python/examples/imdb/benchmark_batch.py +++ b/python/examples/imdb/benchmark_batch.py @@ -42,8 +42,7 @@ def single_func(idx, resource): for bi in range(args.batch_size): word_ids, label = imdb_dataset.get_words_and_label(line) feed_batch.append({"words": word_ids}) - result = client.predict( - feed_batch=feed_batch, fetch=["prediction"]) + result = client.predict(feed=feed_batch, fetch=["prediction"]) else: print("unsupport batch size {}".format(args.batch_size)) diff --git a/python/examples/imdb/imdb_reader.py b/python/examples/imdb/imdb_reader.py index 38a46c5cf3cc3d7216c47c290876951e99253115..a4ef3e163a50b0dc244ac2653df1e38d7f91699b 100644 --- a/python/examples/imdb/imdb_reader.py +++ b/python/examples/imdb/imdb_reader.py @@ -19,15 +19,23 @@ import paddle import re import paddle.fluid.incubate.data_generator as dg +py_version = sys.version_info[0] + class IMDBDataset(dg.MultiSlotDataGenerator): def load_resource(self, dictfile): self._vocab = {} wid = 0 - with open(dictfile) as f: - for line in f: - self._vocab[line.strip()] = wid - wid += 1 + if py_version == 2: + with open(dictfile) as f: + for line in f: + self._vocab[line.strip()] = wid + wid += 1 + else: + with open(dictfile, encoding="utf-8") as f: + for line in f: + self._vocab[line.strip()] = wid + wid += 1 self._unk_id = len(self._vocab) self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))') self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0]) diff --git a/python/examples/util/README.md b/python/examples/util/README.md index 13f9b4b4db671acdf5d7dc8bcb1e21a6201c5945..3c533a169924b2bc718c225bdb406e791bb74df9 100644 --- a/python/examples/util/README.md +++ b/python/examples/util/README.md @@ -1,6 +1,6 @@ ## Timeline工具使用 -serving框架中内置了预测服务中各阶段时间打点的功能,通过环境变量来控制是否开启。 +serving框架中内置了预测服务中各阶段时间打点的功能,在client端通过环境变量来控制是否开启,开启后会将打点信息输出到屏幕。 ``` export FLAGS_profile_client=1 #开启client端各阶段时间打点 export FLAGS_profile_server=1 #开启server端各阶段时间打点 @@ -13,6 +13,8 @@ export FLAGS_profile_server=1 #开启server端各阶段时间打点 ``` python show_profile.py profile ${thread_num} ``` +这里thread_num参数为client运行时的进程数,脚本将按照这个参数来计算各阶段的平均耗时。 + 脚本将计算各阶段的耗时,并除以线程数做平均,打印到标准输出。 ``` @@ -22,6 +24,6 @@ python timeline_trace.py profile trace 具体操作:打开chrome浏览器,在地址栏输入chrome://tracing/,跳转至tracing页面,点击load按钮,打开保存的trace文件,即可将预测服务的各阶段时间信息可视化。 
-效果如下图,图中展示了client端启动4进程时的bert示例的各阶段timeline,其中bert_pre代表client端的数据预处理阶段,client_infer代表client完成预测请求的发送和接收结果的阶段,每个进进程的第二行展示的是server各个op的timeline。 +效果如下图,图中展示了使用[bert示例](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert)的GPU预测服务,server端开启4卡预测,client端启动4进程,batch size为1时的各阶段timeline,其中bert_pre代表client端的数据预处理阶段,client_infer代表client完成预测请求的发送和接收结果的阶段,图中的process代表的是client的进程号,每个进进程的第二行展示的是server各个op的timeline。 ![timeline](../../../doc/timeline-example.png) diff --git a/python/paddle_serving_client/metric/__init__.py b/python/paddle_serving_client/metric/__init__.py index 4f173887755e5aef5c6917fa604012cf0c1d86f0..245e740dae2e713fde3237c26d6815b4528f90d7 100644 --- a/python/paddle_serving_client/metric/__init__.py +++ b/python/paddle_serving_client/metric/__init__.py @@ -11,5 +11,5 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -from auc import auc -from acc import acc +from .auc import auc +from .acc import acc diff --git a/python/paddle_serving_server_gpu/__init__.py b/python/paddle_serving_server_gpu/__init__.py index 02b55801c35fb5d1ed7e35c249ac07e4d3eb45ab..2fd35c6d66e4bf282224a8775f1a6bf0d1c6a8c5 100644 --- a/python/paddle_serving_server_gpu/__init__.py +++ b/python/paddle_serving_server_gpu/__init__.py @@ -55,6 +55,7 @@ class OpMaker(object): "general_text_reader": "GeneralTextReaderOp", "general_text_response": "GeneralTextResponseOp", "general_single_kv": "GeneralSingleKVOp", + "general_dist_kv_infer": "GeneralDistKVInferOp", "general_dist_kv": "GeneralDistKVOp" } @@ -104,6 +105,7 @@ class Server(object): self.infer_service_fn = "infer_service.prototxt" self.model_toolkit_fn = "model_toolkit.prototxt" self.general_model_config_fn = "general_model.prototxt" + self.cube_config_fn = "cube.conf" self.workdir = "" self.max_concurrency = 0 self.num_threads = 4 @@ -184,6 +186,11 @@ class Server(object): "w") as fout: fout.write(str(self.model_conf)) self.resource_conf = server_sdk.ResourceConf() + for workflow in self.workflow_conf.workflows: + for node in workflow.nodes: + if "dist_kv" in node.name: + self.resource_conf.cube_config_path = workdir + self.resource_conf.cube_config_file = self.cube_config_fn self.resource_conf.model_toolkit_path = workdir self.resource_conf.model_toolkit_file = self.model_toolkit_fn self.resource_conf.general_model_path = workdir diff --git a/python/paddle_serving_server_gpu/serve.py b/python/paddle_serving_server_gpu/serve.py index 9c8d10e4b36a7830aed25996a309cb4163ca126c..d09efbfc8e1512ecb75b063ad760ce66e1a3159e 100644 --- a/python/paddle_serving_server_gpu/serve.py +++ b/python/paddle_serving_server_gpu/serve.py @@ -64,14 +64,22 @@ def start_gpu_card_model(index, gpuid, args): # pylint: disable=doc-string-miss def start_multi_card(args): # pylint: disable=doc-string-missing gpus = "" if args.gpu_ids == "": - if "CUDA_VISIBLE_DEVICES" in os.environ: - gpus = os.environ["CUDA_VISIBLE_DEVICES"] - else: - gpus = [] + gpus = [] else: gpus = args.gpu_ids.split(",") + if "CUDA_VISIBLE_DEVICES" in os.environ: + env_gpus = os.environ["CUDA_VISIBLE_DEVICES"].split(",") + for ids in gpus: + if int(ids) >= len(env_gpus): + print( + " Max index of gpu_ids out of range, the number of CUDA_VISIBLE_DEVICES is {}.". 
+ format(len(env_gpus))) + exit(-1) + else: + env_gpus = [] if len(gpus) <= 0: - start_gpu_card_model(-1, args) + print("gpu_ids not set, going to run cpu service.") + start_gpu_card_model(-1, -1, args) else: gpu_processes = [] for i, gpu_id in enumerate(gpus): diff --git a/python/setup.py.client.in b/python/setup.py.client.in index 86b3c331babccd06bdc6e206866a1c43da7b27d7..381fb2a8853cc4d5494e3eac520ab183db6eab09 100644 --- a/python/setup.py.client.in +++ b/python/setup.py.client.in @@ -18,6 +18,7 @@ from __future__ import print_function import platform import os +import sys from setuptools import setup, Distribution, Extension from setuptools import find_packages @@ -25,6 +26,7 @@ from setuptools import setup from paddle_serving_client.version import serving_client_version from pkg_resources import DistributionNotFound, get_distribution +py_version = sys.version_info[0] def python_version(): return [int(v) for v in platform.python_version().split(".")] @@ -37,8 +39,9 @@ def find_package(pkgname): return False def copy_lib(): + lib_list = ['libpython2.7.so.1.0', 'libssl.so.10', 'libcrypto.so.10'] if py_version == 2 else ['libpython3.6m.so.1.0', 'libssl.so.10', 'libcrypto.so.10'] os.popen('mkdir -p paddle_serving_client/lib') - for lib in ['libpython2.7.so.1.0', 'libssl.so.10', 'libcrypto.so.10']: + for lib in lib_list: r = os.popen('whereis {}'.format(lib)) text = r.read() os.popen('cp {} ./paddle_serving_client/lib'.format(text.strip().split(' ')[1])) diff --git a/tools/serving_build.sh b/tools/serving_build.sh index 66dcf5ff8d3845eac8bb8f2980407bf67a75a36c..381a366c15ec826debb8a801221ed58a2925bc53 100644 --- a/tools/serving_build.sh +++ b/tools/serving_build.sh @@ -211,12 +211,11 @@ function python_run_criteo_ctr_with_cube() { cp ../../../build-server-$TYPE/output/bin/cube* ./cube/ mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/ yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/ - sh cube_prepare.sh & check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/" python test_server.py ctr_serving_model_kv & check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score" - tail -n 2 score + tail -n 2 score | awk 'NR==1' AUC=$(tail -n 2 score | awk 'NR==1') VAR2="0.67" #TODO: temporarily relax the threshold to 0.67 RES=$( echo "$AUC>$VAR2" | bc ) @@ -229,6 +228,30 @@ function python_run_criteo_ctr_with_cube() { ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill ;; GPU) + check_cmd "wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz" + check_cmd "tar xf ctr_cube_unittest.tar.gz" + check_cmd "mv models/ctr_client_conf ./" + check_cmd "mv models/ctr_serving_model_kv ./" + check_cmd "mv models/data ./cube/" + check_cmd "mv models/ut_data ./" + cp ../../../build-server-$TYPE/output/bin/cube* ./cube/ + mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/ + yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/ + sh cube_prepare.sh & + check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/" + python test_server_gpu.py ctr_serving_model_kv & + check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score" + tail -n 2 score | awk 'NR==1' + AUC=$(tail -n 2 score | awk 'NR==1') 
+ VAR2="0.67" #TODO: temporarily relax the threshold to 0.67 + RES=$( echo "$AUC>$VAR2" | bc ) + if [[ $RES -eq 0 ]]; then + echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.70" + exit 1 + fi + echo "criteo_ctr_with_cube inference auc test success" + ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill + ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill ;; *) echo "error type"