Commit 8dea15dc, authored by: barrierye

merge paddle_serving_client/__init__.py file

......@@ -162,8 +162,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
### New to Paddle Serving
- [How to save a servable model?](doc/SAVE.md)
- [An end-to-end tutorial from training to serving(Chinese)](doc/TRAIN_TO_SERVICE.md)
- [Write Bert-as-Service in 10 minutes(Chinese)](doc/BERT_10_MINS.md)
### Developers
- [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
......
## Build Bert-As-Service in 10 Minutes
The goal of Bert-As-Service is that, given a sentence, the service returns a semantic vector representation of that sentence to the user. The [BERT model](https://arxiv.org/abs/1810.04805) is currently a popular model in NLP and achieves strong results on many public NLP tasks; feeding semantic vectors computed by BERT into other NLP models also helps their performance considerably. Bert-As-Service lets users conveniently obtain semantic vector representations of text and apply them to their own tasks. To reach this goal, we show in four steps how Paddle Serving can stand up such a service in ten minutes. All code and files in this example can be found in the Paddle Serving [examples](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert).
#### Step 1: Save a Servable Model
Paddle Serving supports all kinds of models trained with Paddle; a servable model is saved by specifying the model's input and output variables. For convenience, we load a pretrained Chinese BERT model from PaddleHub and save a deployable service with a few lines of code; the server-side and client-side configurations go into the `bert_seq20_model` and `bert_seq20_client` folders respectively.
``` python
import paddlehub as hub
import paddle_serving_client.io as serving_io

# Load a pretrained Chinese BERT model from PaddleHub
model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name)
inputs, outputs, program = module.context(
    trainable=True, max_seq_len=20)

# Map the model's input/output variables to named feed/fetch fields
feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
fetch_keys = ["pooled_output", "sequence_output"]
feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))

# Save the server-side and client-side configuration in one call
serving_io.save_model("bert_seq20_model", "bert_seq20_client",
                      feed_dict, fetch_dict, program)
```
#### Step 2: Start the Service
``` shell
python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 --port 9292 --gpu_ids 0
```
| Argument | Meaning |
| ------- | -------------------------- |
| model | path to the server-side config and model files |
| thread | number of server-side threads |
| port | server port |
| gpu_ids | indices of the GPUs to use |
#### Step 3: Client-Side Data Preprocessing
Paddle Serving has built-in data preprocessing logic for many classic tasks. For computing the Chinese BERT semantic representation, we use the ChineseBertReader class under paddle_serving_app, which makes it easy to obtain the multiple model input fields corresponding to a raw Chinese sentence.
Install paddle_serving_app:
```shell
pip install paddle_serving_app
```
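As a quick sanity check, the reader can be exercised offline before wiring up a client; a minimal sketch (the exact set of returned field names is an assumption based on the feed fields of the saved model above):

```python
from paddle_serving_app import ChineseBertReader

# turn one raw sentence into the feed fields the BERT model expects
reader = ChineseBertReader()
feed_dict = reader.process("飞桨是一个深度学习框架")
# expect keys such as input_ids / position_ids / segment_ids / input_mask
print(sorted(feed_dict.keys()))
```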
#### Step 4: Access the Service from the Client
The client script bert_client.py is as follows:
``` python
import sys
from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader

# preprocessing: raw Chinese text -> BERT input fields
reader = ChineseBertReader()
fetch = ["pooled_output"]
endpoint_list = ["127.0.0.1:9292"]

client = Client()
client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
client.connect(endpoint_list)

for line in sys.stdin:
    feed_dict = reader.process(line)
    result = client.predict(feed=feed_dict, fetch=fetch)
    print(result)
```
Run:
```shell
cat data.txt | python bert_client.py
```
This reads samples from the data.txt file and prints the results to standard output.
### Performance
We benchmarked Bert-As-Service built on Paddle Serving on V100 GPUs against a TensorFlow-based Bert-As-Service. From the user's point of view, the same batch size and concurrency were used in the stress test; the overall throughput on 4 V100 cards is shown below.
![4v100_bert_as_service_benchmark](4v100_bert_as_service_benchmark.png)
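For a rough reproduction of this kind of stress test, here is a minimal load-test sketch. It is a hypothetical harness, not the benchmark script actually used; it assumes the service from Step 2 is running and that data.txt holds one sentence per line:

```python
import time
from multiprocessing import Pool

from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader


def run_one_client(sentences):
    """Send all sentences through one client and return the elapsed seconds."""
    reader = ChineseBertReader()
    client = Client()
    client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
    client.connect(["127.0.0.1:9292"])
    start = time.time()
    for s in sentences:
        client.predict(feed=reader.process(s), fetch=["pooled_output"])
    return time.time() - start


if __name__ == "__main__":
    sentences = [line.strip() for line in open("data.txt")]
    concurrency = 4
    pool = Pool(concurrency)
    costs = pool.map(run_one_client, [sentences] * concurrency)
    pool.close()
    pool.join()
    total_requests = concurrency * len(sentences)
    print("QPS: %.1f" % (total_requests / max(costs)))
```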
# How to Compile Paddle Serving
## Build Environment Setup
- OS: CentOS 6u3
- gcc: 4.8.2 and above
- go: 1.9.2 and above
......@@ -8,42 +9,116 @@
- cmake: 3.2.2 and above
- python: 2.7.2 and above
It is recommended to use Docker to prepare the Paddle Serving build environment: [CPU Dockerfile.devel](../tools/Dockerfile.devel), [GPU Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel).
## Get the Code
``` shell
git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive
```
## Set PYTHONROOT
``` shell
# e.g. if the Python executable is /usr/bin/python, set PYTHONROOT as follows
export PYTHONROOT=/usr/
```
## Compile the Server
### Integrate the CPU Paddle Inference Library
``` shell
mkdir build && cd build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
make -j10
```
You can run `make install` to place the build artifacts in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to choose the install path.
### Integrate the GPU Paddle Inference Library
``` shell
mkdir build && cd build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON ..
make -j10
```
Run `make install` to place the build artifacts in the `./output` directory.
## Compile the Client
``` shell
mkdir build && cd build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON ..
make -j10
```
Run `make install` to place the build artifacts in the `./output` directory.
## Compile the App Package
```bash
mkdir build && cd build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCMAKE_INSTALL_PREFIX=./output -DAPP=ON ..
make
```
## Install the Wheel Packages
Whether for the Client, the Server, or the App package, after compilation simply install the whl packages under `python/dist/`.
## Notes
When running the Python-side server, the `SERVING_BIN` environment variable is checked. To use a binary you compiled yourself, set this variable to the path of that binary, typically `export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
## CMake Option Reference
| Option | Description | Default |
| :--------------: | :----------------------------------------: | :--: |
| WITH_AVX | Compile Paddle Serving with AVX intrinsics | OFF |
| WITH_MKL | Compile Paddle Serving with MKL support | OFF |
| WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF |
| CUDNN_ROOT | Define CuDNN library and header path | |
| CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF |
| WITH_ELASTIC_CTR | Compile the ELASTIC-CTR solution | OFF |
| PACK | Compile for whl | OFF |
### The WITH_GPU Option
Paddle Serving supports GPU prediction through the PaddlePaddle inference library. The WITH_GPU option probes the system for base libraries such as CUDA/cuDNN; if suitable versions are found, the GPU OP kernels are compiled along with PaddlePaddle.
To compile the Paddle Serving GPU version on bare metal, the following base libraries must be installed:
- CUDA
- CuDNN
- NCCL2
Note the following:
1. The CUDA/cuDNN versions installed on the system used to compile Serving must be compatible with the actual GPU device. For example, a Tesla V100 card requires at least CUDA 9.0. If the CUDA version used at compile time is too low, the generated GPU code will be incompatible with the hardware, and the Serving process may fail to start or crash with a core dump.
2. The system running Paddle Serving must have a CUDA driver compatible with the actual GPU device, and base libraries compatible with the CUDA/cuDNN versions used at compile time. If the CUDA/cuDNN versions on the serving system are lower than those used at compile time, strange CUDA call failures may occur.
For reference, the base library versions used by PaddlePaddle release builds are:
| | CUDA | CuDNN | NCCL2 |
| :----: | :-----: | :----------------------: | :----: |
| CUDA 8 | 8.0.61 | CuDNN 7.1.2 for CUDA 8.0 | 2.1.4 |
| CUDA 9 | 9.0.176 | CuDNN 7.3.1 for CUDA 9.0 | 2.2.12 |
### How to Let the Paddle Serving Build Detect the cuDNN Library
Download the matching cuDNN version from the NVIDIA developer site, unpack it locally, and add the `-DCUDNN_ROOT` parameter to the cmake command to point at the cuDNN path.
### How to Let the Paddle Serving Build Detect the NCCL Library
Download the matching nccl2 library from the NVIDIA developer site, unpack it, and add the following environment variables (using nccl 2.1.4 as an example):
```shell
export C_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/lib/:$LD_LIBRARY_PATH
```
# Cube: Sparse Parameter Indexing Service (Local Mode)
([简体中文](./CUBE_LOCAL_CN.md)|English)
## Overview
There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter splits the sparse parameters out and saves two parts: the sparse parameters and the dense parameters. In industrial settings the sparse parameters are very large, on the order of 10^9, so serving large-scale sparse parameters for prediction on a single machine is impractical. We therefore introduce Cube, Baidu's industrial-grade sparse parameter indexing product of many years, to provide a distributed sparse parameter service.
The local mode of Cube is a simplified version of distributed Cube, designed for developers to use in experiments and demos. If you need a distributed sparse parameter service, continue with the [Distributed Cube User Guide](./Distributed_Cube) (still in development) after reading this document.
## Example
In the directory python/examples/criteo_ctr_with_cube, run:
```
python local_train.py # train model
cp ../../../build_server/core/predictor/seq_generator seq_generator # copy Sequence File generator
cp ../../../build_server/output/bin/cube* ./cube/ # copy Cube tool kits
cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy Cube Client
sh cube_prepare.sh & # start the delivery script
```
These steps train a model and deliver its sparse parameters to the Cube server.
## Components of Cube
### cube-builder
cube-builder is a tool for generating model shard files and managing versions. Since Cube provides a distributed sparse parameter service, each node in the deployment loads a different shard. The saved sparse parameter file is often one large file, so it must be split into shards by a hash function. Industrial scenarios also require periodic model delivery and streaming training, so version management of the model is important; this is exactly what is missing when the model is saved after training. Therefore, while generating the shards, cube-builder also lets you specify version information manually.
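To make the hash-based sharding idea concrete, here is a minimal sketch (a hypothetical helper, not the actual cube-builder code):

```python
import hashlib
import struct


def shard_of(key, shard_num):
    """Assign a sparse-parameter key to one of shard_num shards by hashing."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    (h,) = struct.unpack("<Q", digest[:8])
    return h % shard_num


# every node in a shard_num-way deployment loads only the keys
# whose shard id equals its own
print(shard_of("feature_embedding_42", 4))
```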
### cube-server
cube-server provides the sparse parameter service on top of the sparse parameter index. It serves high-performance distributed queries through brpc and supports remote calls through a REST API.
### cube-cli
cube-cli is the client of cube-server. It is already integrated into Paddle Serving: once the cube.conf configuration file is prepared and the kv_infer related op is specified in the Paddle Serving server code, cube-cli is ready on the serving side.
## Serving the Model Step by Step
### Preconditions
We need a trained model, and we copy the tool kits from the build_server folder:
```
python local_train.py # train model
cp ../../../build_server/core/predictor/seq_generator seq_generator # copy Sequence File generator
cp ../../../build_server/output/bin/cube* ./cube/ # copy Cube tool kits
cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy Cube Client
```
### Generate Sequence File from Sparse Parameter
In order to move the model parameters from the training side to the prediction side, we need to convert the trained model from the Paddle model save format into the Sequence File format.
**Why Sequence File?**
Sequence File is a common format of the Hadoop file system. As mentioned at the beginning of this article, distributed Cube supports ultra-large-scale sparse parameter services, and in production large sparse parameters are stored in distributed file systems; the Hadoop file system is one of the most stable open-source distributed file systems, so the Sequence File format became the file format for Cube to load models.
```
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
```
### Generating Shards
For the local version of Cube, the number of shards is 1. Run:
```
cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false
```
### Deliver to Cube-Server
The delivery process for the local version of Cube is very simple: just place the index files into the ./data folder where the cube binary is located.
```
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube &
```
### Cube-Client Verification
This step is not required, but it helps you verify that the model is ready.
```
./cube-cli -dict_name=test_dict -keys keys -conf ./cube/cube.conf
```
If it succeeds, you will see the following:
<p align="center">
<img src="cube-cli.png" width="700">
</p>
If every key has a corresponding value in the output, the delivery succeeded. The same file can then be used by Serving to run cube queries in the general kv infer op.
## Appendix: Configuration
The config file is cube.conf, located in python/examples/criteo_ctr_with_cube/cube/conf; it is used by cube-cli. Users of Cube local mode can use it as-is without studying it; it becomes much more important in Cube distributed mode.
```
[{
"dict_name": "test_dict", //table name
"shard": 1, //shard num
"dup": 1, //duplicates
"timeout": 200,
"retry": 3,
"backup_request": 100,
"type": "ipport_list",
"load_balancer": "rr",
"nodes": [{
"ipport_list": "list://127.0.0.1:8027" //IP list
}]
}]
```
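Note the `//` comments above: the file is JSON-like rather than strict JSON. If you want to inspect it from Python, a minimal sketch (hypothetical helper, for inspection only) strips those trailing comments first:

```python
import json
import re


def load_cube_conf(path):
    """Read cube.conf, dropping the trailing //-style comments.

    Only comments preceded by whitespace are stripped, so URLs such as
    "list://127.0.0.1:8027" survive.
    """
    with open(path) as f:
        text = re.sub(r"\s//[^\n]*", "", f.read())
    return json.loads(text)


conf = load_cube_conf("cube/conf/cube.conf")
print(conf[0]["dict_name"], conf[0]["nodes"][0]["ipport_list"])
```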
# Cube Sparse Parameter Indexing Service: Local Mode User Guide
(简体中文|[English](./CUBE_LOCAL.md))
## Introduction
There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter splits the sparse parameters out and saves two parts: the sparse parameters and the dense parameters. In industrial scenarios the sparse parameters are very large, on the order of 10^9, so starting large-scale sparse parameter prediction on one machine is impractical; we therefore introduce Cube, Baidu's industrial-grade product honed over many years in sparse parameter indexing, to provide a distributed sparse parameter service.
The local version of Cube is a simplified form of distributed Cube, intended to make experiments and demos easy for developers. If you need a distributed sparse parameter service, please continue with the [Cube User Guide](分布式Cube) (under construction) after reading this document.
## Example
Under python/examples/criteo_ctr_with_cube, run:
```
python local_train.py # train the model
cp ../../../build_server/core/predictor/seq_generator seq_generator # copy the Sequence File generator
cp ../../../build_server/output/bin/cube* ./cube/ # copy the Cube binaries
cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy cube-cli
sh cube_prepare.sh & # start the delivery script
```
This example covers the whole pipeline from model training to delivery to Cube; the steps are described one by one below.
## Components of the Local Cube
### cube-builder
cube-builder is a tool that generates model shard files and manages versions. Since Cube serves sparse parameters in a distributed fashion, each node in the deployment loads a different shard; the generated sparse parameter file is often one large file, so a hash function is used to split it into shards. At the same time, industrial scenarios require periodic model delivery and streaming training, so version management of the model is very important; this is also what is missing when the model is saved after training. cube-builder therefore allows version information to be specified manually while it generates the shards.
### cube-server
cube-server provides the sparse parameter service on top of Cube's KV capability. It offers a high-performance distributed query service through brpc and supports remote calls through a REST API.
### cube-cli
cube-cli is the client of cube-server. It has been integrated into Paddle Serving: when the cube.conf configuration file is prepared and the kv_infer related op is specified in the Paddle Serving server code, cube-cli is ready on the serving side.
## Model Delivery Steps
### Preconditions
Train the model files and copy the relevant programs from the build_server directory:
```
python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator # copy the Sequence File generator
cp ../../../build_server/output/bin/cube* ./cube/ # copy the Cube binaries
cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy cube-cli
```
### Generate a Sequence File from the Model Files
To deliver the model parameters from the training side to the prediction side, we need to convert the trained model from the Paddle model save format into the Sequence File format.
**Why Sequence File?**
Sequence File is a common format of the Hadoop file system. As mentioned at the beginning of this article, distributed Cube supports ultra-large-scale sparse parameter services, and in production large sparse parameters are stored in distributed file systems; the Hadoop file system is one of the most stable open-source distributed file systems in the industry, so the Sequence File format became the file format for Cube to load models.
```
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
```
### Generate the Shard Files
In the local setup the number of shards is 1. Run:
```
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false
```
### Deliver to Cube-Server
Delivery in the local version is very simple: just place the index-prefixed files into the data folder in the directory where the cube binary resides.
```
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube &
```
### Verify the Delivery with Cube-Client
This step is optional; it tests whether the delivery succeeded.
```
cd cube
./cube-cli -dict_name=test_dict -keys keys -conf ./cube/cube.conf
```
If every key has a corresponding value in the output, the delivery succeeded. The same file can then be used by Serving to run cube queries in the general kv infer op.
On success you will see the following:
<p align="center">
<img src="cube-cli.png" width="700">
</p>
## Appendix: Configuration
Take cube.conf under python/examples/criteo_ctr_with_cube/cube/conf as an example. This file is used by cube-cli as described above; local-mode users can use it as-is without digging into it, while it matters much more in a distributed deployment.
```
[{
"dict_name": "test_dict", //表名
"shard": 1, //分片数
"dup": 1, //副本数
"timeout": 200,
"retry": 3,
"backup_request": 100,
"type": "ipport_list",
"load_balancer": "rr",
"nodes": [{
"ipport_list": "list://127.0.0.1:8027" //IP列表
}]
}]
```
# Install
## System Requirements
OS: Linux
CMake: (verified versions: 3.2/3.5.2)
C++ compiler (verified versions: GCC 4.8.2/5.4.0)
python (verified version: 2.7)
Go compiler (>= 1.8; verified versions: 1.9.2/1.12.0)
openssl & openssl-devel
curl-devel
bzip2-devel
## Build
It is recommended to use Docker to prepare the Paddle Serving build environment. See the [Docker build guide](./DOCKER.md).
The following commands download the latest Paddle Serving code and build it:
```shell
$ git clone https://github.com/PaddlePaddle/Serving.git
$ cd Serving
$ mkdir build
$ cd build
$ cmake ..
$ make -j4
$ make install
```
`make install` places the build artifacts under /path/to/Paddle-Serving/build/output/, with this layout:
```
.
|-- bin # Paddle Serving tools and the protobuf codegen plugin pdcodegen
|-- conf
|-- demo # all demos
| |-- client # demo clients
| | |-- bert # BERT model client
| | |-- ctr_prediction # CTR prediction model client
| | |-- dense_format # dense_format client
| | |-- echo # simplest echo service client
| | |-- echo_kvdb # local KV lookup demo client
| | |-- image_classification # image classification client
| | |-- int64tensor_format # int64tensor_format example client
| | |-- sparse_format # sparse_format example client
| | `-- text_classification # text classification example client
| |-- db_func
| |-- db_thread
| |-- kvdb_test
| `-- serving # demo serving side; it can serve all demo clients above
|-- include # published Paddle Serving headers
|-- lib # published Paddle Serving libs
`-- tool # published Paddle Serving tools
```
To write a new prediction service, see [Creating a Prediction Service from Scratch](CREATING.md).
# CMake Build Options
| Option | Description |
|----------|------|
| WITH_AVX | For configuring PaddlePaddle. Compile PaddlePaddle with AVX intrinsics |
| WITH_MKL | For configuring PaddlePaddle. Compile PaddlePaddle with MKLML library |
| WITH_GPU | For configuring PaddlePaddle. Compile PaddlePaddle with NVIDIA GPU |
| CUDNN_ROOT| For configuring PaddlePaddle. Define CuDNN library and header path |
| CLIENT_ONLY | Compile client libraries and demos only |
## The WITH_GPU Option
Paddle Serving supports GPU prediction through the PaddlePaddle inference library. The WITH_GPU option probes the system for base libraries such as CUDA/cuDNN; if suitable versions are found, the GPU OP kernels are compiled along with PaddlePaddle.
To compile the Paddle Serving GPU version on bare metal, the following base libraries must be installed:
- CUDA
- CuDNN
- NCCL2
Note the following:
1) The CUDA/cuDNN versions installed on the system used to compile Serving must be compatible with the actual GPU device. For example, a Tesla V100 card requires at least CUDA 9.0. If the CUDA version used at compile time is too low, the generated GPU code will be incompatible with the hardware, and the Serving process may fail to start or crash with a core dump.
2) The system running Paddle Serving must have a CUDA driver compatible with the actual GPU device, and base libraries compatible with the CUDA/cuDNN versions used at compile time. If the CUDA/cuDNN versions on the serving system are lower than those used at compile time, strange CUDA call failures may occur.
For reference, the base library versions used by PaddlePaddle release builds are:
| | CUDA | CuDNN | NCCL2 |
|-|-------|--------------------------|-------|
| CUDA 8 | 8.0.61 | CuDNN 7.1.2 for CUDA 8.0 | 2.1.4 |
| CUDA 9 | 9.0.176 | CuDNN 7.3.1 for CUDA 9.0| 2.2.12 |
### How to Let the Paddle Serving Build Detect the cuDNN Library
Download the matching cuDNN version from the NVIDIA developer site, unpack it locally, and add the -DCUDNN_ROOT parameter to the cmake command to point at the cuDNN path:
```
$ pwd
/path/to/paddle-serving
$ mkdir build && cd build
$ cmake -DWITH_GPU=ON -DCUDNN_ROOT=/path/to/cudnn/cudnn_v7/cuda ..
```
### How to Let the Paddle Serving Build Detect the NCCL Library
Download the matching nccl2 library from the NVIDIA developer site, unpack it, and add the following environment variables (using nccl 2.1.4 as an example):
```
$ export C_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$C_INCLUDE_PATH
$ export CPLUS_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$CPLUS_INCLUDE_PATH
$ export LD_LIBRARY_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/lib/:$LD_LIBRARY_PATH
```
......@@ -13,7 +13,7 @@ You can get images in two ways:
1. Pull image directly
```bash
docker pull hub.baidubce.com/paddlepaddle/serving:0.1.3
```
2. Build the image from a Dockerfile
......@@ -21,13 +21,13 @@ You can get images in two ways:
Create a new folder and copy [Dockerfile](../tools/Dockerfile) to this folder, and run the following command:
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:0.1.3 .
```
### Create container
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.1.3
docker exec -it test bash
```
......@@ -99,7 +99,7 @@ You can also get images in two ways:
1. Pull image directly
```bash
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu
```
2. Build the image from a Dockerfile
......@@ -107,13 +107,13 @@ You can also get images in two ways:
Create a new folder and copy [Dockerfile.gpu](../tools/Dockerfile.gpu) to this folder, and run the following command:
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu .
```
### Create container
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu
nvidia-docker exec -it test bash
```
......
......@@ -13,7 +13,7 @@ Docker (the GPU version requires nvidia-docker on a GPU machine)
1. Pull the image directly
```bash
docker pull hub.baidubce.com/paddlepaddle/serving:0.1.3
```
2. Build the image from a Dockerfile
......@@ -21,13 +21,13 @@ Docker (the GPU version requires nvidia-docker on a GPU machine)
Create a new directory, copy the contents of [Dockerfile](../tools/Dockerfile) into a file named Dockerfile in that directory, and run:
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:0.1.3 .
```
### Create and enter the container
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.1.3
docker exec -it test bash
```
......@@ -97,7 +97,7 @@ The GPU version is basically the same as the CPU version, with only minor differences in some interface names (GPU ver
1. Pull the image directly
```bash
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu
```
2. Build the image from a Dockerfile
......@@ -105,13 +105,13 @@ The GPU version is basically the same as the CPU version, with only minor differences in some interface names (GPU ver
Create a new directory, copy the contents of [Dockerfile.gpu](../tools/Dockerfile.gpu) into a file named Dockerfile in that directory, and run:
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu .
```
### Create and enter the container
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.1.3-gpu
nvidia-docker exec -it test bash
```
......
# An End-to-End Pipeline from Training to Deployment
Paddle Serving is Paddle's high-performance online prediction serving framework and can flexibly serve most models. This article uses the IMDB review sentiment analysis task as an example to walk through the whole pipeline from model training to a deployed prediction service in nine steps.
## Step 1: Prepare the Environment
Paddle Serving can be deployed on Linux environments such as CentOS and Ubuntu. On other systems, or in environments where you prefer not to install the serving modules, you can still access the server-side prediction service through the HTTP service.
Choose the CPU or GPU version of the server module according to your needs and machine environment, and install the client module on the client machine. When you only want to access the server over HTTP, the client module is not required.
```shell
pip install paddle_serving_server # CPU server
pip install paddle_serving_server_gpu # GPU server
pip install paddle_serving_client # client
```
After this simple preparation, we use the IMDB review sentiment analysis task to show the pipeline from model training to a deployed prediction service. All code in this example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving repository, and the data and dictionary files used in the example can be obtained by running the get_data.sh script in the IMDB example code.
## Step 2: Define the Task and the Raw Data Format
The IMDB review sentiment analysis task is binary classification of movie review text: deciding whether a review is positive or negative.
First, let's look at the raw data:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
This is an English review sample. The | character is used as the separator: the text before it is the review content, and the value after it is the label, where 0 denotes a negative sample (a negative review) and 1 a positive sample (a positive review).
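Splitting such a line into text and label is straightforward; a minimal sketch (the reader in Step 3 does the same thing before tokenizing):

```python
def parse_sample(line):
    """Split a raw IMDB line of the form "<review text> | <label>"."""
    parts = line.split("|")
    text = "|".join(parts[:-1]).strip()
    label = int(parts[-1])
    return text, label


text, label = parse_sample("boy, was i disappointed! | 0")
print(label)  # -> 0
```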
## Step 3: Define the Reader and Split Training/Test Sets
The raw text must be converted into the numeric ids a neural network can use. The imdb_reader.py script defines this conversion, mapping words to integers through the dictionary file imdb.vocab.
<details>
<summary>imdb_reader.py</summary>
```python
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg


class IMDBDataset(dg.MultiSlotDataGenerator):
    def load_resource(self, dictfile):
        self._vocab = {}
        wid = 0
        with open(dictfile) as f:
            for line in f:
                self._vocab[line.strip()] = wid
                wid += 1
        self._unk_id = len(self._vocab)
        self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
        self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])

    def get_words_only(self, line):
        sent = line.lower().replace("<br />", " ").strip()
        words = [x for x in self._pattern.split(sent) if x and x != " "]
        feas = [
            self._vocab[x] if x in self._vocab else self._unk_id for x in words
        ]
        return feas

    def get_words_and_label(self, line):
        send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
                                                              " ").strip()
        label = [int(line.split('|')[-1])]
        words = [x for x in self._pattern.split(send) if x and x != " "]
        feas = [
            self._vocab[x] if x in self._vocab else self._unk_id for x in words
        ]
        return feas, label

    def infer_reader(self, infer_filelist, batch, buf_size):
        def local_iter():
            for fname in infer_filelist:
                with open(fname, "r") as fin:
                    for line in fin:
                        feas, label = self.get_words_and_label(line)
                        yield feas, label

        import paddle
        batch_iter = paddle.batch(
            paddle.reader.shuffle(
                local_iter, buf_size=buf_size),
            batch_size=batch)
        return batch_iter

    def generate_sample(self, line):
        def memory_iter():
            for i in range(1000):
                yield self.return_value

        def data_iter():
            feas, label = self.get_words_and_label(line)
            yield ("words", feas), ("label", label)

        return data_iter
```
</details>
After the mapping, a sample looks like the following:
```
257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
```
In this way the network can train on the converted text as feature values.
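To see the mapping in action, the reader defined above can be exercised directly; a small sketch (it assumes imdb.vocab is in the current directory; the exact id values depend on the dictionary):

```python
from imdb_reader import IMDBDataset

dataset = IMDBDataset()
dataset.load_resource("imdb.vocab")
# word ids and label; the actual id values depend on imdb.vocab
ids, label = dataset.get_words_and_label("the story is extremely boring | 0")
print(ids[:10], label)
```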
## Step 4: Define a CNN, Train, and Save
Next we train with a [CNN model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn). The network structure is defined in the nets.py script.
<details>
<summary>nets.py</summary>
```python
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid


def cnn_net(data,
            label,
            dict_dim,
            emb_dim=128,
            hid_dim=128,
            hid_dim2=96,
            class_dim=2,
            win_size=3):
    """ conv net. """
    emb = fluid.layers.embedding(
        input=data, size=[dict_dim, emb_dim], is_sparse=True)
    conv_3 = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=hid_dim,
        filter_size=win_size,
        act="tanh",
        pool_type="max")
    fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
    prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(x=cost)
    acc = fluid.layers.accuracy(input=prediction, label=label)
    return avg_cost, acc, prediction
```
</details>
Train on the training samples with the local_train.py script. When training finishes, use the paddle_serving_client.io.save_model function to save the model files and config files used to deploy the prediction service.
<details>
<summary>local_train.py</summary>
```python
import os
import sys
import paddle
import logging
import paddle.fluid as fluid

logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)


# load the dictionary file
def load_vocab(filename):
    vocab = {}
    with open(filename) as f:
        wid = 0
        for line in f:
            vocab[line.strip()] = wid
            wid += 1
    vocab["<unk>"] = len(vocab)
    return vocab


if __name__ == "__main__":
    from nets import cnn_net
    model_name = "imdb_cnn"
    vocab = load_vocab('imdb.vocab')
    dict_dim = len(vocab)

    # define the model inputs
    data = fluid.layers.data(
        name="words", shape=[1], dtype="int64", lod_level=1)
    label = fluid.layers.data(name="label", shape=[1], dtype="int64")

    # define the dataset; train_data is the training data directory
    dataset = fluid.DatasetFactory().create_dataset()
    filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
    dataset.set_use_var([data, label])
    pipe_command = "python imdb_reader.py"
    dataset.set_pipe_command(pipe_command)
    dataset.set_batch_size(4)
    dataset.set_filelist(filelist)
    dataset.set_thread(10)

    # define the model
    avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
    optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    optimizer.minimize(avg_cost)

    # run training
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())
    epochs = 100

    import paddle_serving_client.io as serving_io

    for i in range(epochs):
        exe.train_from_dataset(
            program=fluid.default_main_program(), dataset=dataset, debug=False)
        logger.info("TRAIN --> pass: {}".format(i))
        if i == 64:
            # when training ends, use the Paddle Serving model-saving API to
            # save the model and config files that Serving needs
            serving_io.save_model("{}_model".format(model_name),
                                  "{}_client_conf".format(model_name),
                                  {"words": data}, {"prediction": prediction},
                                  fluid.default_main_program())
```
</details>
![Training loss](./imdb_loss.png) As the figure shows, the loss starts to converge after epoch 65, so we save the model and config files once epoch 65 completes. The saved output consists of the imdb_cnn_client_conf and imdb_cnn_model folders: the former holds the client-side config files, the latter the server-side config files and the saved model files.
The parameters of the save_model function are listed below:
| Parameter | Meaning |
| -------------------- | ------------------------------------------------------------ |
| server_model_folder | directory in which the server-side config and model files are saved |
| client_config_folder | directory in which the client-side config files are saved |
| feed_var_dict | inputs of the model used for prediction; a dict whose keys can be chosen freely and whose values are input variables of the model, one variable per key; when calling the prediction service, each key is the name of an input |
| fetch_var_dict | outputs of the model used for prediction; a dict whose keys can be chosen freely and whose values are output variables of the model, one variable per key; when calling the prediction service, the keys are used to retrieve the returned data |
| main_program | the model's program |
## Step 5: Deploy the RPC Prediction Service
Paddle Serving supports two kinds of prediction service: one communicating over RPC and one over HTTP. We first describe deploying and using the RPC prediction service; Step 8 introduces the HTTP prediction service.
```shell
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 # CPU prediction service
python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 # GPU prediction service
```
In these commands, --model points at the server-side model and config directory saved earlier, and --port sets the service port; when deploying a GPU prediction service with the GPU package, --gpu_ids selects the GPUs to use.
Running either command completes the deployment of the RPC prediction service for the IMDB sentiment analysis task.
## Step 6: Reuse the Reader and Define a Remote RPC Client
Next we access the RPC prediction service from Python code; the script is test_client.py.
<details>
<summary>test_client.py</summary>
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
import sys

client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])

# reuse the preprocessing code to convert raw text into word ids
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])

for line in sys.stdin:
    word_ids, label = imdb_dataset.get_words_and_label(line)
    feed = {"words": word_ids}
    fetch = ["acc", "cost", "prediction"]
    fetch_map = client.predict(feed=feed, fetch=fetch)
    print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
</details>
The script takes data from standard input and prints the predicted probability of class 1 together with the true label.
## Step 7: Call the RPC Service and Evaluate the Model
Run the prediction service with the client implemented in the previous step:
```shell
cat test_data/part-0 | python test_client.py imdb_lstm_client_conf/serving_client_conf.prototxt imdb.vocab
```
Testing on the 2084 samples in the test_data/part-0 file, the model reaches an accuracy of 88.19%.
**Note**: each training run differs slightly, so the accuracy of the model you train will be close to, but may not exactly match, this example.
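The accuracy figure can be recomputed from the client output with a small script, here called acc.py for illustration (it assumes the `probability label` lines printed by test_client.py):

```python
import sys

total = 0
correct = 0
for line in sys.stdin:
    prob, label = line.split()
    pred = 1 if float(prob) > 0.5 else 0
    total += 1
    correct += int(pred == int(label))
print("acc: %.4f" % (float(correct) / total))
```

It would be used as `cat test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab | python acc.py`.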
## Step 8: Deploy the HTTP Prediction Service
When using the HTTP prediction service, the client does not need any Paddle Serving module installed; it only needs to be able to send HTTP requests. Naturally, HTTP spends more time in the communication phase than RPC does.
For the IMDB sentiment analysis task, the raw text must be preprocessed before prediction. In the RPC service we put the preprocessing in the client script; in the HTTP service we put it on the server side. Paddle Serving's HTTP serving framework provides preprocessing and postprocessing hooks for this situation; we only need to override them as the task requires.
Serving provides example code, which you can get by running the imdb_web_service_demo.sh script in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
Let's look at text_classify_service.py, the script that starts the HTTP prediction service.
<details>
<summary>text_classify_service.py</summary>
```python
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys


# subclass the framework's WebService class
class IMDBService(WebService):
    def prepare_dict(self, args={}):
        if len(args) == 0:
            exit(-1)
        self.dataset = IMDBDataset()
        self.dataset.load_resource(args["dict_file_path"])

    # override the preprocess method to implement data preprocessing;
    # this also reuses the reader script used during training
    def preprocess(self, feed={}, fetch=[]):
        if "words" not in feed:
            exit(-1)
        res_feed = {}
        res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
        return res_feed, fetch


# the name argument names the prediction service
imdb_service = IMDBService(name="imdb")
imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
```
</details>
Launch command:
```shell
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
In the command above, argument 1 is the saved server-side model and config files; argument 2 is the working directory, which will hold some config files for the running service; this directory does not need to exist, you just give it a name and the service creates it; argument 3 is the port; argument 4 is the dictionary file.
## Step 9: Call the Prediction Service with Plain Text
Once the HTTP prediction service is started, a single command runs a prediction:
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the prediction pipeline works, the predicted probabilities are returned, for example:
```
{"prediction":[0.5592559576034546,0.44074398279190063]}
```
**Note**: each training run differs slightly, so the probabilities predicted by the model you train may not exactly match this example.
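The same request can also be sent from Python; a minimal sketch using the third-party requests library, with the same payload as the curl command above:

```python
import json
import requests

url = "http://127.0.0.1:9292/imdb/prediction"
payload = {"words": "i am very sad | 0", "fetch": ["prediction"]}
resp = requests.post(url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
# e.g. {"prediction": [0.5592..., 0.4407...]}
print(resp.json())
```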
## Criteo CTR with Sparse Parameter Indexing Service
([简体中文](./README_CN.md)|English)
### Get the Sample Dataset
```
sh get_data.sh
```
### Train and Save the Model
```
python local_train.py
```
The trained model will be saved in ./ctr_server_model and ./ctr_client_config, together with ctr_server_model_kv and ctr_client_conf_kv.
### Start the Sparse Parameter Indexing Service
```
cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare.sh &
```
Here the sparse parameters are served by the Cube sparse parameter indexing service; for details, see [Cube: Sparse Parameter Indexing Service (Local Mode)](../../../doc/CUBE_LOCAL.md).
### Start the RPC Predictor with 4 Server Threads (configurable in test_server.py)
```
python test_server.py ctr_serving_model_kv
```
### Run Prediction
```
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
......@@ -32,17 +37,17 @@ python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
### Benchmark
CPU: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/ctr_criteo_with_cube/network_conf.py)
server core/thread num: 4/8
Run:
```
bash benchmark.sh
```
Each client thread sends 1000 batches.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | avg_latency | qps |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
......@@ -52,10 +57,10 @@ bash benchmark.sh
| 8 | 0.044 | 8.230 | 0.028 | 0.464 | 0.0023 | 0.0034 | 14.191 | 563.8 |
| 16 | 0.048 | 21.037 | 0.028 | 0.455 | 0.0025 | 0.0041 | 27.236 | 587.5 |
Average latency per thread:
![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)
QPS per thread count:
![qps](../../../doc/criteo-cube-benchmark-qps.png)
## CTR Prediction Service with a Sparse Parameter Indexing Service
(简体中文|[English](./README.md))
### Get the Sample Dataset
```
sh get_data.sh
```
### Save the Model and Config Files
```
python local_train.py
```
After running the script, the ctr_server_model and ctr_client_config folders are generated in the current directory, together with ctr_server_model_kv and ctr_client_conf_kv.
### Start the Sparse Parameter Indexing Service
```
cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare.sh &
```
Here the sparse parameters of the model are stored in the Cube sparse parameter indexing service; for an introduction to Cube, see the [Cube Local Mode User Guide](../../../doc/CUBE_LOCAL_CN.md).
### Start the RPC Prediction Service with 4 Server Threads (configurable in test_server.py)
```
python test_server.py ctr_serving_model_kv
```
### Run Prediction
```
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
```
### Benchmark
CPU: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/ctr_criteo_with_cube/network_conf.py)
server core/thread num: 4/8
Run:
```
bash benchmark.sh
```
Each client thread sends 1000 batches.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | avg_latency | qps |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
| 1 | 0.035 | 1.596 | 0.021 | 0.518 | 0.0024 | 0.0025 | 6.774 | 147.7 |
| 2 | 0.034 | 1.780 | 0.027 | 0.463 | 0.0020 | 0.0023 | 6.931 | 288.3 |
| 4 | 0.038 | 2.954 | 0.025 | 0.455 | 0.0019 | 0.0027 | 8.378 | 477.5 |
| 8 | 0.044 | 8.230 | 0.028 | 0.464 | 0.0023 | 0.0034 | 14.191 | 563.8 |
| 16 | 0.048 | 21.037 | 0.028 | 0.455 | 0.0025 | 0.0041 | 27.236 | 587.5 |
Average latency per thread:
![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)
QPS per thread count:
![qps](../../../doc/criteo-cube-benchmark-qps.png)
......@@ -29,7 +29,7 @@ class IMDBService(WebService):
if "words" not in feed:
exit(-1)
res_feed = {}
res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
res_feed["words"] = self.dataset.get_words_only(feed["words"])
return res_feed, fetch
......
......@@ -21,3 +21,7 @@ python timeline_trace.py profile trace
The script converts the timestamps in the log into json format and saves them to the trace file; the trace file can be visualized with the tracing feature of the Chrome browser.
Specific steps: open Chrome, enter chrome://tracing/ in the address bar to reach the tracing page, click the load button, and open the saved trace file to visualize the time spent in each stage of the prediction service.
The result looks like the figure below, which shows the per-stage timeline of the bert example with 4 client processes; bert_pre is the client-side data preprocessing stage, client_infer is the stage in which the client sends the prediction request and receives the result, and the second row of each process shows the timeline of the server ops.
![timeline](../../../doc/timeline-example.png)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
......@@ -86,6 +86,8 @@ class Client(object):
self.rpath()
self.pid = os.getpid()
self.predictor_sdk_ = None
self.producers = []
self.consumer = None
def rpath(self):
lib_path = os.path.dirname(paddle_serving_client.__file__)
......@@ -178,47 +180,26 @@ class Client(object):
raise SystemExit("The shape of feed tensor {} not match.".format(
key))
def predict(self, feed={}, fetch=[], need_variant_tag=False):
int_slot = []
float_slot = []
int_feed_names = []
float_feed_names = []
fetch_names = []
for key in feed:
self.shape_check(feed, key)
if key not in self.feed_names_:
continue
if self.feed_types_[key] == int_type:
int_feed_names.append(key)
int_slot.append(feed[key])
elif self.feed_types_[key] == float_type:
float_feed_names.append(key)
float_slot.append(feed[key])
for key in fetch:
if key in self.fetch_names_:
fetch_names.append(key)
def predict(self, feed=None, fetch=None, need_variant_tag=False):
if feed is None or fetch is None:
raise ValueError("You should specify feed and fetch for prediction")
ret = self.client_handle_.predict(float_slot, float_feed_names,
int_slot, int_feed_names, fetch_names,
self.result_handle_, self.pid)
result_map = {}
for i, name in enumerate(fetch_names):
if self.fetch_names_to_type_[name] == int_type:
result_map[name] = self.result_handle_.get_int64_by_name(name)[
0]
elif self.fetch_names_to_type_[name] == float_type:
result_map[name] = self.result_handle_.get_float_by_name(name)[
0]
fetch_list = []
if isinstance(fetch, str):
fetch_list = [fetch]
elif isinstance(fetch, list):
fetch_list = fetch
else:
raise ValueError("fetch only accepts string and list of string")
return [
result_map,
self.result_handle_.variant_tag(),
] if need_variant_tag else result_map
feed_batch = []
if isinstance(feed, dict):
feed_batch.append(feed)
elif isinstance(feed, list):
feed_batch = feed
else:
raise ValueError("feed only accepts dict and list of dict")
def batch_predict(self, feed_batch=[], fetch=[], need_variant_tag=False):
int_slot_batch = []
float_slot_batch = []
int_feed_names = []
......@@ -226,28 +207,33 @@ class Client(object):
fetch_names = []
counter = 0
batch_size = len(feed_batch)
for feed in feed_batch:
for key in fetch_list:
if key in self.fetch_names_:
fetch_names.append(key)
if len(fetch_names) == 0:
raise ValueError(
"fetch names should not be empty or out of saved fetch list")
return {}
for i, feed_i in enumerate(feed_batch):
int_slot = []
float_slot = []
for key in feed:
for key in feed_i:
if key not in self.feed_names_:
continue
if self.feed_types_[key] == int_type:
if counter == 0:
if i == 0:
int_feed_names.append(key)
int_slot.append(feed[key])
elif self.feed_types_[key] == float_type:
if counter == 0:
if i == 0:
float_feed_names.append(key)
float_slot.append(feed[key])
counter += 1
float_slot.append(feed_i[key])
int_slot_batch.append(int_slot)
float_slot_batch.append(float_slot)
for key in fetch:
if key in self.fetch_names_:
fetch_names.append(key)
result_batch = self.result_handle_
res = self.client_handle_.batch_predict(
float_slot_batch, float_feed_names, int_slot_batch, int_feed_names,
......@@ -266,10 +252,12 @@ class Client(object):
single_result[key] = result_map[key][i]
result_map_batch.append(single_result)
return [
result_map,
self.result_handle_.variant_tag(),
] if need_variant_tag else result_map
if batch_size == 1:
return [result_map_batch[0], self.result_handle_.variant_tag()
] if need_variant_tag else result_map_batch[0]
else:
return [result_map_batch, self.result_handle_.variant_tag()
] if need_variant_tag else result_map_batch
def release(self):
self.client_handle_.destroy_predictor()
......
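After this merge, `predict` accepts either a single feed dict or a list of dicts, and `fetch` as a string or a list of strings; a usage sketch under that reading of the diff (it assumes the IMDB server from the tutorial above is running; the word ids here are toy values):

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("imdb_cnn_client_conf/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

word_ids = [1, 2, 3]  # toy ids; real ids come from the IMDB reader

# single sample: feed is a dict, fetch may be a plain string
result = client.predict(feed={"words": word_ids}, fetch="prediction")

# batch: feed is a list of dicts; a list of result maps comes back
results = client.predict(
    feed=[{"words": word_ids}, {"words": word_ids}], fetch=["prediction"])
```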
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Setup for pip package."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import platform
import os
from setuptools import setup, Distribution, Extension
from setuptools import find_packages
from setuptools import setup
from paddle_serving_app.version import serving_app_version
from pkg_resources import DistributionNotFound, get_distribution
def python_version():
    return [int(v) for v in platform.python_version().split(".")]


def find_package(pkgname):
    try:
        get_distribution(pkgname)
        return True
    except DistributionNotFound:
        return False


max_version, mid_version, min_version = python_version()

if '${PACK}' == 'ON':
    copy_lib()

REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'sentencepiece'
]

packages = ['paddle_serving_app',
            'paddle_serving_app.reader',
            'paddle_serving_app.utils']
package_data = {}
package_dir = {'paddle_serving_app':
               '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app',
               'paddle_serving_app.reader':
               '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/reader',
               'paddle_serving_app.utils':
               '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/utils', }

setup(
    name='paddle-serving-app',
    version=serving_app_version.replace('-', ''),
    description=('Paddle Serving Package for saved model with PaddlePaddle'),
    url='https://github.com/PaddlePaddle/Serving',
    author='PaddlePaddle Author',
    author_email='guru4elephant@gmail.com',
    install_requires=REQUIRED_PACKAGES,
    packages=packages,
    package_data=package_data,
    package_dir=package_dir,
    # PyPI package information.
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'Intended Audience :: Education',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: Apache Software License',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Topic :: Scientific/Engineering',
        'Topic :: Scientific/Engineering :: Mathematics',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
        'Topic :: Software Development',
        'Topic :: Software Development :: Libraries',
        'Topic :: Software Development :: Libraries :: Python Modules',
    ],
    license='Apache 2.0',
    keywords=('paddle-serving serving-client deployment industrial easy-to-use'))