Merge pull request #3 from PaddlePaddle/develop

update

Merge pull request #3 from PaddlePaddle/develop
update
fcc97c73 · Thomas Young · GitHub · 6bf4c3a8 · 7ea59bda · fcc97c73
67 changed file
--- a/README.md
+++ b/README.md
@@ -45,10 +45,11 @@ nvidia-docker exec -it test bash
 ```

 ```shell
-pip install paddle-serving-client==0.3.2 
-pip install paddle-serving-server==0.3.2 # CPU
-pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
-pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
+pip install paddle-serving-client==0.4.0 
+pip install paddle-serving-server==0.4.0 # CPU
+pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.trt # GPU with CUDA10.1+TensorRT
 ```

 You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
@@ -57,7 +58,7 @@ If you need install modules compiled with develop branch, please download packag

 Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18, Windows 10.

-Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.6/3.7.
+Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.5/3.6/3.7.

 Recommended to install paddle >= 1.8.4.

@@ -65,15 +66,6 @@ For **Windows Users**, please read the document [Paddle Serving for Windows User

 <h2 align="center"> Pre-built services with Paddle Serving</h2>

-<h3 align="center">Latest release</h4>
-<p align="center">
-    <a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr">Optical Character Recognition</a>
-    <br>
-    <a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/faster_rcnn_model">Object Detection</a>
-    <br>
-    <a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/deeplabv3">Image Segmentation</a>
-<p>
-
 <h3 align="center">Chinese Word Segmentation</h4>

 ``` shell
@@ -113,11 +105,11 @@ tar -xzf uci_housing.tar.gz

 Paddle Serving provides HTTP and RPC based service for users to access

-### HTTP service
+### RPC service

-Paddle Serving provides a built-in python module called `paddle_serving_server.serve` that can start a RPC service or a http service with one-line command. If we specify the argument `--name uci`, it means that we will have a HTTP service with a url of `$IP:$PORT/uci/prediction`
+A user can also start a RPC service with `paddle_serving_server.serve`. RPC service is usually faster than HTTP service, although a user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here. 
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
 ```
 <center>

@@ -125,42 +117,63 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 |--------------|------|-----------|--------------------------------|
 | `thread` | int | `4` | Concurrency of current service |
 | `port` | int | `9292` | Exposed port of current service to users|
-| `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
 | `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
 | `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |

-Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/).
 </center>

-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-```
-
-### RPC service
-
-A user can also start a RPC service with `paddle_serving_server.serve`. RPC service is usually faster than HTTP service, although a user needs to do some coding based on Paddle Serving's python client API. Note that we do not specify `--name` here. 
-``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
-```
-
-``` python
+```python
 # A user can visit rpc service through paddle_serving_client API
 from paddle_serving_client import Client
-
+import numpy as np
 client = Client()
 client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
 client.connect(["127.0.0.1:9292"])
 data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-fetch_map = client.predict(feed={"x": data}, fetch=["price"])
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)
-
 ```
 Here, `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias name and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the name of `"x"` and `"price"` are assigned when the servable model is saved during training.

+
+### WEB service
+
+Users can also put the data format processing logic on the server side, so that they can directly use curl to access the service, refer to the following case whose path is `python/examples/fit_a_line`
+
+```python
+from paddle_serving_server.web_service import WebService
+import numpy as np
+
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9292)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
+```
+for client side,
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
+the response is
+```
+{"result":{"price":[[18.901151657104492]]}}
+```
+
 <h2 align="center">Some Key Features of Paddle Serving</h2>

 - Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
@@ -215,6 +228,10 @@ To connect with other users and contributors, welcome to join our [Slack channel

 If you want to contribute code to Paddle Serving, please reference [Contribution Guidelines](doc/CONTRIBUTE.md)

+- Special Thanks to [@BeyondYourself](https://github.com/BeyondYourself) in complementing the gRPC tutorial, updating the FAQ doc and modifying the mdkir command
+- Special Thanks to [@mcl-stone](https://github.com/mcl-stone) in updating faster_rcnn benchmark
+- Special Thanks to [@cg82616424](https://github.com/cg82616424) in updating the unet benchmark and modifying resize comment error
+
 ### Feedback

 For any feedback or to report a bug, please propose a [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues).

--- a/README_CN.md
+++ b/README_CN.md
@@ -47,10 +47,11 @@ nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/se
 nvidia-docker exec -it test bash
 ```
 ```shell
-pip install paddle-serving-client==0.3.2
-pip install paddle-serving-server==0.3.2 # CPU
-pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
-pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
+pip install paddle-serving-client==0.4.0
+pip install paddle-serving-server==0.4.0 # CPU
+pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
+pip install paddle-serving-server-gpu==0.4.0.trt # GPU with CUDA10.1+TensorRT
 ```

 您可能需要使用国内镜像源（例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`）来加速下载。
@@ -107,13 +108,12 @@ tar -xzf uci_housing.tar.gz

 Paddle Serving 为用户提供了基于 HTTP 和 RPC 的服务

+<h3 align="center">RPC服务</h3>

-<h3 align="center">HTTP服务</h3>
-
-Paddle Serving提供了一个名为`paddle_serving_server.serve`的内置python模块，可以使用单行命令启动RPC服务或HTTP服务。如果我们指定参数`--name uci`，则意味着我们将拥有一个HTTP服务，其URL为$IP:$PORT/uci/prediction`。
+用户还可以使用`paddle_serving_server.serve`启动RPC服务。 尽管用户需要基于Paddle Serving的python客户端API进行一些开发，但是RPC服务通常比HTTP服务更快。需要指出的是这里我们没有指定`--name`。

 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
 ```
 <center>

@@ -128,21 +128,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
 | `use_trt` (Only for trt version) | - | - | Run inference with TensorRT  |

-我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求，请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。
+我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求，请参考英文文
+档 [requests](https://requests.readthedocs.io/en/master/)。
 </center>

-``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-```
-
-<h3 align="center">RPC服务</h3>
-
-用户还可以使用`paddle_serving_server.serve`启动RPC服务。 尽管用户需要基于Paddle Serving的python客户端API进行一些开发，但是RPC服务通常比HTTP服务更快。需要指出的是这里我们没有指定`--name`。
-
-``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
-```
-
 ``` python
 # A user can visit rpc service through paddle_serving_client API
 from paddle_serving_client import Client
@@ -152,12 +141,45 @@ client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
 client.connect(["127.0.0.1:9292"])
 data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-fetch_map = client.predict(feed={"x": data}, fetch=["price"])
+fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)

 ```
 在这里，`client.predict`函数具有两个参数。 `feed`是带有模型输入变量别名和值的`python dict`。 `fetch`被要从服务器返回的预测变量赋值。 在该示例中，在训练过程中保存可服务模型时，被赋值的tensor名为`"x"`和`"price"`。

+<h3 align="center">HTTP服务</h3>
+用户也可以将数据格式处理逻辑放在服务器端进行，这样就可以直接用curl去访问服务，参考如下案例，在目录`python/examples/fit_a_line`
+
+```python
+from paddle_serving_server.web_service import WebService
+import numpy as np
+
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch
+
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9292)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
+```
+客户端输入
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+```
+返回结果
+```
+{"result":{"price":[[18.901151657104492]]}}
+```
+
 <h2 align="center">Paddle Serving的核心功能</h2>

 - 与Paddle训练紧密连接，绝大部分Paddle模型可以 **一键部署**.
@@ -210,6 +232,10 @@ print(fetch_map)

 如果您想为Paddle Serving贡献代码，请参考 [Contribution Guidelines](doc/CONTRIBUTE.md)

+- 特别感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程，更新FAQ教程，整理文件目录。
+- 特别感谢 [@mcl-stone](https://github.com/mcl-stone) 提供faster rcnn benchmark脚本
+- 特别感谢 [@cg82616424](https://github.com/cg82616424) 提供unet benchmark脚本和修改部分注释错误
+
 ### 反馈

 如有任何反馈或是bug，请在 [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues)提交

--- a/cmake/paddlepaddle.cmake
+++ b/cmake/paddlepaddle.cmake
@@ -114,7 +114,7 @@ ADD_LIBRARY(openblas STATIC IMPORTED GLOBAL)
 SET_PROPERTY(TARGET openblas PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/openblas/lib/libopenblas.a)

 ADD_LIBRARY(paddle_fluid SHARED IMPORTED GLOBAL)
-SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.a)
+SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.so)

 if (WITH_TRT)
 ADD_LIBRARY(nvinfer SHARED IMPORTED GLOBAL)
@@ -127,17 +127,12 @@ endif()
 ADD_LIBRARY(xxhash STATIC IMPORTED GLOBAL)
 SET_PROPERTY(TARGET xxhash PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/xxhash/lib/libxxhash.a)

-ADD_LIBRARY(cryptopp STATIC IMPORTED GLOBAL)
-SET_PROPERTY(TARGET cryptopp PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/cryptopp/lib/libcryptopp.a)
-
 LIST(APPEND external_project_dependencies paddle)

 LIST(APPEND paddle_depend_libs
-        xxhash cryptopp)
-
+    xxhash)

 if(WITH_TRT)
 LIST(APPEND paddle_depend_libs
    nvinfer nvinfer_plugin)
 endif()
-
--- a/core/predictor/tools/seq_generator.cpp
+++ b/core/predictor/tools/seq_generator.cpp
@@ -17,11 +17,11 @@
 #include <fstream>
 #include <iostream>
 #include <memory>
-#include <thread>  //NOLINT
+#include <thread>

 #include "core/predictor/framework.pb.h"
-#include "quant.h"     // NOLINT
-#include "seq_file.h"  // NOLINT
+#include "quant.h"
+#include "seq_file.h"

 inline uint64_t time_diff(const struct timeval &start_time,
                          const struct timeval &end_time) {
@@ -113,15 +113,13 @@ int dump_parameter(const char *input_file, const char *output_file) {
    // std::cout << "key_len " << key_len << " value_len " << value_buf_len
    // << std::endl;
    memcpy(value_buf, tensor_buf + offset, value_buf_len);
-    seq_file_writer.write(
-        std::to_string(i).c_str(), sizeof(i), value_buf, value_buf_len);
+    seq_file_writer.write((char *)&i, sizeof(i), value_buf, value_buf_len);
    offset += value_buf_len;
  }
  return 0;
 }

-float *read_embedding_table(const char *file1,
-                            std::vector<int64_t> &dims) {  // NOLINT
+float *read_embedding_table(const char *file1, std::vector<int64_t> &dims) {
  std::ifstream is(file1);
  // Step 1: is read version, os write version
  uint32_t version;
@@ -244,7 +242,7 @@ int compress_parameter_parallel(const char *file1,
          float x = *(emb_table + k * emb_size + e);
          int val = round((x - xmin) / scale);
          val = std::max(0, val);
-          val = std::min(static_cast<int>(pow2bits) - 1, val);
+          val = std::min((int)pow2bits - 1, val);
          *(tensor_temp + 2 * sizeof(float) + e) = val;
        }
        result[k] = tensor_temp;
@@ -264,8 +262,7 @@ int compress_parameter_parallel(const char *file1,
  }
  SeqFileWriter seq_file_writer(file2);
  for (int64_t i = 0; i < dict_size; i++) {
-    seq_file_writer.write(
-        std::to_string(i).c_str(), sizeof(i), result[i], per_line_size);
+    seq_file_writer.write((char *)&i, sizeof(i), result[i], per_line_size);
  }
  return 0;
 }

--- a/doc/BERT_10_MINS.md
+++ b/doc/BERT_10_MINS.md
@@ -56,21 +56,25 @@ the script of client side bert_client.py is as follow:

 [//file]:#bert_client.py
 ``` python
-import os
 import sys
 from paddle_serving_client import Client
+from paddle_serving_client.utils import benchmark_args
 from paddle_serving_app.reader import ChineseBertReader
+import numpy as np
+args = benchmark_args()

-reader = ChineseBertReader()
+reader = ChineseBertReader({"max_seq_len": 128})
 fetch = ["pooled_output"]
-endpoint_list = ["127.0.0.1:9292"]
+endpoint_list = ['127.0.0.1:9292']
 client = Client()
-client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
+client.load_client_config(args.model)
 client.connect(endpoint_list)

 for line in sys.stdin:
    feed_dict = reader.process(line)
-    result = client.predict(feed=feed_dict, fetch=fetch)
+    for key in feed_dict.keys():
+        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
+    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
 ```

 run

--- a/doc/BERT_10_MINS_CN.md
+++ b/doc/BERT_10_MINS_CN.md
@@ -52,18 +52,23 @@ pip install paddle_serving_app
 ``` python
 import sys
 from paddle_serving_client import Client
+from paddle_serving_client.utils import benchmark_args
 from paddle_serving_app.reader import ChineseBertReader
+import numpy as np
+args = benchmark_args()

-reader = ChineseBertReader()
+reader = ChineseBertReader({"max_seq_len": 128})
 fetch = ["pooled_output"]
-endpoint_list = ["127.0.0.1:9292"]
+endpoint_list = ['127.0.0.1:9292']
 client = Client()
-client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
+client.load_client_config(args.model)
 client.connect(endpoint_list)

 for line in sys.stdin:
    feed_dict = reader.process(line)
-    result = client.predict(feed=feed_dict, fetch=fetch)
+    for key in feed_dict.keys():
+        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
+    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
 ```

 执行

--- a/doc/RUN_IN_DOCKER.md
+++ b/doc/RUN_IN_DOCKER.md
@@ -32,63 +32,9 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of

 ### Install PaddleServing

-In order to make the image smaller, the PaddleServing package is not installed in the image. You can run the following command to install it:
-
-```bash
-pip install paddle-serving-server
-```
-
-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source of the following example) to speed up the download:
-
-```shell
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-### Test example
-
-Get the trained Boston house price prediction model by the following command:
-
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
-```
-
- Test HTTP service
-
-  Running on the Server side (inside the container):
-
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
-  ```
-
-  Running on the Client side (inside or outside the container):
-
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
-
- Test RPC service
-
-  Running on the Server side (inside the container):
-
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
-  ```
-
-  Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
-
-  ```bash
-  from paddle_serving_client import Client
-  
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
+The mirror comes with `paddle_serving_server`, `paddle_serving_client`, and `paddle_serving_app` corresponding to the mirror tag version. If users don’t need to change the version, they can use it directly, which is suitable for environments without extranet services.

+If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.
  

 ## GPU
@@ -100,7 +46,7 @@ The GPU version is basically the same as the CPU version, with only some differe
 Refer to [this document](DOCKER_IMAGES.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:

 ```shell
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 ```

 ### Create container
@@ -110,77 +56,21 @@ nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/se
 nvidia-docker exec -it test bash
 ```

-The `-p` option is to map the `9292` port of the container to the `9292` port of the host.
-
-### Install PaddleServing
-
-In order to make the image smaller, the PaddleServing package is not installed in the image. You can run the following command to install it:
+or

 ```bash
-pip install paddle-serving-server-gpu
-```
-
-You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source of the following example) to speed up the download:
-
-```shell
-pip install paddle-serving-server-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-### Test example
-
-When running the GPU Server, you need to set the GPUs used by the prediction service through the `--gpu_ids` option, and the CPU is used by default. An error will be reported when the value of `--gpu_ids` exceeds the environment variable `CUDA_VISIBLE_DEVICES`. The following example specifies to use a GPU with index 0:
-```shell
-export CUDA_VISIBLE_DEVICES=0,1
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0
-```
-
-
-Get the trained Boston house price prediction model by the following command:
-
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
+docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker exec -it test bash
 ```

- Test HTTP service
-
-  Running on the Server side (inside the container):
-
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --name uci --gpu_ids 0
-  ```
-
-  Running on the Client side (inside or outside the container):
-
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
-
- Test RPC service
-
-  Running on the Server side (inside the container):
-
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0
-  ```
-
-  Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
-
-  ```bash
-  from paddle_serving_client import Client
-  
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
+The `-p` option is to map the `9292` port of the container to the `9292` port of the host.

+### Install PaddleServing

+The mirror comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` corresponding to the mirror tag version. If users don’t need to change the version, they can use it directly, which is suitable for environments without extranet services.

+If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.

-## Attention
+## Precautious

 Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md).
--- a/doc/RUN_IN_DOCKER_CN.md
+++ b/doc/RUN_IN_DOCKER_CN.md
@@ -20,7 +20,6 @@ Docker（GPU版本需要在GPU机器上安装nvidia-docker）
 docker pull hub.baidubce.com/paddlepaddle/serving:latest
 ```

-
 ### 创建容器并进入

 ```bash
@@ -32,74 +31,11 @@ docker exec -it test bash

 ### 安装PaddleServing

-为了减小镜像的体积，镜像中没有安装Serving包，要执行下面命令进行安装。
-
-```bash
-pip install paddle-serving-server
-```
-
-您可能需要使用国内镜像源（例如清华源）来加速下载。
-
-```shell
-pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-### 测试example
-
-通过下面命令获取训练好的Boston房价预估模型：
-
-```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
-```
-
- 测试HTTP服务
-
-  在Server端（容器内）运行：
+镜像里自带对应镜像tag版本的`paddle_serving_server`，`paddle_serving_client`，`paddle_serving_app`，如果用户不需要更改版本，可以直接使用，适用于没有外网服务的环境。

-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
-  ```
+如果需要更换版本，请参照首页的指导，下载对应版本的pip包。

-  在Client端（容器内或容器外）运行：
-
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
-
- 测试RPC服务
-
-  在Server端（容器内）运行：
-
-  ```bash
-  python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
-  ```
-
-  在Client端（容器内或容器外，需要安装`paddle-serving-client`包）运行下面Python代码：
-
-  ```python
-  from paddle_serving_client import Client
-  
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
-
-## GPU版本
-
-GPU版本与CPU版本基本一致，只有部分接口命名的差别（GPU版本需要在GPU机器上安装nvidia-docker）。
-
-### 获取镜像
-
-参考[该文档](DOCKER_IMAGES_CN.md)获取镜像，这里以 `cuda9.0-cudnn7` 的镜像为例：
-
-```shell
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
-```
+## GPU 版本

 ### 创建容器并进入

@@ -107,74 +43,19 @@ nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker exec -it test bash
 ```
-
-`-p`选项是为了将容器的`9292`端口映射到宿主机的`9292`端口。
-
-### 安装PaddleServing
-
-为了减小镜像的体积，镜像中没有安装Serving包，要执行下面命令进行安装。
-
-```bash
-pip install paddle-serving-server-gpu
-```
-
-您可能需要使用国内镜像源（例如清华源）来加速下载。
-
-```shell
-pip install paddle-serving-server-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-### 测试example
-
-在运行GPU版Server时需要通过`--gpu_ids`选项设置预测服务使用的GPU，缺省状态默认使用CPU。当设置的`--gpu_ids`超出环境变量`CUDA_VISIBLE_DEVICES`时会报错。下面的示例为指定使用索引为0的GPU：
-```shell
-export CUDA_VISIBLE_DEVICES=0,1
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9292 --gpu_ids 0
-```
-
-
-通过下面命令获取训练好的Boston房价预估模型：
-
+或者
 ```bash
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
-tar -xzf uci_housing.tar.gz
+docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+docker exec -it test bash
 ```

- 测试HTTP服务
-
-  在Server端（容器内）运行：
-
-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --name uci --gpu_ids 0
-  ```
-
-  在Client端（容器内或容器外）运行：
-
-  ```bash
-  curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
-  ```
-
- 测试RPC服务
-
-  在Server端（容器内）运行：
+`-p`选项是为了将容器的`9292`端口映射到宿主机的`9292`端口。

-  ```bash
-  python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0
-  ```
+### 安装PaddleServing

-  在Client端（容器内或容器外，需要安装`paddle-serving-client`包）运行下面Python代码：
+镜像里自带对应镜像tag版本的`paddle_serving_server_gpu`，`paddle_serving_client`，`paddle_serving_app`，如果用户不需要更改版本，可以直接使用，适用于没有外网服务的环境。

-  ```bash
-  from paddle_serving_client import Client
-  
-  client = Client()
-  client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
-  client.connect(["127.0.0.1:9292"])
-  data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-          -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
-  fetch_map = client.predict(feed={"x": data}, fetch=["price"])
-  print(fetch_map)
-  ```
+如果需要更换版本，请参照首页的指导，下载对应版本的pip包。

 ## 注意事项


--- a/doc/WINDOWS_TUTORIAL.md
+++ b/doc/WINDOWS_TUTORIAL.md
@@ -8,7 +8,7 @@ This document guides users how to build Paddle Serving service on the Windows pl

 ### Running Paddle Serving on Native Windows System

-**Configure Python environment variables to PATH**: First, you need to add the directory where the Python executable program is located to the PATH. Usually in **System Properties/My Computer Properties**-**Advanced**-**Environment Variables**, click Path and add the path at the beginning. For example, `C:\Users\$USER\AppData\Local\Programs\Python\Python36`, and finally click **OK** continuously. If you enter python on Powershell, you can enter the python interactive interface, indicating that the environment variable configuration is successful.
+**Configure Python environment variables to PATH**: **We only support Python 3.5+ on Native Windows System.**. First, you need to add the directory where the Python executable program is located to the PATH. Usually in **System Properties/My Computer Properties**-**Advanced**-**Environment Variables**, click Path and add the path at the beginning. For example, `C:\Users\$USER\AppData\Local\Programs\Python\Python36`, and finally click **OK** continuously. If you enter python on Powershell, you can enter the python interactive interface, indicating that the environment variable configuration is successful.

 **Install wget**: Because all the downloads in the tutorial and the built-in model download function in `paddle_serving_app` all use the wget tool, download the binary package at the [link](http://gnuwin32.sourceforge.net/packages/wget.htm), unzip and copy it to `C:\Windows\System32`, if there is a security prompt, you need to pass it.

@@ -32,6 +32,7 @@ python -m pip install -U paddle_serving_server_gpu paddle_serving_client paddle_

 ```
 git clone https://github.com/paddlepaddle/Serving
+pip install -r python/requirements_win.txt
 ```

 **Run OCR example**:
@@ -42,7 +43,7 @@ python -m paddle_serving_app.package --get_model ocr_rec
 tar -xzvf ocr_rec.tar.gz
 python -m paddle_serving_app.package --get_model ocr_det
 tar -xzvf ocr_det.tar.gz
-python ocr_debugger_server.py &
+python ocr_debugger_server.py cpu &
 python ocr_web_client.py
 ```


--- a/doc/WINDOWS_TUTORIAL_CN.md
+++ b/doc/WINDOWS_TUTORIAL_CN.md
@@ -8,7 +8,7 @@

 ### 原生Windows系统运行Paddle Serving

-**配置Python环境变量到PATH**：首先需要将Python的可执行程序所在目录加入到PATH当中。通常在**系统属性/我的电脑属性**-**高级**-**环境变量** ，点选Path，并在开头加上路径。例如`C:\Users\$USER\AppData\Local\Programs\Python\Python36`，最后连续点击**确定** 。在Powershell上如果输入python可以进入python交互界面，说明环境变量配置成功。
+**配置Python环境变量到PATH**：**目前原生Windows仅支持Python 3.5或更高版本**。首先需要将Python的可执行程序所在目录加入到PATH当中。通常在**系统属性/我的电脑属性**-**高级**-**环境变量** ，点选Path，并在开头加上路径。例如`C:\Users\$USER\AppData\Local\Programs\Python\Python36`，最后连续点击**确定** 。在Powershell上如果输入python可以进入python交互界面，说明环境变量配置成功。

 **安装wget工具**：由于教程当中所有的下载，以及`paddle_serving_app`当中内嵌的模型下载功能，都是用到wget工具，在链接[下载wget](http://gnuwin32.sourceforge.net/packages/wget.htm)，解压后复制到`C:\Windows\System32`下，如有安全提示需要通过。

@@ -32,6 +32,7 @@ python -m pip install -U paddle_serving_server_gpu paddle_serving_client paddle_

 ```
 git clone https://github.com/paddlepaddle/Serving
+pip install -r python/requirements_win.txt
 ```

 **运行OCR示例**：
@@ -42,7 +43,7 @@ python -m paddle_serving_app.package --get_model ocr_rec
 tar -xzvf ocr_rec.tar.gz
 python -m paddle_serving_app.package --get_model ocr_det
 tar -xzvf ocr_det.tar.gz
-python ocr_debugger_server.py &
+python ocr_debugger_server.py cpu &
 python ocr_web_client.py
 ```


--- a/java/README.md
+++ b/java/README.md
-## Java Demo
+## Tutorial of Java Client for Paddle Serving
+
+(English|[简体中文](./README_CN.md))
+
+### Development Environment
+
+In order to facilitate users to use java for development, we provide the compiled Serving project to be placed in the java mirror. The way to get the mirror and enter the development environment is
+
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker exec -it java_serving bash
+cd Serving/java
+```
+
+The Serving folder is at the develop branch when the docker image is generated. You need to git pull to the latest version or git checkout to the desired branch.
+
+### Install client dependencies
+
+Due to the large number of dependent libraries, the image has been compiled once at the time of generation, and the user can perform the following operations

-### Install package
 ```
 mvn compile
 mvn install
@@ -9,18 +27,49 @@ mvn compile
 mvn install
 ```

-### Start Server
+### Start the server

-take the fit_a_line demo as example
+Take the fit_a_line model as an example, the server starts

 ```
- python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #CPU
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #GPU
+cd ../../python/examples/fit_a_line
+sh get_data.sh
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang &
 ```

-### Client Predict
+Client prediction
+
 ```
+cd ../../../java/examples/target
 java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample fit_a_line
 ```

-The Java example also contains the prediction client of Bert, Model_enaemble, asyn_predict, batch_predict, Cube_local, Cube_quant, and Yolov4 models.
+Take yolov4 as an example, the server starts
+
+```
+python -m paddle_serving_app.package --get_model yolov4
+tar -xzvf yolov4.tar.gz
+python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang & #It needs to be executed in GPU Docker, otherwise the execution method of CPU must be used.
+```
+
+Client prediction
+
+```
+# in /Serving/java/examples/target
+java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample yolov4 ../../../python/examples/yolov4/000000570688.jpg
+# The case of yolov4 needs to specify a picture as input
+```
+
+### Customization guidance
+
+The above example is running in CPU mode. If GPU mode is required, there are two options.
+
+The first is that GPU Serving and Java Client are in the same image. After starting the corresponding image, the user needs to move /Serving/java in the java image to the corresponding image.
+
+The second is to deploy GPU Serving and Java Client separately. If they are on the same host, you can learn the IP address of the corresponding container through ifconfig, and then when you connect to client.connect in `examples/src/main/java/PaddleServingClientExample.java` Make changes to the endpoint, and then compile it again. Or select `--net=host` to bind the network device of docker and host when docker starts, so that it can run directly without customizing java code.
+
+**It should be noted that in the example, all models need to use `--use_multilang` to start GRPC multi-programming language support, and the port number is 9393. If you need another port, you need to modify it in the java file**
+
+**Currently Serving has launched the Pipeline mode (see [Pipeline Serving](../doc/PIPELINE_SERVING.md) for details). The next version (0.4.1) of the Pipeline Serving Client for Java will be released. **
+
+
--- a/java/README_CN.md
+++ b/java/README_CN.md
-## Java 示例
+## 用于Paddle Serving的Java客户端
+
+([English](./README.md)|简体中文)
+
+### 开发环境
+
+为了方便用户使用java进行开发，我们提供了编译好的Serving工程放置在java镜像当中，获取镜像并进入开发环境的方式是
+
+```
+docker pull hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker run --rm -dit --name java_serving hub.baidubce.com/paddlepaddle/serving:0.4.0-java
+docker exec -it java_serving bash
+cd Serving/java
+```
+
+Serving文件夹是镜像生成时的develop分支工程目录，需要git pull 到最新版本，或者git checkout 到想要的分支。

 ### 安装客户端依赖
+
+由于依赖库数量庞大，因此镜像已经在生成时编译过一次，用户执行以下操作即可
+
 ```
 mvn compile
 mvn install
@@ -11,16 +29,47 @@ mvn install

 ### 启动服务端

-以fit_a_line模型为例
+以fit_a_line模型为例，服务端启动

 ```
- python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #CPU
-python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang #GPU
+cd ../../python/examples/fit_a_line
+sh get_data.sh
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang &
 ```

-### 客户端预测
+客户端预测
+
 ```
+cd ../../../java/examples/target
 java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample fit_a_line
 ```

-java示例中还包含了bert、model_enaemble、asyn_predict、batch_predict、cube_local、cube_quant、yolov4模型的预测客户端。
+以yolov4为例子，服务端启动
+
+```
+python -m paddle_serving_app.package --get_model yolov4
+tar -xzvf yolov4.tar.gz
+python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang &  #需要在GPU Docker当中执行，否则要使用CPU的执行方式。
+```
+
+客户端预测
+
+```
+# in /Serving/java/examples/target
+java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample yolov4 ../../../python/examples/yolov4/000000570688.jpg
+# yolov4的案例需要指定一个图片作为输入
+```
+
+### 二次开发指导
+
+上述示例是在CPU模式下运行，如果需要GPU模式，可以有两种选择。
+
+第一种是GPU Serving和Java Client在同一个镜像，需要用户在启动对应的镜像后，把java镜像当中的/Serving/java移动到对应的镜像中。
+
+第二种是GPU Serving和Java Client分开部署，如果在同一台宿主机，可以通过ifconfig了解对应容器的IP地址，然后在`examples/src/main/java/PaddleServingClientExample.java`当中对client.connect时的endpoint做修改，然后再编译一次。 或者在docker启动时选择 `--net=host`来绑定docker和宿主机的网络设备，这样不需要定制java代码可以直接运行。
+
+**需要注意的是，在示例中，所有模型都需要使用`--use_multilang`来启动GRPC多编程语言支持，以及端口号都是9393，如果需要别的端口，需要在java文件里修改**
+
+**目前Serving已推出Pipeline模式（详见[Pipeline Serving](../doc/PIPELINE_SERVING_CN.md)），下个版本（0.4.1）面向Java的Pipeline Serving Client将会发布，敬请期待。**
+
+
--- a/paddle_inference/inferencer-fluid-cpu/include/fluid_cpu_engine.h
+++ b/paddle_inference/inferencer-fluid-cpu/include/fluid_cpu_engine.h
@@ -13,6 +13,7 @@
 // limitations under the License.

 #pragma once
+
 #include <pthread.h>
 #include <fstream>
 #include <map>
@@ -28,6 +29,7 @@ namespace paddle_serving {
 namespace fluid_cpu {

 using configure::SigmoidConf;
+
 class AutoLock {
 public:
  explicit AutoLock(pthread_mutex_t& mutex) : _mut(mutex) {
@@ -528,60 +530,7 @@ class FluidCpuAnalysisDirWithSigmoidCore : public FluidCpuWithSigmoidCore {
    return 0;
  }
 };
-class FluidCpuAnalysisEncryptCore : public FluidFamilyCore {
- public:
-  void ReadBinaryFile(const std::string& filename, std::string* contents) {
-    std::ifstream fin(filename, std::ios::in | std::ios::binary);
-    fin.seekg(0, std::ios::end);
-    contents->clear();
-    contents->resize(fin.tellg());
-    fin.seekg(0, std::ios::beg);
-    fin.read(&(contents->at(0)), contents->size());
-    fin.close();
-  }
-
-  int create(const predictor::InferEngineCreationParams& params) {
-    std::string data_path = params.get_path();
-    if (access(data_path.c_str(), F_OK) == -1) {
-      LOG(ERROR) << "create paddle predictor failed, path note exits: "
-                 << data_path;
-      return -1;
-    }
-
-    std::string model_buffer, params_buffer, key_buffer;
-    ReadBinaryFile(data_path + "encrypt_model", &model_buffer);
-    ReadBinaryFile(data_path + "encrypt_params", &params_buffer);
-    ReadBinaryFile(data_path + "key", &key_buffer);

-    VLOG(2) << "prepare for encryption model";
-
-    auto cipher = paddle::MakeCipher("");
-    std::string real_model_buffer = cipher->Decrypt(model_buffer, key_buffer);
-    std::string real_params_buffer = cipher->Decrypt(params_buffer, key_buffer);
-
-    paddle::AnalysisConfig analysis_config;
-    analysis_config.SetModelBuffer(&real_model_buffer[0],
-                                   real_model_buffer.size(),
-                                   &real_params_buffer[0],
-                                   real_params_buffer.size());
-    analysis_config.DisableGpu();
-    analysis_config.SetCpuMathLibraryNumThreads(1);
-    if (params.enable_memory_optimization()) {
-      analysis_config.EnableMemoryOptim();
-    }
-    analysis_config.SwitchSpecifyInputNames(true);
-    AutoLock lock(GlobalPaddleCreateMutex::instance());
-    VLOG(2) << "decrypt model file sucess";
-    _core =
-        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);
-    if (NULL == _core.get()) {
-      LOG(ERROR) << "create paddle predictor failed, path: " << data_path;
-      return -1;
-    }
-    VLOG(2) << "create paddle predictor sucess, path: " << data_path;
-    return 0;
-  }
-};
 }  // namespace fluid_cpu
 }  // namespace paddle_serving
 }  // namespace baidu
--- a/paddle_inference/inferencer-fluid-cpu/src/fluid_cpu_engine.cpp
+++ b/paddle_inference/inferencer-fluid-cpu/src/fluid_cpu_engine.cpp
@@ -52,13 +52,6 @@ REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
    ::baidu::paddle_serving::predictor::InferEngine,
    "FLUID_CPU_NATIVE_DIR_SIGMOID");

-#if 1
-REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
-    ::baidu::paddle_serving::predictor::FluidInferEngine<
-        FluidCpuAnalysisEncryptCore>,
-    ::baidu::paddle_serving::predictor::InferEngine,
-    "FLUID_CPU_ANALYSIS_ENCRYPT");
-#endif
 }  // namespace fluid_cpu
 }  // namespace paddle_serving
 }  // namespace baidu
--- a/paddle_inference/inferencer-fluid-gpu/include/fluid_gpu_engine.h
+++ b/paddle_inference/inferencer-fluid-gpu/include/fluid_gpu_engine.h
@@ -25,6 +25,7 @@
 #include "core/configure/inferencer_configure.pb.h"
 #include "core/predictor/framework/infer.h"
 #include "paddle_inference_api.h"  // NOLINT
+
 DECLARE_int32(gpuid);

 namespace baidu {
@@ -590,60 +591,6 @@ class FluidGpuAnalysisDirWithSigmoidCore : public FluidGpuWithSigmoidCore {
  }
 };

-class FluidGpuAnalysisEncryptCore : public FluidFamilyCore {
- public:
-  void ReadBinaryFile(const std::string& filename, std::string* contents) {
-    std::ifstream fin(filename, std::ios::in | std::ios::binary);
-    fin.seekg(0, std::ios::end);
-    contents->clear();
-    contents->resize(fin.tellg());
-    fin.seekg(0, std::ios::beg);
-    fin.read(&(contents->at(0)), contents->size());
-    fin.close();
-  }
-
-  int create(const predictor::InferEngineCreationParams& params) {
-    std::string data_path = params.get_path();
-    if (access(data_path.c_str(), F_OK) == -1) {
-      LOG(ERROR) << "create paddle predictor failed, path note exits: "
-                 << data_path;
-      return -1;
-    }
-
-    std::string model_buffer, params_buffer, key_buffer;
-    ReadBinaryFile(data_path + "encrypt_model", &model_buffer);
-    ReadBinaryFile(data_path + "encrypt_params", &params_buffer);
-    ReadBinaryFile(data_path + "key", &key_buffer);
-
-    VLOG(2) << "prepare for encryption model";
-
-    auto cipher = paddle::MakeCipher("");
-    std::string real_model_buffer = cipher->Decrypt(model_buffer, key_buffer);
-    std::string real_params_buffer = cipher->Decrypt(params_buffer, key_buffer);
-
-    paddle::AnalysisConfig analysis_config;
-    analysis_config.SetModelBuffer(&real_model_buffer[0],
-                                   real_model_buffer.size(),
-                                   &real_params_buffer[0],
-                                   real_params_buffer.size());
-    analysis_config.EnableUseGpu(100, FLAGS_gpuid);
-    analysis_config.SetCpuMathLibraryNumThreads(1);
-    if (params.enable_memory_optimization()) {
-      analysis_config.EnableMemoryOptim();
-    }
-    analysis_config.SwitchSpecifyInputNames(true);
-    AutoLock lock(GlobalPaddleCreateMutex::instance());
-    VLOG(2) << "decrypt model file sucess";
-    _core =
-        paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);
-    if (NULL == _core.get()) {
-      LOG(ERROR) << "create paddle predictor failed, path: " << data_path;
-      return -1;
-    }
-    VLOG(2) << "create paddle predictor sucess, path: " << data_path;
-    return 0;
-  }
-};
 }  // namespace fluid_gpu
 }  // namespace paddle_serving
 }  // namespace baidu
--- a/paddle_inference/inferencer-fluid-gpu/src/fluid_gpu_engine.cpp
+++ b/paddle_inference/inferencer-fluid-gpu/src/fluid_gpu_engine.cpp
@@ -54,12 +54,6 @@ REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
    ::baidu::paddle_serving::predictor::InferEngine,
    "FLUID_GPU_NATIVE_DIR_SIGMOID");

-REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
-    ::baidu::paddle_serving::predictor::FluidInferEngine<
-        FluidGpuAnalysisEncryptCore>,
-    ::baidu::paddle_serving::predictor::InferEngine,
-    "FLUID_GPU_ANALYSIS_ENCRPT")
-
 }  // namespace fluid_gpu
 }  // namespace paddle_serving
 }  // namespace baidu
--- a/python/examples/criteo_ctr_with_cube/README.md
+++ b/python/examples/criteo_ctr_with_cube/README.md
-## Criteo CTR with Sparse Parameter Indexing Service
-
-([简体中文](./README_CN.md)|English)
-
-### Get Sample Dataset
-
-go to directory `python/examples/criteo_ctr_with_cube`
-```
-sh get_data.sh
-```
-
-### Download Model and Sparse Parameter Sequence Files
-```
-wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz
-tar xf ctr_cube_unittest.tar.gz
-mv models/ctr_client_conf ./
-mv models/ctr_serving_model_kv ./
-mv models/data ./cube/
-```
-the model will be in ./ctr_server_model_kv and ./ctr_client_config.
-
-### Start Sparse Parameter Indexing Service
-```
-wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
-tar xf cube_app.tar.gz
-mv cube_app/cube* ./cube/
-sh cube_prepare.sh &
-```
-
-Here, the sparse parameter is loaded by cube sparse parameter indexing service Cube.
-
-### Start RPC Predictor, the number of serving thread is 4（configurable in test_server.py）
-
-```
-python test_server.py ctr_serving_model_kv 
-```
-
-### Run Prediction
-
-```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
-```
-
-### Benchmark
-
-CPU ：Intel(R) Xeon(R) CPU 6148 @ 2.40GHz 
-
-Model ：[Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/criteo_ctr_with_cube/network_conf.py)
-
-server core/thread num ： 4/8
-
-Run
-```
-bash benchmark.sh
-```
-1000 batches will be sent by every client
-
-| client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | avg_latency | qps   |
-| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
-| 1                  | 0.035  | 1.596        | 0.021  | 0.518 | 0.0024 | 0.0025  | 6.774 | 147.7 |
-| 2                  | 0.034  | 1.780        | 0.027  | 0.463 | 0.0020 | 0.0023  | 6.931 | 288.3 |
-| 4                  | 0.038  | 2.954        | 0.025  | 0.455 | 0.0019 | 0.0027  | 8.378 | 477.5 |
-| 8                  | 0.044  | 8.230        | 0.028  | 0.464 | 0.0023 | 0.0034  | 14.191 | 563.8 |
-| 16                 | 0.048  | 21.037       | 0.028  | 0.455 | 0.0025 | 0.0041  | 27.236 | 587.5 |
-
-the average latency of threads
-
-![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)
-
-The QPS is 
-
-![qps](../../../doc/criteo-cube-benchmark-qps.png)
--- a/python/examples/criteo_ctr_with_cube/README_CN.md
+++ b/python/examples/criteo_ctr_with_cube/README_CN.md
-## 带稀疏参数索引服务的CTR预测服务
-(简体中文|[English](./README.md))
-
-### 获取样例数据
-进入目录 `python/examples/criteo_ctr_with_cube`
-```
-sh get_data.sh
-```
-
-### 下载模型和稀疏参数序列文件
-```
-wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz
-tar xf ctr_cube_unittest.tar.gz
-mv models/ctr_client_conf ./
-mv models/ctr_serving_model_kv ./
-mv models/data ./cube/
-```
-执行脚本后会在当前目录有ctr_server_model_kv和ctr_client_config文件夹。
-
-### 启动稀疏参数索引服务
-```
-wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
-tar xf cube_app.tar.gz
-mv cube_app/cube* ./cube/
-sh cube_prepare.sh &
-```
-
-此处，模型当中的稀疏参数会被存放在稀疏参数索引服务Cube当中。
-
-### 启动RPC预测服务，服务端线程数为4（可在test_server.py配置）
-
-```
-python test_server.py ctr_serving_model_kv 
-```
-
-### 执行预测
-
-```
-python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
-```
-
-### Benchmark
-
-设备 ：Intel(R) Xeon(R) CPU 6148 @ 2.40GHz 
-
-模型 ：[Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/criteo_ctr_with_cube/network_conf.py)
-
-server core/thread num ： 4/8
-
-执行
-```
-bash benchmark.sh
-```
-客户端每个线程会发送1000个batch
-
-| client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | avg_latency | qps   |
-| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
-| 1                  | 0.035  | 1.596        | 0.021  | 0.518 | 0.0024 | 0.0025  | 6.774 | 147.7 |
-| 2                  | 0.034  | 1.780        | 0.027  | 0.463 | 0.0020 | 0.0023  | 6.931 | 288.3 |
-| 4                  | 0.038  | 2.954        | 0.025  | 0.455 | 0.0019 | 0.0027  | 8.378 | 477.5 |
-| 8                  | 0.044  | 8.230        | 0.028  | 0.464 | 0.0023 | 0.0034  | 14.191 | 563.8 |
-| 16                 | 0.048  | 21.037       | 0.028  | 0.455 | 0.0025 | 0.0041  | 27.236 | 587.5 |
-
-平均每个线程耗时图如下
-
-![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)
-
-每个线程QPS耗时如下
-
-![qps](../../../doc/criteo-cube-benchmark-qps.png)
--- a/python/examples/criteo_ctr_with_cube/args.py
+++ b/python/examples/criteo_ctr_with_cube/args.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-import argparse
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
-    parser.add_argument(
-        '--train_data_path',
-        type=str,
-        default='./data/raw/train.txt',
-        help="The path of training dataset")
-    parser.add_argument(
-        '--sparse_only',
-        type=bool,
-        default=False,
-        help="Whether we use sparse features only")
-    parser.add_argument(
-        '--test_data_path',
-        type=str,
-        default='./data/raw/valid.txt',
-        help="The path of testing dataset")
-    parser.add_argument(
-        '--batch_size',
-        type=int,
-        default=1000,
-        help="The size of mini-batch (default:1000)")
-    parser.add_argument(
-        '--embedding_size',
-        type=int,
-        default=10,
-        help="The size for embedding layer (default:10)")
-    parser.add_argument(
-        '--num_passes',
-        type=int,
-        default=10,
-        help="The number of passes to train (default: 10)")
-    parser.add_argument(
-        '--model_output_dir',
-        type=str,
-        default='models',
-        help='The path for model to store (default: models)')
-    parser.add_argument(
-        '--sparse_feature_dim',
-        type=int,
-        default=1000001,
-        help='sparse feature hashing space for index processing')
-    parser.add_argument(
-        '--is_local',
-        type=int,
-        default=1,
-        help='Local train or distributed train (default: 1)')
-    parser.add_argument(
-        '--cloud_train',
-        type=int,
-        default=0,
-        help='Local train or distributed train on paddlecloud (default: 0)')
-    parser.add_argument(
-        '--async_mode',
-        action='store_true',
-        default=False,
-        help='Whether start pserver in async mode to support ASGD')
-    parser.add_argument(
-        '--no_split_var',
-        action='store_true',
-        default=False,
-        help='Whether split variables into blocks when update_method is pserver')
-    parser.add_argument(
-        '--role',
-        type=str,
-        default='pserver',  # trainer or pserver
-        help='The path for model to store (default: models)')
-    parser.add_argument(
-        '--endpoints',
-        type=str,
-        default='127.0.0.1:6000',
-        help='The pserver endpoints, like: 127.0.0.1:6000,127.0.0.1:6001')
-    parser.add_argument(
-        '--current_endpoint',
-        type=str,
-        default='127.0.0.1:6000',
-        help='The path for model to store (default: 127.0.0.1:6000)')
-    parser.add_argument(
-        '--trainer_id',
-        type=int,
-        default=0,
-        help='The path for model to store (default: models)')
-    parser.add_argument(
-        '--trainers',
-        type=int,
-        default=1,
-        help='The num of trianers, (default: 1)')
-    return parser.parse_args()
--- a/python/examples/criteo_ctr_with_cube/benchmark.py
+++ b/python/examples/criteo_ctr_with_cube/benchmark.py
-# -*- coding: utf-8 -*-
-#
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-from paddle_serving_client import Client
-import sys
-import os
-import criteo as criteo
-import time
-from paddle_serving_client.utils import MultiThreadRunner
-from paddle_serving_client.utils import benchmark_args
-from paddle_serving_client.metric import auc
-
-py_version = sys.version_info[0]
-args = benchmark_args()
-
-
-def single_func(idx, resource):
-    client = Client()
-    print([resource["endpoint"][idx % len(resource["endpoint"])]])
-    client.load_client_config('ctr_client_conf/serving_client_conf.prototxt')
-    client.connect(['127.0.0.1:9292'])
-    batch = 1
-    buf_size = 100
-    dataset = criteo.CriteoDataset()
-    dataset.setup(1000001)
-    test_filelists = [
-        "./raw_data/part-%d" % x for x in range(len(os.listdir("./raw_data")))
-    ]
-    reader = dataset.infer_reader(test_filelists[len(test_filelists) - 40:],
-                                  batch, buf_size)
-    if args.request == "rpc":
-        fetch = ["prob"]
-        start = time.time()
-        itr = 1000
-        for ei in range(itr):
-            if args.batch_size > 0:
-                feed_batch = []
-                for bi in range(args.batch_size):
-                    if py_version == 2:
-                        data = reader().next()
-                    else:
-                        data = reader().__next__()
-                    feed_dict = {}
-                    feed_dict['dense_input'] = data[0][0]
-                    for i in range(1, 27):
-                        feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][
-                            i]
-                    feed_batch.append(feed_dict)
-                result = client.predict(feed=feed_batch, fetch=fetch)
-            else:
-                print("unsupport batch size {}".format(args.batch_size))
-
-    elif args.request == "http":
-        raise ("Not support http service.")
-    end = time.time()
-    qps = itr * args.batch_size / (end - start)
-    return [[end - start, qps]]
-
-
-if __name__ == '__main__':
-    multi_thread_runner = MultiThreadRunner()
-    endpoint_list = ["127.0.0.1:9292"]
-    #result = single_func(0, {"endpoint": endpoint_list})
-    start = time.time()
-    result = multi_thread_runner.run(single_func, args.thread,
-                                     {"endpoint": endpoint_list})
-    end = time.time()
-    total_cost = end - start
-    avg_cost = 0
-    qps = 0
-    for i in range(args.thread):
-        avg_cost += result[0][i * 2 + 0]
-        qps += result[0][i * 2 + 1]
-    avg_cost = avg_cost / args.thread
-    print("total cost: {}".format(total_cost))
-    print("average total cost {} s.".format(avg_cost))
-    print("qps {} ins/s".format(qps))
--- a/python/examples/criteo_ctr_with_cube/benchmark.sh
+++ b/python/examples/criteo_ctr_with_cube/benchmark.sh
-rm profile_log
-export FLAGS_profile_client=1
-export FLAGS_profile_server=1
-
-wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz --no-check-certificate
-tar xf ctr_cube_unittest.tar.gz
-mv models/ctr_client_conf ./
-mv models/ctr_serving_model_kv ./
-mv models/data ./cube/
-
-wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz --no-check-certificate
-tar xf cube_app.tar.gz
-mv cube_app/cube* ./cube/
-sh cube_prepare.sh &
-
-python test_server.py ctr_serving_model_kv > serving_log 2>&1 &
-
-for thread_num in 1 4 16
-do
-for batch_size in 1 4 16 64
-do
-    $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
-    echo "batch size : $batch_size"
-    echo "thread num : $thread_num"
-    echo "========================================"
-    echo "batch size : $batch_size" >> profile_log
-    $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
-    tail -n 3 profile >> profile_log
-done
-done
-
-ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
--- a/python/examples/criteo_ctr_with_cube/benchmark_cube.sh
+++ b/python/examples/criteo_ctr_with_cube/benchmark_cube.sh
-rm profile_log
-
-#wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz --no-check-certificate
-#tar xf ctr_cube_unittest.tar.gz
-mv models/ctr_client_conf ./
-mv models/ctr_serving_model_kv ./
-mv models/data ./cube/
-
-#wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz --no-check-certificate
-#tar xf cube_app.tar.gz
-mv cube_app/cube* ./cube/
-sh cube_prepare.sh &
-
-cp ../../../build_server/core/cube/cube-api/cube-cli .
-python gen_key.py
-
-for thread_num in 1 4 16 32
-do
-for batch_size in 1000
-do
-    ./cube-cli -config_file ./cube/conf/cube.conf -keys key -dict test_dict -thread_num $thread_num --batch $batch_size > profile 2>&1
-    echo "batch size : $batch_size"
-    echo "thread num : $thread_num"
-    echo "========================================"
-    echo "batch size : $batch_size" >> profile_log
-    echo "thread num : $thread_num" >> profile_log
-    tail -n 8 profile >> profile_log
-
-done
-done
-
-ps -ef|grep 'cube'|grep -v grep|cut -c 9-15 | xargs kill -9
--- a/python/examples/criteo_ctr_with_cube/clean.sh
+++ b/python/examples/criteo_ctr_with_cube/clean.sh
-ps -ef | grep cube | awk {'print $2'} | xargs kill -9
-rm -rf cube/cube_data cube/data cube/log* cube/nohup* cube/output/ cube/donefile cube/input cube/monitor cube/cube-builder.INFO
-ps -ef | grep test | awk {'print $2'} | xargs kill -9
-ps -ef | grep serving | awk {'print $2'} | xargs kill -9
--- a/python/examples/criteo_ctr_with_cube/criteo.py
+++ b/python/examples/criteo_ctr_with_cube/criteo.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import sys
-
-
-class CriteoDataset(object):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-
-        return dense_feature, sparse_feature, [int(features[0])]
-
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-
-        return data_iter
-
-
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/criteo_ctr_with_cube/criteo_reader.py
+++ b/python/examples/criteo_ctr_with_cube/criteo_reader.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import sys
-import paddle.fluid.incubate.data_generator as dg
-
-
-class CriteoDataset(dg.MultiSlotDataGenerator):
-    def setup(self, sparse_feature_dim):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [
-            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.cont_diff_ = [
-            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
-        ]
-        self.hash_dim_ = sparse_feature_dim
-        # here, training data are lines with line_index < train_idx_
-        self.train_idx_ = 41256555
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-
-    def _process_line(self, line):
-        features = line.rstrip('\n').split('\t')
-        dense_feature = []
-        sparse_feature = []
-        for idx in self.continuous_range_:
-            if features[idx] == '':
-                dense_feature.append(0.0)
-            else:
-                dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
-                                     self.cont_diff_[idx - 1])
-        for idx in self.categorical_range_:
-            sparse_feature.append(
-                [hash(str(idx) + features[idx]) % self.hash_dim_])
-
-        return dense_feature, sparse_feature, [int(features[0])]
-
-    def infer_reader(self, filelist, batch, buf_size):
-        def local_iter():
-            for fname in filelist:
-                with open(fname.strip(), "r") as fin:
-                    for line in fin:
-                        dense_feature, sparse_feature, label = self._process_line(
-                            line)
-                        #yield dense_feature, sparse_feature, label
-                        yield [dense_feature] + sparse_feature + [label]
-
-        import paddle
-        batch_iter = paddle.batch(
-            paddle.reader.shuffle(
-                local_iter, buf_size=buf_size),
-            batch_size=batch)
-        return batch_iter
-
-    def generate_sample(self, line):
-        def data_iter():
-            dense_feature, sparse_feature, label = self._process_line(line)
-            feature_name = ["dense_input"]
-            for idx in self.categorical_range_:
-                feature_name.append("C" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-
-        return data_iter
-
-
-if __name__ == "__main__":
-    criteo_dataset = CriteoDataset()
-    criteo_dataset.setup(int(sys.argv[1]))
-    criteo_dataset.run_from_stdin()
--- a/python/examples/criteo_ctr_with_cube/cube/conf/cube.conf
+++ b/python/examples/criteo_ctr_with_cube/cube/conf/cube.conf
-[{
-    "dict_name": "test_dict",
-    "shard": 1,
-    "dup": 1,
-    "timeout": 200,
-    "retry": 3,
-    "backup_request": 100,
-    "type": "ipport_list",
-    "load_balancer": "rr",
-    "nodes": [{
-        "ipport_list": "list://127.0.0.1:8027"
-    }]
-}]
--- a/python/examples/criteo_ctr_with_cube/cube/conf/gflags.conf
+++ b/python/examples/criteo_ctr_with_cube/cube/conf/gflags.conf
--port=8027
--dict_split=1
--in_mem=true
--log_dir=./log/
--- a/python/examples/criteo_ctr_with_cube/cube/keys
+++ b/python/examples/criteo_ctr_with_cube/cube/keys
-1
-2
-3
-4
-5
-6
-7
-8
-9
-10
--- a/python/examples/criteo_ctr_with_cube/cube_prepare.sh
+++ b/python/examples/criteo_ctr_with_cube/cube_prepare.sh
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-#! /bin/bash
-
-mkdir -p cube_model
-mkdir -p cube/data
-./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1  -only_build=false
-cd cube && ./cube
--- a/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh
+++ b/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-#! /bin/bash
-
-mkdir -p cube_model
-mkdir -p cube/data
-./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8  
-./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1  -only_build=false
-mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
-cd cube && ./cube 
--- a/python/examples/criteo_ctr_with_cube/gen_key.py
+++ b/python/examples/criteo_ctr_with_cube/gen_key.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import sys
-import random
-
-with open("key", "w") as f:
-    for i in range(1000000):
-        f.write("{}\n".format(random.randint(0, 999999)))
--- a/python/examples/criteo_ctr_with_cube/get_data.sh
+++ b/python/examples/criteo_ctr_with_cube/get_data.sh
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/data/ctr_prediction/ctr_data.tar.gz
-tar -zxvf ctr_data.tar.gz
--- a/python/examples/criteo_ctr_with_cube/local_train.py
+++ b/python/examples/criteo_ctr_with_cube/local_train.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-from __future__ import print_function
-
-from args import parse_args
-import os
-import paddle.fluid as fluid
-import sys
-from network_conf import dnn_model
-
-dense_feature_dim = 13
-
-
-def train():
-    args = parse_args()
-    sparse_only = args.sparse_only
-    if not os.path.isdir(args.model_output_dir):
-        os.mkdir(args.model_output_dir)
-    dense_input = fluid.layers.data(
-        name="dense_input", shape=[dense_feature_dim], dtype='float32')
-    sparse_input_ids = [
-        fluid.layers.data(
-            name="C" + str(i), shape=[1], lod_level=1, dtype="int64")
-        for i in range(1, 27)
-    ]
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-
-    #nn_input = None if sparse_only else dense_input
-    nn_input = dense_input
-    predict_y, loss, auc_var, batch_auc_var, infer_vars = dnn_model(
-        nn_input, sparse_input_ids, label, args.embedding_size,
-        args.sparse_feature_dim)
-
-    optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
-    optimizer.minimize(loss)
-
-    exe = fluid.Executor(fluid.CPUPlace())
-    exe.run(fluid.default_startup_program())
-    dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
-    dataset.set_use_var([dense_input] + sparse_input_ids + [label])
-
-    python_executable = "python"
-    pipe_command = "{} criteo_reader.py {}".format(python_executable,
-                                                   args.sparse_feature_dim)
-
-    dataset.set_pipe_command(pipe_command)
-    dataset.set_batch_size(128)
-    thread_num = 10
-    dataset.set_thread(thread_num)
-
-    whole_filelist = [
-        "raw_data/part-%d" % x for x in range(len(os.listdir("raw_data")))
-    ]
-
-    print(whole_filelist)
-    dataset.set_filelist(whole_filelist[:100])
-    dataset.load_into_memory()
-    fluid.layers.Print(auc_var)
-    epochs = 1
-    for i in range(epochs):
-        exe.train_from_dataset(
-            program=fluid.default_main_program(), dataset=dataset, debug=True)
-        print("epoch {} finished".format(i))
-
-    import paddle_serving_client.io as server_io
-    feed_var_dict = {}
-    feed_var_dict['dense_input'] = dense_input
-    for i, sparse in enumerate(sparse_input_ids):
-        feed_var_dict["embedding_{}.tmp_0".format(i)] = sparse
-    fetch_var_dict = {"prob": predict_y}
-
-    feed_kv_dict = {}
-    feed_kv_dict['dense_input'] = dense_input
-    for i, emb in enumerate(infer_vars):
-        feed_kv_dict["embedding_{}.tmp_0".format(i)] = emb
-    fetch_var_dict = {"prob": predict_y}
-
-    server_io.save_model("ctr_serving_model", "ctr_client_conf", feed_var_dict,
-                         fetch_var_dict, fluid.default_main_program())
-
-    server_io.save_model("ctr_serving_model_kv", "ctr_client_conf_kv",
-                         feed_kv_dict, fetch_var_dict,
-                         fluid.default_main_program())
-
-
-if __name__ == '__main__':
-    train()
--- a/python/examples/criteo_ctr_with_cube/network_conf.py
+++ b/python/examples/criteo_ctr_with_cube/network_conf.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import paddle.fluid as fluid
-import math
-
-
-def dnn_model(dense_input, sparse_inputs, label, embedding_size,
-              sparse_feature_dim):
-    def embedding_layer(input):
-        emb = fluid.layers.embedding(
-            input=input,
-            is_sparse=True,
-            is_distributed=False,
-            size=[sparse_feature_dim, embedding_size],
-            param_attr=fluid.ParamAttr(
-                name="SparseFeatFactors",
-                initializer=fluid.initializer.Uniform()))
-        x = fluid.layers.sequence_pool(input=emb, pool_type='sum')
-        return emb, x
-
-    def mlp_input_tensor(emb_sums, dense_tensor):
-        #if isinstance(dense_tensor, fluid.Variable):
-        #    return fluid.layers.concat(emb_sums, axis=1)
-        #else:
-        return fluid.layers.concat(emb_sums + [dense_tensor], axis=1)
-
-    def mlp(mlp_input):
-        fc1 = fluid.layers.fc(input=mlp_input,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(mlp_input.shape[1]))))
-        fc2 = fluid.layers.fc(input=fc1,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc1.shape[1]))))
-        fc3 = fluid.layers.fc(input=fc2,
-                              size=400,
-                              act='relu',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc2.shape[1]))))
-        pre = fluid.layers.fc(input=fc3,
-                              size=2,
-                              act='softmax',
-                              param_attr=fluid.ParamAttr(
-                                  initializer=fluid.initializer.Normal(
-                                      scale=1 / math.sqrt(fc3.shape[1]))))
-        return pre
-
-    emb_pair_sums = list(map(embedding_layer, sparse_inputs))
-    emb_sums = [x[1] for x in emb_pair_sums]
-    infer_vars = [x[0] for x in emb_pair_sums]
-    mlp_in = mlp_input_tensor(emb_sums, dense_input)
-    predict = mlp(mlp_in)
-    cost = fluid.layers.cross_entropy(input=predict, label=label)
-    avg_cost = fluid.layers.reduce_sum(cost)
-    accuracy = fluid.layers.accuracy(input=predict, label=label)
-    auc_var, batch_auc_var, auc_states = \
-        fluid.layers.auc(input=predict, label=label, num_thresholds=2 ** 12, slide_steps=20)
-    return predict, avg_cost, auc_var, batch_auc_var, infer_vars
--- a/python/examples/criteo_ctr_with_cube/test_client.py
+++ b/python/examples/criteo_ctr_with_cube/test_client.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-from paddle_serving_client import Client
-import sys
-import os
-import criteo as criteo
-import time
-from paddle_serving_client.metric import auc
-import numpy as np
-
-py_version = sys.version_info[0]
-
-client = Client()
-client.load_client_config(sys.argv[1])
-client.connect(["127.0.0.1:9292"])
-
-batch = 1
-buf_size = 100
-dataset = criteo.CriteoDataset()
-dataset.setup(1000001)
-test_filelists = ["{}/part-0".format(sys.argv[2])]
-reader = dataset.infer_reader(test_filelists, batch, buf_size)
-label_list = []
-prob_list = []
-start = time.time()
-for ei in range(10000):
-    if py_version == 2:
-        data = reader().next()
-    else:
-        data = reader().__next__()
-    feed_dict = {}
-    feed_dict['dense_input'] = np.array(data[0][0]).astype("float32").reshape(
-        1, 13)
-    feed_dict['dense_input.lod'] = [0, 1]
-    for i in range(1, 27):
-        tmp_data = np.array(data[0][i]).astype(np.int64)
-        feed_dict["embedding_{}.tmp_0".format(i - 1)] = tmp_data.reshape(
-            (1, len(data[0][i])))
-        feed_dict["embedding_{}.tmp_0.lod".format(i - 1)] = [0, 1]
-    fetch_map = client.predict(feed=feed_dict, fetch=["prob"], batch=True)
-    prob_list.append(fetch_map['prob'][0][1])
-    label_list.append(data[0][-1][0])
-
-print(auc(label_list, prob_list))
-end = time.time()
-print(end - start)
--- a/python/examples/criteo_ctr_with_cube/test_server.py
+++ b/python/examples/criteo_ctr_with_cube/test_server.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import os
-import sys
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import Server
-
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer')
-response_op = op_maker.create('general_response')
-
-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_dist_kv_infer_op)
-op_seq_maker.add_op(response_op)
-
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.set_num_threads(4)
-server.load_model_config(sys.argv[1])
-server.prepare_server(
-    workdir="work_dir1",
-    port=9292,
-    device="cpu",
-    cube_conf="./cube/conf/cube.conf")
-server.run_server()
--- a/python/examples/criteo_ctr_with_cube/test_server_gpu.py
+++ b/python/examples/criteo_ctr_with_cube/test_server_gpu.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import os
-import sys
-from paddle_serving_server_gpu import OpMaker
-from paddle_serving_server_gpu import OpSeqMaker
-from paddle_serving_server_gpu import Server
-
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer')
-response_op = op_maker.create('general_response')
-
-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_dist_kv_infer_op)
-op_seq_maker.add_op(response_op)
-
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.set_num_threads(4)
-server.load_model_config(sys.argv[1])
-server.prepare_server(
-    workdir="work_dir1",
-    port=9292,
-    device="cpu",
-    cube_conf="./cube/conf/cube.conf")
-server.run_server()
--- a/python/examples/criteo_ctr_with_cube/test_server_quant.py
+++ b/python/examples/criteo_ctr_with_cube/test_server_quant.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-import os
-import sys
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import Server
-
-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_dist_kv_infer_op = op_maker.create('general_dist_kv_quant_infer')
-response_op = op_maker.create('general_response')
-
-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_dist_kv_infer_op)
-op_seq_maker.add_op(response_op)
-
-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.set_num_threads(4)
-server.load_model_config(sys.argv[1])
-server.prepare_server(
-    workdir="work_dir1",
-    port=9292,
-    device="cpu",
-    cube_conf="./cube/conf/cube.conf")
-server.run_server()
--- a/python/examples/encryption/README.md
+++ b/python/examples/encryption/README.md
-# Encryption Model Prediction
-
-([简体中文](README_CN.md)|English)
-
-## Get Origin Model
-
-The example uses the model file of the fit_a_line example as a origin model
-
-```
-sh get_data.sh
-```
-
-## Encrypt Model
-
-```
-python encrypt.py
-```
-The key is stored in the `key` file, and the encrypted model file and server-side configuration file are stored in the `encrypt_server` directory.
-client-side configuration file are stored in the `encrypt_client` directory.
-
-## Start Encryption Service
-CPU Service
-```
-python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model
-```
-GPU Service
-```
-python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
-```
-
-## Prediction
-```
-python test_client.py uci_housing_client/serving_client_conf.prototxt
-```
--- a/python/examples/encryption/README_CN.md
+++ b/python/examples/encryption/README_CN.md
-# 加密模型预测
-
-(简体中文|[English](README.md))
-
-## 获取明文模型
-
-示例中使用fit_a_line示例的模型文件作为明文模型
-
-```
-sh get_data.sh
-```
-
-## 模型加密
-
-```
-python encrypt.py
-```
-密钥保存在`key`文件中，加密模型文件以及server端配置文件保存在`encrypt_server`目录下，client端配置文件保存在`encrypt_client`目录下。
-
-## 启动加密预测服务
-CPU预测服务
-```
-python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model
-```
-GPU预测服务
-```
-python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
-```
-
-## 预测
-```
-python test_client.py uci_housing_client/serving_client_conf.prototxt
-```
--- a/python/examples/encryption/encrypt.py
+++ b/python/examples/encryption/encrypt.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from paddle_serving_client.io import inference_model_to_serving
-
-
-def serving_encryption():
-    inference_model_to_serving(
-        dirname="./uci_housing_model",
-        serving_server="encrypt_server",
-        serving_client="encrypt_client",
-        encryption=True)
-
-
-if __name__ == "__main__":
-    serving_encryption()
--- a/python/examples/encryption/get_data.sh
+++ b/python/examples/encryption/get_data.sh
-wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing_example/encrypt.tar.gz
-tar -xzf encrypt.tar.gz
-cp -rvf ../fit_a_line/uci_housing_model .
-cp -rvf ../fit_a_line/uci_housing_client .
--- a/python/examples/encryption/test_client.py
+++ b/python/examples/encryption/test_client.py
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# pylint: disable=doc-string-missing
-
-from paddle_serving_client import Client
-import sys
-
-client = Client()
-client.load_client_config(sys.argv[1])
-client.use_key("./key")
-client.connect(["127.0.0.1:9300"], encryption=True)
-
-import paddle
-test_reader = paddle.batch(
-    paddle.reader.shuffle(
-        paddle.dataset.uci_housing.test(), buf_size=500),
-    batch_size=1)
-
-for data in test_reader():
-    fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["price"])
-    print("{} {}".format(fetch_map["price"][0], data[0][1][0]))
--- a/python/examples/fit_a_line/README.md
+++ b/python/examples/fit_a_line/README.md
@@ -14,12 +14,6 @@ sh get_data.sh

 ### Start server

-``` shell
-python test_server.py uci_housing_model/
-```
-
-You can also start the default RPC service with the following line of code:
-
 ```shell
 python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393
 ```
@@ -40,7 +34,7 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt

 Start a web service with default web service hosting modules:
 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --name uci
+python test_server.py
 ```

 ### Client prediction

--- a/python/examples/fit_a_line/README_CN.md
+++ b/python/examples/fit_a_line/README_CN.md
@@ -41,7 +41,7 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt
 通过下面的一行代码开启默认web服务：

 ``` shell
-python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --name uci
+python test_server.py
 ```

 ### 客户端预测

--- a/python/examples/fit_a_line/test_client.py
+++ b/python/examples/fit_a_line/test_client.py
@@ -15,6 +15,7 @@

 from paddle_serving_client import Client
 import sys
+import numpy as np

 client = Client()
 client.load_client_config(sys.argv[1])
@@ -27,7 +28,6 @@ test_reader = paddle.batch(
    batch_size=1)

 for data in test_reader():
-    import numpy as np
    new_data = np.zeros((1, 1, 13)).astype("float32")
    new_data[0] = data[0][0]
    fetch_map = client.predict(

--- a/python/examples/fit_a_line/test_server.py
+++ b/python/examples/fit_a_line/test_server.py
@@ -13,24 +13,24 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing

-import os
-import sys
-from paddle_serving_server import OpMaker
-from paddle_serving_server import OpSeqMaker
-from paddle_serving_server import Server
+from paddle_serving_server.web_service import WebService
+import numpy as np

-op_maker = OpMaker()
-read_op = op_maker.create('general_reader')
-general_infer_op = op_maker.create('general_infer')
-response_op = op_maker.create('general_response')

-op_seq_maker = OpSeqMaker()
-op_seq_maker.add_op(read_op)
-op_seq_maker.add_op(general_infer_op)
-op_seq_maker.add_op(response_op)
+class UciService(WebService):
+    def preprocess(self, feed=[], fetch=[]):
+        feed_batch = []
+        is_batch = True
+        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
+        for i, ins in enumerate(feed):
+            nums = np.array(ins["x"]).reshape(1, 1, 13)
+            new_data[i] = nums
+        feed = {"x": new_data}
+        return feed, fetch, is_batch

-server = Server()
-server.set_op_sequence(op_seq_maker.get_op_sequence())
-server.load_model_config(sys.argv[1])
-server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
-server.run_server()
+
+uci_service = UciService(name="uci")
+uci_service.load_model_config("uci_housing_model")
+uci_service.prepare_server(workdir="workdir", port=9292)
+uci_service.run_rpc_service()
+uci_service.run_web_service()
--- a/python/paddle_serving_client/__init__.py
+++ b/python/paddle_serving_client/__init__.py
@@ -13,19 +13,16 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing

+import paddle_serving_client
 import os
+from .proto import sdk_configure_pb2 as sdk
+from .proto import general_model_config_pb2 as m_config
+import google.protobuf.text_format
+import numpy as np
 import time
 import sys
-import requests
-import json
-import base64
-import numpy as np
-import paddle_serving_client
-import google.protobuf.text_format

 import grpc
-from .proto import sdk_configure_pb2 as sdk
-from .proto import general_model_config_pb2 as m_config
 from .proto import multi_lang_general_model_service_pb2
 sys.path.append(
    os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
@@ -164,7 +161,6 @@ class Client(object):
        self.fetch_names_to_idx_ = {}
        self.lod_tensor_set = set()
        self.feed_tensor_len = {}
-        self.key = None

        for i, var in enumerate(model_conf.feed_var):
            self.feed_names_to_idx_[var.alias_name] = i
@@ -197,28 +193,7 @@ class Client(object):
        else:
            self.rpc_timeout_ms = rpc_timeout

-    def use_key(self, key_filename):
-        with open(key_filename, "r") as f:
-            self.key = f.read()
-
-    def get_serving_port(self, endpoints):
-        if self.key is not None:
-            req = json.dumps({"key": base64.b64encode(self.key)})
-        else:
-            req = json.dumps({})
-        r = requests.post("http://" + endpoints[0], req)
-        result = r.json()
-        print(result)
-        if "endpoint_list" not in result:
-            raise ValueError("server not ready")
-        else:
-            endpoints = [
-                endpoints[0].split(":")[0] + ":" +
-                str(result["endpoint_list"][0])
-            ]
-            return endpoints
-
-    def connect(self, endpoints=None, encryption=False):
+    def connect(self, endpoints=None):
        # check whether current endpoint is available
        # init from client config
        # create predictor here
@@ -228,8 +203,6 @@ class Client(object):
                    "You must set the endpoints parameter or use add_variant function to create a variant."
                )
        else:
-            if encryption:
-                endpoints = self.get_serving_port(endpoints)
            if self.predictor_sdk_ is None:
                self.add_variant('default_tag_{}'.format(id(self)), endpoints,
                                 100)

--- a/python/paddle_serving_client/io/__init__.py
+++ b/python/paddle_serving_client/io/__init__.py
@@ -21,9 +21,6 @@ from paddle.fluid.framework import Program
 from paddle.fluid import CPUPlace
 from paddle.fluid.io import save_inference_model
 import paddle.fluid as fluid
-from paddle.fluid.core import CipherUtils
-from paddle.fluid.core import CipherFactory
-from paddle.fluid.core import Cipher
 from ..proto import general_model_config_pb2 as model_conf
 import os

@@ -32,10 +29,7 @@ def save_model(server_model_folder,
               client_config_folder,
               feed_var_dict,
               fetch_var_dict,
-               main_program=None,
-               encryption=False,
-               key_len=128,
-               encrypt_conf=None):
+               main_program=None):
    executor = Executor(place=CPUPlace())

    feed_var_names = [feed_var_dict[x].name for x in feed_var_dict]
@@ -44,29 +38,14 @@ def save_model(server_model_folder,
    for key in sorted(fetch_var_dict.keys()):
        target_vars.append(fetch_var_dict[key])
        target_var_names.append(key)
-    if not encryption:
-        save_inference_model(
-            server_model_folder,
-            feed_var_names,
-            target_vars,
-            executor,
-            main_program=main_program)
-    else:
-        if encrypt_conf == None:
-            aes_cipher = CipherFactory.create_cipher()
-        else:
-            #todo: more encryption algorithms
-            pass
-        key = CipherUtils.gen_key_to_file(128, "key")
-        params = fluid.io.save_persistables(
-            executor=executor, dirname=None, main_program=main_program)
-        model = main_program.desc.serialize_to_string()
-        if not os.path.exists(server_model_folder):
-            os.makedirs(server_model_folder)
-        os.chdir(server_model_folder)
-        aes_cipher.encrypt_to_file(params, key, "encrypt_params")
-        aes_cipher.encrypt_to_file(model, key, "encrypt_model")
-        os.chdir("..")
+
+    save_inference_model(
+        server_model_folder,
+        feed_var_names,
+        target_vars,
+        executor,
+        main_program=main_program)
+
    config = model_conf.GeneralModelConfig()

    #int64 = 0; float32 = 1; int32 = 2;
@@ -137,10 +116,7 @@ def inference_model_to_serving(dirname,
                               serving_server="serving_server",
                               serving_client="serving_client",
                               model_filename=None,
-                               params_filename=None,
-                               encryption=False,
-                               key_len=128,
-                               encrypt_conf=None):
+                               params_filename=None):
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    inference_program, feed_target_names, fetch_targets = \
@@ -151,7 +127,7 @@ def inference_model_to_serving(dirname,
    }
    fetch_dict = {x.name: x for x in fetch_targets}
    save_model(serving_server, serving_client, feed_dict, fetch_dict,
-               inference_program, encryption, key_len, encrypt_conf)
+               inference_program)
    feed_names = feed_dict.keys()
    fetch_names = fetch_dict.keys()
    return feed_names, fetch_names
--- a/python/paddle_serving_server/__init__.py
+++ b/python/paddle_serving_server/__init__.py
@@ -157,7 +157,6 @@ class Server(object):
        self.cur_path = os.getcwd()
        self.use_local_bin = False
        self.mkl_flag = False
-        self.encryption_model = False
        self.product_name = None
        self.container_id = None
        self.model_config_paths = None  # for multi-model in a workflow
@@ -198,9 +197,6 @@ class Server(object):
    def set_ir_optimize(self, flag=False):
        self.ir_optimization = flag

-    def use_encryption_model(self, flag=False):
-        self.encryption_model = flag
-
    def set_product_name(self, product_name=None):
        if product_name == None:
            raise ValueError("product_name can't be None.")
@@ -236,15 +232,9 @@ class Server(object):
            engine.force_update_static_cache = False

            if device == "cpu":
-                if self.encryption_model:
-                    engine.type = "FLUID_CPU_ANALYSIS_ENCRYPT"
-                else:
-                    engine.type = "FLUID_CPU_ANALYSIS_DIR"
+                engine.type = "FLUID_CPU_ANALYSIS_DIR"
            elif device == "gpu":
-                if self.encryption_model:
-                    engine.type = "FLUID_GPU_ANALYSIS_ENCRYPT"
-                else:
-                    engine.type = "FLUID_GPU_ANALYSIS_DIR"
+                engine.type = "FLUID_GPU_ANALYSIS_DIR"

            self.model_toolkit_conf.engines.extend([engine])


--- a/python/paddle_serving_server/serve.py
+++ b/python/paddle_serving_server/serve.py
@@ -18,14 +18,8 @@ Usage:
        python -m paddle_serving_server.serve --model ./serving_server_model --port 9292
 """
 import argparse
-import sys
-import json
-import base64
-import time
-from multiprocessing import Process
-from web_service import WebService, port_is_available
+from .web_service import WebService
 from flask import Flask, request
-from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer


 def parse_args():  # pylint: disable=doc-string-missing
@@ -59,11 +53,6 @@ def parse_args():  # pylint: disable=doc-string-missing
        type=int,
        default=512 * 1024 * 1024,
        help="Limit sizes of messages")
-    parser.add_argument(
-        "--use_encryption_model",
-        default=False,
-        action="store_true",
-        help="Use encryption model")
    parser.add_argument(
        "--use_multilang",
        default=False,
@@ -82,18 +71,17 @@ def parse_args():  # pylint: disable=doc-string-missing
    return parser.parse_args()


-def start_standard_model(serving_port):  # pylint: disable=doc-string-missing
+def start_standard_model():  # pylint: disable=doc-string-missing
    args = parse_args()
    thread_num = args.thread
    model = args.model
-    port = serving_port
+    port = args.port
    workdir = args.workdir
    device = args.device
    mem_optim = args.mem_optim_off is False
    ir_optim = args.ir_optim
    max_body_size = args.max_body_size
    use_mkl = args.use_mkl
-    use_encryption_model = args.use_encryption_model
    use_multilang = args.use_multilang

    if model == "":
@@ -123,7 +111,6 @@ def start_standard_model(serving_port):  # pylint: disable=doc-string-missing
    server.use_mkl(use_mkl)
    server.set_max_body_size(max_body_size)
    server.set_port(port)
-    server.use_encryption_model(use_encryption_model)
    if args.product_name != None:
        server.set_product_name(args.product_name)
    if args.container_id != None:
@@ -134,88 +121,11 @@ def start_standard_model(serving_port):  # pylint: disable=doc-string-missing
    server.run_server()


-class MainService(BaseHTTPRequestHandler):
-    def get_available_port(self):
-        default_port = 12000
-        for i in range(1000):
-            if port_is_available(default_port + i):
-                return default_port + i
-
-    def start_serving(self):
-        start_standard_model(serving_port)
-
-    def get_key(self, post_data):
-        if "key" not in post_data:
-            return False
-        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "w") as f:
-                f.write(key)
-            return True
-
-    def check_key(self, post_data):
-        if "key" not in post_data:
-            return False
-        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "r") as f:
-                cur_key = f.read()
-            return (key == cur_key)
-
-    def start(self, post_data):
-        post_data = json.loads(post_data)
-        global p_flag
-        if not p_flag:
-            if args.use_encryption_model:
-                print("waiting key for model")
-                if not self.get_key(post_data):
-                    print("not found key in request")
-                    return False
-            global serving_port
-            global p
-            serving_port = self.get_available_port()
-            p = Process(target=self.start_serving)
-            p.start()
-            time.sleep(3)
-            if p.is_alive():
-                p_flag = True
-            else:
-                return False
-        else:
-            if p.is_alive():
-                if not self.check_key(post_data):
-                    return False
-            else:
-                return False
-        return True
-
-    def do_POST(self):
-        content_length = int(self.headers['Content-Length'])
-        post_data = self.rfile.read(content_length)
-        if self.start(post_data):
-            response = {"endpoint_list": [serving_port]}
-        else:
-            response = {"message": "start serving failed"}
-        self.send_response(200)
-        self.send_header('Content-type', 'application/json')
-        self.end_headers()
-        self.wfile.write(json.dumps(response))
-
-
 if __name__ == "__main__":
+
    args = parse_args()
    if args.name == "None":
-        if args.use_encryption_model:
-            p_flag = False
-            p = None
-            serving_port = 0
-            server = HTTPServer(('localhost', int(args.port)), MainService)
-            print(
-                'Starting encryption server, waiting for key from client, use <Ctrl-C> to stop'
-            )
-            server.serve_forever()
-        else:
-            start_standard_model(args.port)
+        start_standard_model()
    else:
        service = WebService(name=args.name)
        service.load_model_config(args.model)

--- a/python/paddle_serving_server/web_service.py
+++ b/python/paddle_serving_server/web_service.py
@@ -25,16 +25,6 @@ from paddle_serving_server import pipeline
 from paddle_serving_server.pipeline import Op


-def port_is_available(port):
-    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
-        sock.settimeout(2)
-        result = sock.connect_ex(('0.0.0.0', port))
-    if result != 0:
-        return True
-    else:
-        return False
-
-
 class WebService(object):
    def __init__(self, name="default_service"):
        self.name = name
@@ -68,7 +58,7 @@ class WebService(object):
        if os.path.isdir(model_config):
            client_config = "{}/serving_server_conf.prototxt".format(
                model_config)
-        elif os.path.isfile(path):
+        elif os.path.isfile(model_config):
            client_config = model_config
        model_conf = m_config.GeneralModelConfig()
        f = open(client_config, 'r')
@@ -120,7 +110,7 @@ class WebService(object):
        self.mem_optim = mem_optim
        self.ir_optim = ir_optim
        for i in range(1000):
-            if port_is_available(default_port + i):
+            if self.port_is_available(default_port + i):
                self.port_list.append(default_port + i)
                break


--- a/python/paddle_serving_server_gpu/__init__.py
+++ b/python/paddle_serving_server_gpu/__init__.py
@@ -25,7 +25,9 @@ from .version import serving_server_version
 from contextlib import closing
 import argparse
 import collections
-import fcntl
+import sys
+if sys.platform.startswith('win') is False:
+    import fcntl
 import shutil
 import numpy as np
 import grpc
@@ -68,11 +70,6 @@ def serve_args():
        type=int,
        default=512 * 1024 * 1024,
        help="Limit sizes of messages")
-    parser.add_argument(
-        "--use_encryption_model",
-        default=False,
-        action="store_true",
-        help="Use encryption model")
    parser.add_argument(
        "--use_multilang",
        default=False,
@@ -282,8 +279,7 @@ class Server(object):
    def set_trt(self):
        self.use_trt = True

-    def _prepare_engine(self, model_config_paths, device, use_encryption_model):
-
+    def _prepare_engine(self, model_config_paths, device):
        if self.model_toolkit_conf == None:
            self.model_toolkit_conf = server_sdk.ModelToolkitConf()

@@ -305,15 +301,9 @@ class Server(object):
            engine.use_trt = self.use_trt

            if device == "cpu":
-                if use_encryption_model:
-                    engine.type = "FLUID_CPU_ANALYSIS_ENCRPT"
-                else:
-                    engine.type = "FLUID_CPU_ANALYSIS_DIR"
+                engine.type = "FLUID_CPU_ANALYSIS_DIR"
            elif device == "gpu":
-                if use_encryption_model:
-                    engine.type = "FLUID_GPU_ANALYSIS_ENCRPT"
-                else:
-                    engine.type = "FLUID_GPU_ANALYSIS_DIR"
+                engine.type = "FLUID_GPU_ANALYSIS_DIR"

            self.model_toolkit_conf.engines.extend([engine])

@@ -470,7 +460,6 @@ class Server(object):
                       workdir=None,
                       port=9292,
                       device="cpu",
-                       use_encryption_model=False,
                       cube_conf=None):
        if workdir == None:
            workdir = "./tmp"
@@ -484,8 +473,7 @@ class Server(object):

        self.set_port(port)
        self._prepare_resource(workdir, cube_conf)
-        self._prepare_engine(self.model_config_paths, device,
-                             use_encryption_model)
+        self._prepare_engine(self.model_config_paths, device)
        self._prepare_infer_service(port)
        self.workdir = workdir


--- a/python/paddle_serving_server_gpu/serve.py
+++ b/python/paddle_serving_server_gpu/serve.py
@@ -19,21 +19,19 @@ Usage:
 """
 import argparse
 import os
-import json
-import base64
 from multiprocessing import Pool, Process
 from paddle_serving_server_gpu import serve_args
 from flask import Flask, request
-from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer


-def start_gpu_card_model(index, gpuid, port, args):  # pylint: disable=doc-string-missing
+def start_gpu_card_model(index, gpuid, args):  # pylint: disable=doc-string-missing
    gpuid = int(gpuid)
    device = "gpu"
+    port = args.port
    if gpuid == -1:
        device = "cpu"
    elif gpuid >= 0:
-        port = port + index
+        port = args.port + index
    thread_num = args.thread
    model = args.model
    mem_optim = args.mem_optim_off is False
@@ -75,20 +73,14 @@ def start_gpu_card_model(index, gpuid, port, args):  # pylint: disable=doc-strin
        server.set_container_id(args.container_id)

    server.load_model_config(model)
-    server.prepare_server(
-        workdir=workdir,
-        port=port,
-        device=device,
-        use_encryption_model=args.use_encryption_model)
+    server.prepare_server(workdir=workdir, port=port, device=device)
    if gpuid >= 0:
        server.set_gpuid(gpuid)
    server.run_server()


-def start_multi_card(args, serving_port=None):  # pylint: disable=doc-string-missing
+def start_multi_card(args):  # pylint: disable=doc-string-missing
    gpus = ""
-    if serving_port == None:
-        serving_port = args.port
    if args.gpu_ids == "":
        gpus = []
    else:
@@ -105,16 +97,14 @@ def start_multi_card(args, serving_port=None):  # pylint: disable=doc-string-mis
            env_gpus = []
    if len(gpus) <= 0:
        print("gpu_ids not set, going to run cpu service.")
-        start_gpu_card_model(-1, -1, serving_port, args)
+        start_gpu_card_model(-1, -1, args)
    else:
        gpu_processes = []
        for i, gpu_id in enumerate(gpus):
            p = Process(
-                target=start_gpu_card_model,
-                args=(
+                target=start_gpu_card_model, args=(
                    i,
                    gpu_id,
-                    serving_port,
                    args, ))
            gpu_processes.append(p)
        for p in gpu_processes:
@@ -123,89 +113,10 @@ def start_multi_card(args, serving_port=None):  # pylint: disable=doc-string-mis
            p.join()


-class MainService(BaseHTTPRequestHandler):
-    def get_available_port(self):
-        default_port = 12000
-        for i in range(1000):
-            if port_is_available(default_port + i):
-                return default_port + i
-
-    def start_serving(self):
-        start_multi_card(args, serving_port)
-
-    def get_key(self, post_data):
-        if "key" not in post_data:
-            return False
-        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "w") as f:
-                f.write(key)
-            return True
-
-    def check_key(self, post_data):
-        if "key" not in post_data:
-            return False
-        else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "r") as f:
-                cur_key = f.read()
-            return (key == cur_key)
-
-    def start(self, post_data):
-        post_data = json.loads(post_data)
-        global p_flag
-        if not p_flag:
-            if args.use_encryption_model:
-                print("waiting key for model")
-                if not self.get_key(post_data):
-                    print("not found key in request")
-                    return False
-            global serving_port
-            global p
-            serving_port = self.get_available_port()
-            p = Process(target=self.start_serving)
-            p.start()
-            time.sleep(3)
-            if p.is_alive():
-                p_flag = True
-            else:
-                return False
-        else:
-            if p.is_alive():
-                if not self.check_key(post_data):
-                    return False
-            else:
-                return False
-        return True
-
-    def do_POST(self):
-        content_length = int(self.headers['Content-Length'])
-        post_data = self.rfile.read(content_length)
-        if self.start(post_data):
-            response = {"endpoint_list": [serving_port]}
-        else:
-            response = {"message": "start serving failed"}
-        self.send_response(200)
-        self.send_header('Content-type', 'application/json')
-        self.end_headers()
-        self.wfile.write(json.dumps(response))
-
-
 if __name__ == "__main__":
    args = serve_args()
    if args.name == "None":
-        from .web_service import port_is_available
-        if args.use_encryption_model:
-            p_flag = False
-            p = None
-            serving_port = 0
-            server = HTTPServer(('localhost', int(args.port)), MainService)
-            print(
-                'Starting encryption server, waiting for key from client, use <Ctrl-C> to stop'
-            )
-            server.serve_forever()
-        else:
-            start_multi_card(args)
+        start_multi_card(args)
    else:
        from .web_service import WebService
        web_service = WebService(name=args.name)

--- a/python/paddle_serving_server_gpu/web_service.py
+++ b/python/paddle_serving_server_gpu/web_service.py
@@ -28,16 +28,6 @@ from paddle_serving_server_gpu import pipeline
 from paddle_serving_server_gpu.pipeline import Op


-def port_is_available(port):
-    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
-        sock.settimeout(2)
-        result = sock.connect_ex(('0.0.0.0', port))
-    if result != 0:
-        return True
-    else:
-        return False
-
-
 class WebService(object):
    def __init__(self, name="default_service"):
        self.name = name
@@ -74,7 +64,7 @@ class WebService(object):
        if os.path.isdir(model_config):
            client_config = "{}/serving_server_conf.prototxt".format(
                model_config)
-        elif os.path.isfile(path):
+        elif os.path.isfile(model_config):
            client_config = model_config
        model_conf = m_config.GeneralModelConfig()
        f = open(client_config, 'r')
@@ -146,7 +136,7 @@ class WebService(object):
        self.port_list = []
        default_port = 12000
        for i in range(1000):
-            if port_is_available(default_port + i):
+            if self.port_is_available(default_port + i):
                self.port_list.append(default_port + i)
            if len(self.port_list) > len(self.gpus):
                break

--- a/python/pipeline/pipeline_server.py
+++ b/python/pipeline/pipeline_server.py
@@ -333,7 +333,7 @@ class ServerYamlConfChecker(object):
            raise SystemExit("Failed to prepare_server: only one of yml_file"
                             " or yml_dict can be selected as the parameter.")
        if yml_file is not None:
-            with open(yml_file) as f:
+            with open(yml_file, encoding='utf-8') as f:
                conf = yaml.load(f.read())
        elif yml_dict is not None:
            conf = yml_dict

--- a/requirements_win.txt
+++ b/requirements_win.txt
--- a/python/setup.py.server.in
+++ b/python/setup.py.server.in
@@ -29,7 +29,7 @@ util.gen_pipeline_code("paddle_serving_server")

 REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
-    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
+    'flask >= 1.1.1', 'func_timeout', 'pyyaml'
 ]

 packages=['paddle_serving_server',

--- a/python/setup.py.server_gpu.in
+++ b/python/setup.py.server_gpu.in
@@ -31,7 +31,7 @@ util.gen_pipeline_code("paddle_serving_server_gpu")

 REQUIRED_PACKAGES = [
    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio <= 1.33.2', 'grpcio-tools <= 1.33.2',
-    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app', 'func_timeout', 'pyyaml'
+    'flask >= 1.1.1', 'func_timeout', 'pyyaml'
 ]

 packages=['paddle_serving_server_gpu',

--- a/requirements.txt
+++ b/requirements.txt
-sphinx==2.1.0
-mistune
-sphinx_rtd_theme
-paddlepaddle>=1.8.4
-shapely<=1.6.1
--- a/tools/Dockerfile.centos6.cuda9.0-cudnn7.devel
+++ b/tools/Dockerfile.centos6.cuda9.0-cudnn7.devel
@@ -39,8 +39,6 @@ RUN yum -y install wget && \
    make clean && \
    echo 'export PATH=/usr/local/python3.6/bin:$PATH' >> /root/.bashrc && \
    echo 'export LD_LIBRARY_PATH=/usr/local/python3.6/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
-    pip install requests && \
-    pip3 install requests && \
    source /root/.bashrc && \
    cd .. && rm -rf Python-3.6.8* && \
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.2/protobuf-all-3.11.2.tar.gz && \

--- a/tools/Dockerfile.centos6.devel
+++ b/tools/Dockerfile.centos6.devel
@@ -49,8 +49,6 @@ RUN yum -y install wget && \
    cd .. && rm -rf protobuf-* && \
    yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \
    yum clean all && \
-    pip install requests && \
-    pip3 install requests && \
    localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
    echo "export LANG=en_US.utf8" >> /root/.bashrc && \
    echo "export LANGUAGE=en_US.utf8" >> /root/.bashrc
--- a/tools/Dockerfile.ci
+++ b/tools/Dockerfile.ci
@@ -23,8 +23,7 @@ RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
 RUN yum -y install python-devel sqlite-devel >/dev/null \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
-    && rm get-pip.py \
-    && pip install requests  
+    && rm get-pip.py

 RUN wget http://nixos.org/releases/patchelf/patchelf-0.10/patchelf-0.10.tar.bz2 \
    && yum -y install bzip2 >/dev/null \
@@ -35,9 +34,6 @@ RUN wget http://nixos.org/releases/patchelf/patchelf-0.10/patchelf-0.10.tar.bz2
    && cd .. \
    && rm -rf patchelf-0.10*

-RUN yum install -y python3 python3-devel \
-    && pip3 install requests
-
 RUN wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.2/protobuf-all-3.11.2.tar.gz && \
    tar zxf protobuf-all-3.11.2.tar.gz && \
    cd protobuf-3.11.2 && \
@@ -45,6 +41,8 @@ RUN wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.2/p
    make clean && \
    cd .. && rm -rf protobuf-*

+RUN yum install -y python3 python3-devel
+
 RUN yum -y update >/dev/null \
    && yum -y install dnf >/dev/null \
    && yum -y install dnf-plugins-core >/dev/null \

--- a/tools/Dockerfile.cuda10.0-cudnn7.devel
+++ b/tools/Dockerfile.cuda10.0-cudnn7.devel
@@ -30,13 +30,11 @@ RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
 RUN yum -y install python-devel sqlite-devel  \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
-    && rm get-pip.py \
-    && pip install requests
+    && rm get-pip.py 

 RUN yum install -y python3 python3-devel \
    && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all \
-    && pip3 install requests
+    && yum clean all 

 RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
    && echo "export LANG=en_US.utf8" >> /root/.bashrc \

--- a/tools/Dockerfile.cuda9.0-cudnn7.devel
+++ b/tools/Dockerfile.cuda9.0-cudnn7.devel
@@ -29,13 +29,11 @@ RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
 RUN yum -y install python-devel sqlite-devel  \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
-    && rm get-pip.py \
-    && pip install requests
+    && rm get-pip.py 

 RUN yum install -y python3 python3-devel \
    && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all \
-    && pip3 install requests
+    && yum clean all 

 RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
    && echo "export LANG=en_US.utf8" >> /root/.bashrc \

--- a/tools/Dockerfile.devel
+++ b/tools/Dockerfile.devel
@@ -19,13 +19,11 @@ RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
 RUN yum -y install python-devel sqlite-devel  \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
-    && rm get-pip.py \
-    && pip install requests 
+    && rm get-pip.py 

 RUN yum install -y python3 python3-devel \
    && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all \
-    && pip3 install requests
+    && yum clean all 

 RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
    && echo "export LANG=en_US.utf8" >> /root/.bashrc \

--- a/tools/serving_build.sh
+++ b/tools/serving_build.sh
@@ -514,40 +514,6 @@ function python_test_lac() {
    cd ..
 }

-
-function python_test_encryption(){
-    #pwd: /Serving/python/examples
-    cd encryption
-    sh get_data.sh
-    local TYPE=$1
-    export SERVING_BIN=${SERIVNG_WORKDIR}/build-server-${TYPE}/core/general-server/serving
-    case $TYPE in
-        CPU)
-            #check_cmd "python encrypt.py"
-            #sleep 5
-            check_cmd "python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model > /dev/null &"
-            sleep 5
-            check_cmd "python test_client.py encrypt_client/serving_client_conf.prototxt"
-            kill_server_process
-            ;;
-        GPU)
-            #check_cmd "python encrypt.py"
-            #sleep 5
-            check_cmd "python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0"
-            sleep 5
-            check_cmd "python test_client.py encrypt_client/serving_client_conf.prototxt"
-            kill_servere_process
-            ;;
-        *)
-            echo "error type"
-            exit 1
-            ;;
-    esac
-    echo "encryption $TYPE test finished as expected"
-    setproxy
-    unset SERVING_BIN
-    cd ..
-}
 function java_run_test() {
    # pwd: /Serving
    local TYPE=$1
@@ -563,7 +529,7 @@ function java_run_test() {
            cd examples # pwd: /Serving/java/examples
            mvn compile > /dev/null
            mvn install > /dev/null
-
+            
            # fit_a_line (general, asyn_predict, batch_predict)
            cd ../../python/examples/grpc_impl_example/fit_a_line # pwd: /Serving/python/examples/grpc_impl_example/fit_a_line
            sh get_data.sh
@@ -820,7 +786,7 @@ function python_test_pipeline(){
            python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 --workdir test9292 &> cnn.log &
            python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 --workdir test9393 &> bow.log &
            sleep 5
-
+            
            # test: thread servicer & thread op
            cat << EOF > config.yml
 rpc_port: 18080
@@ -994,7 +960,6 @@ function python_run_test() {
    python_test_lac $TYPE # pwd: /Serving/python/examples
    python_test_multi_process $TYPE # pwd: /Serving/python/examples
    python_test_multi_fetch $TYPE # pwd: /Serving/python/examples
-    python_test_encryption $TYPE # pwd: /Serving/python/examples
    python_test_yolov4 $TYPE # pwd: /Serving/python/examples
    python_test_grpc_impl $TYPE # pwd: /Serving/python/examples
    python_test_resnet50 $TYPE # pwd: /Serving/python/examples