@@ -53,7 +53,7 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua
If you need to install modules compiled with the develop branch, please download packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
Packages of Paddle Serving support CentOS 6/7 and Ubuntu 16/18, or you can use the HTTP service without installing the client.
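If you cannot install the client, you can send requests to the HTTP service directly. Below is a minimal sketch using the `requests` library; the URL, the feed name `x`, and the fetch name `price` follow the Boston house price example and are assumptions that depend on how the web service was started.

```python
import requests

# Hypothetical endpoint: a web service named `uci` listening on port 9292.
url = "http://127.0.0.1:9292/uci/prediction"
payload = {
    # 13 normalized features of one Boston house price sample (example values)
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
                    -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post(url, json=payload)
print(resp.json())
```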
<h2 align="center">Pre-built services with Paddle Serving</h2>
@@ -88,7 +88,7 @@ with open('processed.data') as f:
In the code, the function `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with the label `bow` and a flow weight of `10`, and an LSTM variant with the label `lstm` and a flow weight of `90` are added. The traffic on the client side will be distributed to the two variants at a ratio of `10:90`.
When making predictions on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distributed flow.
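Putting these pieces together, a minimal client-side sketch could look like the following. The client config path, server endpoints, and feed/fetch names are assumptions borrowed from the IMDB sentiment example, and the exact way the variant tag is exposed in the result may differ between Paddle Serving versions.

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("imdb_bow_client_conf/serving_client_conf.prototxt")  # assumed path
# 10% of the traffic goes to the BOW variant, 90% to the LSTM variant.
client.add_variant("bow", ["127.0.0.1:8000"], 10)    # assumed endpoint
client.add_variant("lstm", ["127.0.0.1:9000"], 90)   # assumed endpoint
client.connect()

word_ids = [8, 233, 52, 601]  # hypothetical tokenized input
# Ask the server to report which variant served this request.
result = client.predict(feed={"words": word_ids}, fetch=["prediction"],
                        need_variant_tag=True)
print(result)  # assumption: the variant tag is returned alongside the fetch results
```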
@@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold
### Produce models
Run the following Python code in the `product_path` folder to produce models (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model, `uci_housing.tar.gz`, will be generated and uploaded to the HDFS path `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the HDFS path `/` as well.
```python
import os
import paddle.fluid as fluid

# ... (code omitted) ...

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***'  # User needs to change
    uci = '***,***'  # User needs to change
    hadoop_bin = '/path/to/hadoop/bin'  # User needs to change
    # Assumption: the AFS address and credentials above are passed to the
    # Hadoop client via -D options when uploading the file.
    os.system('{} fs -Dfs.default.name={} -Dhadoop.job.ugi={} -put -f {} {}'.format(
        hadoop_bin, afs, uci, local_file_path, remote_path))
```
Due to different model structures, different prediction services consume different amounts of computing resources. For online prediction services, models that require fewer computing resources spend a higher proportion of their time on communication; these are called communication-intensive services. Models that require more computing resources spend more of their time on inference computation; these are called computation-intensive services.
For a prediction service, the easiest way to determine which type it is is to look at the time ratio. Paddle Serving provides a [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service.
For communication-intensive prediction services, requests can be aggregated: within a latency budget that the application can tolerate, multiple prediction requests can be combined into one batch for prediction.
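A rough sketch of this kind of request aggregation is shown below. The `client` object is assumed to be a connected Paddle Serving client as in the earlier sketch, and whether a list of feed dicts is accepted as a single batch depends on the Paddle Serving version.

```python
import time

def predict_in_batches(client, samples, fetch_names, max_batch=8, max_wait_ms=5):
    """Group incoming feed dicts into batches before calling the prediction service."""
    results, batch = [], []
    deadline = time.time() + max_wait_ms / 1000.0
    for feed_dict in samples:
        batch.append(feed_dict)
        if len(batch) >= max_batch or time.time() >= deadline:
            # Assumption: passing a list of feed dicts issues one batched request.
            results.append(client.predict(feed=batch, fetch=fetch_names))
            batch = []
            deadline = time.time() + max_wait_ms / 1000.0
    if batch:
        results.append(client.predict(feed=batch, fetch=fetch_names))
    return results
```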
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used by Paddle Serving.
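For example, assuming the inference model was saved under `./inference_model` by `save_inference_model`, the conversion might look like this (the argument names may vary slightly between Paddle Serving versions):

```python
import paddle_serving_client.io as serving_io

# Convert a Paddle inference model into a Serving server model and client config.
serving_io.inference_model_to_serving(
    dirname="./inference_model",      # directory produced by save_inference_model
    serving_server="serving_server",  # output directory for the server-side model
    serving_client="serving_client",  # output directory for the client-side config
    model_filename=None,              # set these if the model and params were
    params_filename=None)             # saved as single files
```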
Here you will be prompted that the HTTP service was started in development mode and cannot be used for production deployment.
The prediction service started by Flask is not robust enough to withstand a large number of concurrent requests, so in an actual deployment a WSGI (Web Server Gateway Interface) server is used instead.
Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy an HTTP prediction service for production environments.
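As a minimal, generic sketch of the idea (this is not the exact Paddle Serving wiring; the module name, route, and port are hypothetical), the Flask application object is exposed as a WSGI callable and served by uWSGI instead of Flask's built-in development server:

```python
# uwsgi_service.py (hypothetical module name)
from flask import Flask, jsonify, request

app = Flask(__name__)  # the WSGI callable that uWSGI will import

@app.route("/prediction", methods=["POST"])
def prediction():
    # Run model inference here; echoing the input back is only a placeholder.
    return jsonify({"input": request.get_json()})

# Instead of app.run(), start the service with uWSGI, e.g.:
#   uwsgi --http :9292 --module uwsgi_service:app --processes 4 --threads 2
```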