diff --git a/README.md b/README.md index 1818ddd61cc5423c4a590815930d007303f18e81..f209e58b66cc4c056ff4ab30283213534eac52c0 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command. -Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client. +Paddle Serving packages support CentOS 6/7 and Ubuntu 16/18; alternatively, you can use the HTTP service without installing the client.

Pre-built services with Paddle Serving

diff --git a/README_CN.md b/README_CN.md index 29cf095248f4c125b3dba7146e67efe8b7abae6c..05d3ad2100b15830d10c8bc4454a6d319d7b990b 100644 --- a/README_CN.md +++ b/README_CN.md @@ -55,7 +55,7 @@ pip install paddle-serving-server-gpu # GPU 如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。 -客户端安装包支持Centos 7和Ubuntu 18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。 +Paddle Serving安装包支持Centos 6/7和Ubuntu 16/18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。

Paddle Serving预装的服务

diff --git a/doc/ABTEST_IN_PADDLE_SERVING.md b/doc/ABTEST_IN_PADDLE_SERVING.md index f2302e611bc68607ed68f45f81cd833a91938ae6..3ae23504bff2621c9a814a3ac15e5157626f8999 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING.md +++ b/doc/ABTEST_IN_PADDLE_SERVING.md @@ -21,7 +21,7 @@ The following Python code will process the data `test_data/part-0` and write to [//file]:#process.py ``` python -from imdb_reader import IMDBDataset +from paddle_serving_app.reader import IMDBDataset imdb_dataset = IMDBDataset() imdb_dataset.load_resource('imdb.vocab') @@ -78,7 +78,7 @@ with open('processed.data') as f: feed = {"words": word_ids} fetch = ["acc", "cost", "prediction"] [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True) - if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0: + if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0: cnt[tag]['acc'] += 1 cnt[tag]['total'] += 1 @@ -88,7 +88,7 @@ with open('processed.data') as f: In the code, the function `client.add_variant(tag, clusters, variant_weight)` is to add a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label of `bow` and flow weight of `10`, and an LSTM variant with label of `lstm` and a flow weight of `90` are added. The flow on the client side will be distributed to two variants according to the ratio of `10:90`. -When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contains the variant tag corresponding to the distribution flow. +When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow. 
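The `10:90` split described above can be illustrated with a small, self-contained sketch. This is not Paddle Serving's internal routing code; `pick_variant` and `simulate` are hypothetical helper names, and the sketch only assumes that each request is assigned to a variant with probability proportional to its weight.

```python
import random
from collections import Counter


def pick_variant(variants, rng=random):
    """Pick a variant tag with probability proportional to its weight.

    `variants` is a list of (tag, weight) pairs,
    e.g. [("bow", 10), ("lstm", 90)].
    """
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Draw a point on [0, total] and walk the cumulative weights.
    point = rng.uniform(0, total)
    cumulative = 0
    for tag, weight in variants:
        cumulative += weight
        if point < cumulative:
            return tag
    # Only reached if point == total; fall back to the last variant.
    return variants[-1][0]


def simulate(variants, num_requests, seed=0):
    """Count how many of `num_requests` simulated requests hit each tag."""
    rng = random.Random(seed)
    return Counter(pick_variant(variants, rng) for _ in range(num_requests))


if __name__ == "__main__":
    counts = simulate([("bow", 10), ("lstm", 90)], num_requests=10000)
    for tag, count in sorted(counts.items()):
        print("{}: {} requests".format(tag, count))
```

With weights of `10` and `90`, roughly one request in ten should land on the `bow` variant, which is the ratio the per-tag `total` counters in the client example are expected to reflect.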
### Expected Results diff --git a/doc/ABTEST_IN_PADDLE_SERVING_CN.md b/doc/ABTEST_IN_PADDLE_SERVING_CN.md index 7ba4e5d7dbe643d87fc15e783afea2955b98fa1e..43bb702bd8b0317d7449313c0e1362953ed87744 100644 --- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md +++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md @@ -20,7 +20,7 @@ sh get_data.sh 下面Python代码将处理`test_data/part-0`的数据,写入`processed.data`文件中。 ```python -from imdb_reader import IMDBDataset +from paddle_serving_app.reader import IMDBDataset imdb_dataset = IMDBDataset() imdb_dataset.load_resource('imdb.vocab') @@ -76,7 +76,7 @@ with open('processed.data') as f: feed = {"words": word_ids} fetch = ["acc", "cost", "prediction"] [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True) - if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0: + if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0: cnt[tag]['acc'] += 1 cnt[tag]['total'] += 1 diff --git a/doc/HOT_LOADING_IN_SERVING.md b/doc/HOT_LOADING_IN_SERVING.md index 299b49d4c9b58af413e5507b5523e93a02acc7d1..94575ca51368e4b9d03cdc65ce391a0ae43f0175 100644 --- a/doc/HOT_LOADING_IN_SERVING.md +++ b/doc/HOT_LOADING_IN_SERVING.md @@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold ### Product model -Run the following Python code products model in `product_path` folder. Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`. +Run the following Python code in the `product_path` folder to produce the model (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the HDFS path `/`. 
After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`. ```python import os @@ -82,9 +82,14 @@ exe = fluid.Executor(place) exe.run(fluid.default_startup_program()) def push_to_hdfs(local_file_path, remote_path): - hadoop_bin = '/hadoop-3.1.2/bin/hadoop' - os.system('{} fs -put -f {} {}'.format( - hadoop_bin, local_file_path, remote_path)) + afs = 'afs://***.***.***.***:***' # User needs to change + uci = '***,***' # User needs to change + hadoop_bin = '/path/to/hadoop/bin' # User needs to change + prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci) + os.system('{} -rmr {}/{}'.format( + prefix, remote_path, local_file_path)) + os.system('{} -put {} {}'.format( + prefix, local_file_path, remote_path)) name = "uci_housing" for pass_id in range(30): diff --git a/doc/HOT_LOADING_IN_SERVING_CN.md b/doc/HOT_LOADING_IN_SERVING_CN.md index 83cb20a3f661c6aa4bbcc3312ac131da1bb5038e..97a2272cffed18e7753859e9991757a5cccb7439 100644 --- a/doc/HOT_LOADING_IN_SERVING_CN.md +++ b/doc/HOT_LOADING_IN_SERVING_CN.md @@ -46,7 +46,7 @@ Paddle Serving提供了一个自动监控脚本,远端地址更新模型后会 ### 生产模型 -在`product_path`下运行下面的Python代码生产模型,每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。 +在`product_path`下运行下面的Python代码生产模型(运行前需要修改hadoop相关的参数),每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。 ```python import os @@ -82,9 +82,14 @@ exe = fluid.Executor(place) exe.run(fluid.default_startup_program()) def push_to_hdfs(local_file_path, remote_path): - hadoop_bin = '/hadoop-3.1.2/bin/hadoop' - os.system('{} fs -put -f {} {}'.format( - hadoop_bin, local_file_path, remote_path)) + afs = 'afs://***.***.***.***:***' # User needs to change + uci = '***,***' # User needs to change + hadoop_bin = '/path/to/hadoop/bin' # User needs to change + prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci) + 
os.system('{} -rmr {}/{}'.format( + prefix, remote_path, local_file_path)) + os.system('{} -put {} {}'.format( + prefix, local_file_path, remote_path)) name = "uci_housing" for pass_id in range(30): diff --git a/doc/PERFORMANCE_OPTIM.md b/doc/PERFORMANCE_OPTIM.md index 0de06c16988d14d8f92eced491db7dc423831afe..eae128c40c0b5d40c0fc50346ca3f6e6c4c02eb5 100644 --- a/doc/PERFORMANCE_OPTIM.md +++ b/doc/PERFORMANCE_OPTIM.md @@ -2,9 +2,9 @@ ([简体中文](./PERFORMANCE_OPTIM_CN.md)|English) -Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require less computing resources will have a higher proportion of communication time cost, which is called communication-intensive service. Models that require more computing resources have a higher time cost for inference calculations, which is called computationa-intensive services. +Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require less computing resources will have a higher proportion of communication time cost, which is called communication-intensive service. Models that require more computing resources have a higher time cost for inference calculations, which is called computation-intensive services. -For a prediction service, the easiest way to determine what type it is is to look at the time ratio. Paddle Serving provides [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service. +For a prediction service, the easiest way to determine the type of service is to look at the time ratio. Paddle Serving provides [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service. 
For communication-intensive prediction services, requests can be aggregated, and within a limit that can tolerate delay, multiple prediction requests can be combined into a batch for prediction. diff --git a/doc/SAVE.md b/doc/SAVE.md index 4fcdfa438574fac7de21c963f5bb173c69261210..54800fa06ab4b8c20c0ffe75d417e1b42ab6ebe6 100644 --- a/doc/SAVE.md +++ b/doc/SAVE.md @@ -34,7 +34,7 @@ for line in sys.stdin: ## Export from saved model files If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving. -``` +```python import paddle_serving_client.io as serving_io serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None ) ``` diff --git a/doc/SAVE_CN.md b/doc/SAVE_CN.md index 3ca715c024a38b6fdce5c973844e7d023eebffcc..aaf0647fd1c4e95584bb7aa42a6671620adeb6d0 100644 --- a/doc/SAVE_CN.md +++ b/doc/SAVE_CN.md @@ -35,7 +35,7 @@ for line in sys.stdin: ## 从已保存的模型文件中导出 如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型,则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。 -``` +```python import paddle_serving_client.io as serving_io serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None) ``` diff --git a/doc/UWSGI_DEPLOY.md b/doc/UWSGI_DEPLOY.md index 92b69fc1f3da6c791c1009d41bbb3a3ec6f30594..1aa9c1fce452d8f3525d3646133d90356fce25e6 100644 --- a/doc/UWSGI_DEPLOY.md +++ b/doc/UWSGI_DEPLOY.md @@ -18,7 +18,7 @@ http://10.127.3.150:9393/uci/prediction Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment. The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. 
In the actual deployment process, WSGI (Web Server Gateway Interface) is used. -Next, we will show how to use the [uWSGI] (https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments. +Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments. ```python diff --git a/python/examples/bert/benchmark.py b/python/examples/bert/benchmark.py index af75b718b78b2bc130c2411d05d190fc0d298006..f1533d9710d3149a37818d3f1bc146fad6ce6537 100644 --- a/python/examples/bert/benchmark.py +++ b/python/examples/bert/benchmark.py @@ -21,11 +21,7 @@ import sys import time from paddle_serving_client import Client from paddle_serving_client.utils import MultiThreadRunner -from paddle_serving_client.utils import benchmark_args -from batching import pad_batch_data -import tokenization -import requests -import json +from paddle_serving_client.utils import benchmark_args, show_latency from paddle_serving_app.reader import ChineseBertReader args = benchmark_args() @@ -36,42 +32,75 @@ def single_func(idx, resource): dataset = [] for line in fin: dataset.append(line.strip()) + + profile_flags = False + latency_flags = False + if os.getenv("FLAGS_profile_client"): + profile_flags = True + if os.getenv("FLAGS_serving_latency"): + latency_flags = True + latency_list = [] + if args.request == "rpc": - reader = ChineseBertReader(vocab_file="vocab.txt", max_seq_len=20) + reader = ChineseBertReader({"max_seq_len": 128}) fetch = ["pooled_output"] client = Client() client.load_client_config(args.model) client.connect([resource["endpoint"][idx % len(resource["endpoint"])]]) - start = time.time() - for i in range(1000): - if args.batch_size == 1: - feed_dict = reader.process(dataset[i]) - result = client.predict(feed=feed_dict, fetch=fetch) + for i in range(turns): + if args.batch_size >= 1: + l_start = time.time() + feed_batch = [] + b_start = time.time() + for bi in 
range(args.batch_size): + feed_batch.append(reader.process(dataset[bi])) + b_end = time.time() + + if profile_flags: + sys.stderr.write( + "PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format( + os.getpid(), + int(round(b_start * 1000000)), + int(round(b_end * 1000000)))) + result = client.predict(feed=feed_batch, fetch=fetch) + + l_end = time.time() + if latency_flags: + latency_list.append(l_end * 1000 - l_start * 1000) else: print("unsupport batch size {}".format(args.batch_size)) elif args.request == "http": - start = time.time() - header = {"Content-Type": "application/json"} - for i in range(1000): - dict_data = {"words": dataset[i], "fetch": ["pooled_output"]} - r = requests.post( - 'http://{}/bert/prediction'.format(resource["endpoint"][ - idx % len(resource["endpoint"])]), - data=json.dumps(dict_data), - headers=header) + raise NotImplementedError("not implemented") end = time.time() - return [[end - start]] + if latency_flags: + return [[end - start], latency_list] + else: + return [[end - start]] if __name__ == '__main__': multi_thread_runner = MultiThreadRunner() - endpoint_list = ["127.0.0.1:9292"] - result = multi_thread_runner.run(single_func, args.thread, - {"endpoint": endpoint_list}) + endpoint_list = [ + "127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295" + ] + turns = 10 + start = time.time() + result = multi_thread_runner.run( + single_func, args.thread, {"endpoint": endpoint_list, + "turns": turns}) + end = time.time() + total_cost = end - start + avg_cost = 0 for i in range(args.thread): avg_cost += result[0][i] avg_cost = avg_cost / args.thread - print("average total cost {} s.".format(avg_cost)) + + print("total cost :{} s".format(total_cost)) + print("each thread cost :{} s. 
".format(avg_cost)) + print("qps :{} samples/s".format(args.batch_size * args.thread * turns / + total_cost)) + if os.getenv("FLAGS_serving_latency"): + show_latency(result[1]) diff --git a/python/examples/bert/benchmark.sh b/python/examples/bert/benchmark.sh index 7f9e2325f3b8f7db288d2b7d82d0d412e05417cb..7ee5f32e9e5d89a836f8962a256bcdf7bf0b62e2 100644 --- a/python/examples/bert/benchmark.sh +++ b/python/examples/bert/benchmark.sh @@ -1,9 +1,30 @@ rm profile_log -for thread_num in 1 2 4 8 16 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +export FLAGS_profile_server=1 +export FLAGS_profile_client=1 +export FLAGS_serving_latency=1 +python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim False --ir_optim True 2> elog > stdlog & + +sleep 5 + +#warm up +python3 benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + +for thread_num in 4 8 16 do - $PYTHONROOT/bin/python benchmark.py --thread $thread_num --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "batch size : $batch_size" >> profile_log - $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log +for batch_size in 1 4 16 64 256 +do + python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + echo "model name :" $1 + echo "thread num :" $thread_num + echo "batch size :" $batch_size + echo "=================Done====================" + echo "model name :$1" >> profile_log_$1 + echo "batch size :$batch_size" >> profile_log_$1 + python3 ../util/show_profile.py profile $thread_num >> profile_log_$1 + tail -n 8 profile >> profile_log_$1 + echo "" >> profile_log_$1 +done done + +ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9 diff --git a/python/examples/bert/benchmark_batch.py 
b/python/examples/bert/benchmark_batch.py deleted file mode 100644 index 7cedb6aa451e0e4a128f0fedbfde1a896977f601..0000000000000000000000000000000000000000 --- a/python/examples/bert/benchmark_batch.py +++ /dev/null @@ -1,79 +0,0 @@ -# -*- coding: utf-8 -*- -# -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# pylint: disable=doc-string-missing - -from __future__ import unicode_literals, absolute_import -import os -import sys -import time -from paddle_serving_client import Client -from paddle_serving_client.utils import MultiThreadRunner -from paddle_serving_client.utils import benchmark_args -from batching import pad_batch_data -import tokenization -import requests -import json -from bert_reader import BertReader -args = benchmark_args() - - -def single_func(idx, resource): - fin = open("data-c.txt") - dataset = [] - for line in fin: - dataset.append(line.strip()) - profile_flags = False - if os.environ["FLAGS_profile_client"]: - profile_flags = True - if args.request == "rpc": - reader = BertReader(vocab_file="vocab.txt", max_seq_len=20) - fetch = ["pooled_output"] - client = Client() - client.load_client_config(args.model) - client.connect([resource["endpoint"][idx % len(resource["endpoint"])]]) - start = time.time() - for i in range(1000): - if args.batch_size >= 1: - feed_batch = [] - b_start = time.time() - for bi in range(args.batch_size): - feed_batch.append(reader.process(dataset[bi])) - b_end = 
time.time() - if profile_flags: - print("PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}".format( - os.getpid(), - int(round(b_start * 1000000)), - int(round(b_end * 1000000)))) - result = client.predict(feed=feed_batch, fetch=fetch) - else: - print("unsupport batch size {}".format(args.batch_size)) - - elif args.request == "http": - raise ("no batch predict for http") - end = time.time() - return [[end - start]] - - -if __name__ == '__main__': - multi_thread_runner = MultiThreadRunner() - endpoint_list = ["127.0.0.1:9292"] - result = multi_thread_runner.run(single_func, args.thread, - {"endpoint": endpoint_list}) - avg_cost = 0 - for i in range(args.thread): - avg_cost += result[0][i] - avg_cost = avg_cost / args.thread - print("average total cost {} s.".format(avg_cost)) diff --git a/python/examples/bert/benchmark_batch.sh b/python/examples/bert/benchmark_batch.sh deleted file mode 100644 index 272923776d6640880175745920a8fad9e84972fd..0000000000000000000000000000000000000000 --- a/python/examples/bert/benchmark_batch.sh +++ /dev/null @@ -1,19 +0,0 @@ -rm profile_log -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9295 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog & - -sleep 5 - -for thread_num in 1 2 4 8 16 -do -for batch_size in 1 2 4 8 16 32 64 128 256 512 -do - $PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "thread num: ", $thread_num - echo "batch size: ", $batch_size - echo "batch size : $batch_size" >> profile_log - $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log -done -done diff --git a/python/examples/criteo_ctr_with_cube/cube_prepare.sh b/python/examples/criteo_ctr_with_cube/cube_prepare.sh index 
2d0efaa56f06e9ad8d1590f1316e64bcc65f268d..1417254a54e2194ab3a0194f2ec970f480787acd 100755 --- a/python/examples/criteo_ctr_with_cube/cube_prepare.sh +++ b/python/examples/criteo_ctr_with_cube/cube_prepare.sh @@ -17,6 +17,6 @@ mkdir -p cube_model mkdir -p cube/data ./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature -./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false +./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false mv ./cube/data/0_0/test_dict_part0/* ./cube/data/ cd cube && ./cube diff --git a/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh b/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh index 7c794e103baa3a97d09966c470dd48eb56579500..0db6575ab307fb81cdd0336a20bb9a8ec30d446d 100755 --- a/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh +++ b/python/examples/criteo_ctr_with_cube/cube_quant_prepare.sh @@ -17,6 +17,6 @@ mkdir -p cube_model mkdir -p cube/data ./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8 -./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false +./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false mv ./cube/data/0_0/test_dict_part0/* ./cube/data/ cd cube && ./cube diff --git a/python/examples/imagenet/benchmark.py b/python/examples/imagenet/benchmark.py index caa952f121fbd8725c2a6bfe36f0dd84b6a82707..ac7ba8c333d25fb23bfc7695105315bfaa4e76ee 100644 --- a/python/examples/imagenet/benchmark.py +++ b/python/examples/imagenet/benchmark.py 
@@ -93,7 +93,7 @@ def single_func(idx, resource): if __name__ == '__main__': multi_thread_runner = MultiThreadRunner() - endpoint_list = ["127.0.0.1:9696"] + endpoint_list = ["127.0.0.1:9393"] #endpoint_list = endpoint_list + endpoint_list + endpoint_list result = multi_thread_runner.run(single_func, args.thread, {"endpoint": endpoint_list}) diff --git a/python/examples/imagenet/benchmark.sh b/python/examples/imagenet/benchmark.sh index 618a62c063c0bc4955baf8516bc5bc93e4832394..84885908fa89d050b3ca71386fe2a21533ce0809 100644 --- a/python/examples/imagenet/benchmark.sh +++ b/python/examples/imagenet/benchmark.sh @@ -1,12 +1,28 @@ rm profile_log -for thread_num in 1 2 4 8 +export CUDA_VISIBLE_DEVICES=0,1,2,3 +export FLAGS_profile_server=1 +export FLAGS_profile_client=1 +python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog & + +sleep 5 + +#warm up +$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + +for thread_num in 4 8 16 do -for batch_size in 1 2 4 8 16 32 64 128 +for batch_size in 1 4 16 64 256 do - $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model ResNet50_vd_client_config/serving_client_conf.prototxt --request rpc > profile 2>&1 - echo "========================================" - echo "batch size : $batch_size" >> profile_log + $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 + echo "model name :" $1 + echo "thread num :" $thread_num + echo "batch size :" $batch_size + echo "=================Done====================" + echo "model name :$1" >> profile_log + echo "batch size :$batch_size" >> profile_log $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log - tail -n 1 profile >> profile_log + tail -n 8 profile >> profile_log done done + +ps -ef|grep 
'serving'|grep -v grep|cut -c 9-15 | xargs kill -9 diff --git a/python/examples/lac/lac_client.py b/python/examples/lac/lac_client.py index ab9af730abb2f5b33f4d0292115b2f7bf682f278..22f3c511dcd2540365623ef9428b60cfcb5e5a34 100644 --- a/python/examples/lac/lac_client.py +++ b/python/examples/lac/lac_client.py @@ -35,5 +35,4 @@ for line in sys.stdin: begin = fetch_map['crf_decode.lod'][0] end = fetch_map['crf_decode.lod'][1] segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end]) - - print({"word_seg": "|".join(segs)}) + print("word_seg: " + "|".join(str(words) for words in segs)) diff --git a/python/examples/lac/lac_web_service.py b/python/examples/lac/lac_web_service.py index 9b1c6693b52393aee1294b521fe30fb1a9fd0d79..bed89f54b626c0cce55767f8edacc3dd33f0104c 100644 --- a/python/examples/lac/lac_web_service.py +++ b/python/examples/lac/lac_web_service.py @@ -19,7 +19,7 @@ from paddle_serving_app.reader import LACReader class LACService(WebService): def load_reader(self): - self.reader = LACReader("lac_dict") + self.reader = LACReader() def preprocess(self, feed={}, fetch=[]): feed_batch = [] diff --git a/python/examples/resnet_v2_50/resnet50_v2_tutorial.py b/python/examples/resnet_v2_50/resnet50_v2_tutorial.py index 8d916cbd8145cdc73424a05fdb2855412f4d4fe2..b249d2a6df85f87258f66c96aaa779eb2e299613 100644 --- a/python/examples/resnet_v2_50/resnet50_v2_tutorial.py +++ b/python/examples/resnet_v2_50/resnet50_v2_tutorial.py @@ -14,7 +14,7 @@ from paddle_serving_client import Client from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop -from apddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize +from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize client = Client() client.load_client_config( @@ -28,5 +28,5 @@ seq = Sequential([ image_file = "daisy.jpg" img = seq(image_file) -fetch_map = client.predict(feed={"image": img}, fetch=["feature_map"]) -print(fetch_map["feature_map"].reshape(-1)) +fetch_map 
= client.predict(feed={"image": img}, fetch=["score"]) +print(fetch_map["score"].reshape(-1)) diff --git a/python/examples/senta/README.md b/python/examples/senta/README.md index 9aeb6d1191719e067e2cb99d408a6d091c25ede3..8929a9312c17264800f299f77afb583221006068 100644 --- a/python/examples/senta/README.md +++ b/python/examples/senta/README.md @@ -5,6 +5,8 @@ ``` python -m paddle_serving_app.package --get_model senta_bilstm python -m paddle_serving_app.package --get_model lac +tar -xzvf senta_bilstm.tar.gz +tar -xzvf lac.tar.gz ``` ## Start HTTP Service @@ -17,5 +19,5 @@ In this demo, the LAC task is placed in the preprocessing part of the HTTP predi ## Client prediction ``` -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction +curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction ``` diff --git a/python/examples/senta/README_CN.md b/python/examples/senta/README_CN.md index f958af221d843748836bea325f87ba603411d39c..e5624dc975e6bc00de219f68cbf74dea7cac8360 100644 --- a/python/examples/senta/README_CN.md +++ b/python/examples/senta/README_CN.md @@ -5,6 +5,8 @@ ``` python -m paddle_serving_app.package --get_model senta_bilstm python -m paddle_serving_app.package --get_model lac +tar -xzvf lac.tar.gz +tar -xzvf senta_bilstm.tar.gz ``` ## 启动HTTP服务 @@ -17,5 +19,5 @@ python senta_web_service.py ## 客户端预测 ``` -curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction +curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction ``` diff --git a/python/examples/unet_for_image_seg/seg_client.py b/python/examples/unet_for_image_seg/seg_client.py index 
9e76b060955ec74492312c8896efaf3946a3f7ab..44f634b6090159ee1bd37c176eebb7d2b7f37065 100644 --- a/python/examples/unet_for_image_seg/seg_client.py +++ b/python/examples/unet_for_image_seg/seg_client.py @@ -27,7 +27,8 @@ preprocess = Sequential( postprocess = SegPostprocess(2) -im = preprocess("N0060.jpg") +filename = "N0060.jpg" +im = preprocess(filename) fetch_map = client.predict(feed={"image": im}, fetch=["output"]) fetch_map["filename"] = filename postprocess(fetch_map) diff --git a/python/examples/util/show_profile.py b/python/examples/util/show_profile.py index 9153d939338f0ee171af539b9f955d51802ad547..1581dda19bb0abefe6eb21592bda7fc97d8fb7cd 100644 --- a/python/examples/util/show_profile.py +++ b/python/examples/util/show_profile.py @@ -31,7 +31,7 @@ with open(profile_file) as f: if line[0] == "PROFILE": prase(line[2]) -print("thread num {}".format(thread_num)) +print("thread num :{}".format(thread_num)) for name in time_dict: - print("{} cost {} s in each thread ".format(name, time_dict[name] / ( + print("{} cost :{} s in each thread ".format(name, time_dict[name] / ( 1000000.0 * float(thread_num)))) diff --git a/python/paddle_serving_app/README.md b/python/paddle_serving_app/README.md index 6757407939c150ca14a22427a488f41a24feb7ac..cb48ae376086ec4021af617337e43934dd5e5f6e 100644 --- a/python/paddle_serving_app/README.md +++ b/python/paddle_serving_app/README.md @@ -21,7 +21,7 @@ python -m paddle_serving_app.package --list_model python -m paddle_serving_app.package --get_model senta_bilstm ``` -10 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks. +11 pre-trained models are built into paddle_serving_app, covering 6 kinds of prediction tasks. The model files can be directly used for deployment, and the `--tutorial` argument can be added to obtain the deployment method. 
| Prediction task | Model name | @@ -30,7 +30,7 @@ The model files can be directly used for deployment, and the `--tutorial` argume | SemanticRepresentation | 'ernie' | | ChineseWordSegmentation | 'lac' | | ObjectDetection | 'faster_rcnn' | -| ImageSegmentation | 'unet', 'deeplabv3' | +| ImageSegmentation | 'unet', 'deeplabv3','deeplabv3+cityscapes' | | ImageClassification | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' | ## Data preprocess API @@ -38,7 +38,8 @@ The model files can be directly used for deployment, and the `--tutorial` argume paddle_serving_app provides a variety of data preprocessing methods for prediction tasks in the field of CV and NLP. - class ChineseBertReader - + + Preprocessing for Chinese semantic representation task. - `__init__(vocab_file, max_seq_len=20)` @@ -54,7 +55,8 @@ Preprocessing for Chinese semantic representation task. [example](../examples/bert/bert_client.py) - class LACReader - + + Preprocessing for Chinese word segmentation task. - `__init__(dict_floder)` @@ -65,7 +67,7 @@ Preprocessing for Chinese word segmentation task. - words(st ):Original text input. - crf_decode(np.array):CRF code predicted by model. 
- [example](../examples/bert/lac_web_service.py) + [example](../examples/lac/lac_web_service.py) - class SentaReader diff --git a/python/paddle_serving_app/README_CN.md b/python/paddle_serving_app/README_CN.md index d29c3fd9fff3ba2ab34ec67b6fd15ad10e3cfd07..181037c55a2aae578cb189525030ccba87146f6e 100644 --- a/python/paddle_serving_app/README_CN.md +++ b/python/paddle_serving_app/README_CN.md @@ -20,7 +20,7 @@ python -m paddle_serving_app.package --list_model python -m paddle_serving_app.package --get_model senta_bilstm ``` -paddle_serving_app中内置了10种预训练模型,涵盖了6种预测任务。获取到的模型文件可以直接用于部署,添加`--tutorial`参数可以获取对应的部署方式。 +paddle_serving_app中内置了11种预训练模型,涵盖了6种预测任务。获取到的模型文件可以直接用于部署,添加`--tutorial`参数可以获取对应的部署方式。 | 预测服务类型 | 模型名称 | | ------------ | ------------------------------------------------ | @@ -28,7 +28,7 @@ paddle_serving_app中内置了10种预训练模型,涵盖了6种预测任务 | 语义理解 | 'ernie' | | 中文分词 | 'lac' | | 图像检测 | 'faster_rcnn' | -| 图像分割 | 'unet', 'deeplabv3' | +| 图像分割 | 'unet', 'deeplabv3', 'deeplabv3+cityscapes' | | 图像分类 | 'resnet_v2_50_imagenet', 'mobilenet_v2_imagenet' | ## 数据预处理API @@ -36,7 +36,7 @@ paddle_serving_app中内置了10种预训练模型,涵盖了6种预测任务 paddle_serving_app针对CV和NLP领域的模型任务,提供了多种常见的数据预处理方法。 - class ChineseBertReader - + 中文语义理解模型预处理 - `__init__(vocab_file, max_seq_len=20)` diff --git a/python/paddle_serving_app/models/model_list.py b/python/paddle_serving_app/models/model_list.py index cf0ca3bf5765d65065e541462eb919ccc5c4b978..d5f42ab78acdbe837a719908d27cda513da02c3f 100644 --- a/python/paddle_serving_app/models/model_list.py +++ b/python/paddle_serving_app/models/model_list.py @@ -38,7 +38,7 @@ class ServingModels(object): object_detection_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ObjectDetection/" ocr_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/OCR/" senta_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/" - semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticRepresentation/" 
+        semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/"
         wordseg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/"
 
         self.url_dict = {}
diff --git a/python/paddle_serving_app/reader/lac_reader.py b/python/paddle_serving_app/reader/lac_reader.py
index 7e804ff371e2d90d79f7f663e83a854b1b0c9647..8f7d79a6a1e7ce8c4c86b689e2856eea6fa42158 100644
--- a/python/paddle_serving_app/reader/lac_reader.py
+++ b/python/paddle_serving_app/reader/lac_reader.py
@@ -111,6 +111,10 @@ class LACReader(object):
         return word_ids
 
     def parse_result(self, words, crf_decode):
+        try:
+            words = unicode(words, "utf-8")
+        except:
+            pass
         tags = [self.id2label_dict[str(x[0])] for x in crf_decode]
 
         sent_out = []
diff --git a/python/paddle_serving_client/utils/__init__.py b/python/paddle_serving_client/utils/__init__.py
index 381da6bf9bade2bb0627f4c07851012360905de5..53f40726fbf21a0607b47bb29a20aa6ff50b6221 100644
--- a/python/paddle_serving_client/utils/__init__.py
+++ b/python/paddle_serving_client/utils/__init__.py
@@ -17,6 +17,7 @@ import sys
 import subprocess
 import argparse
 from multiprocessing import Pool
+import numpy as np
 
 
 def benchmark_args():
@@ -35,6 +36,17 @@ def benchmark_args():
     return parser.parse_args()
 
 
+def show_latency(latency_list):
+    latency_array = np.array(latency_list)
+    info = "latency:\n"
+    info += "mean :{} ms\n".format(np.mean(latency_array))
+    info += "median :{} ms\n".format(np.median(latency_array))
+    info += "80 percent :{} ms\n".format(np.percentile(latency_array, 80))
+    info += "90 percent :{} ms\n".format(np.percentile(latency_array, 90))
+    info += "99 percent :{} ms\n".format(np.percentile(latency_array, 99))
+    sys.stderr.write(info)
+
+
 class MultiThreadRunner(object):
     def __init__(self):
         pass
diff --git a/python/paddle_serving_server/monitor.py b/python/paddle_serving_server/monitor.py
index 3f1ff6436917b8ae7ff4ea06fcae1f55bd65e887..84146039c40794436030a8c5c6ba9d18ccbfda06 100644
--- a/python/paddle_serving_server/monitor.py
+++ b/python/paddle_serving_server/monitor.py
@@ -20,7 +20,7 @@ Usage:
 import os
 import time
 import argparse
-import commands
+import subprocess
 import datetime
 import shutil
 import tarfile
@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor):
         remote_filepath = os.path.join(path, filename)
         cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath)
         _LOGGER.debug('check cmd: {}'.format(cmd))
-        [status, output] = commands.getstatusoutput(cmd)
+        [status, output] = subprocess.getstatusoutput(cmd)
         _LOGGER.debug('resp: {}'.format(output))
         if status == 0:
             [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split()
diff --git a/python/paddle_serving_server_gpu/monitor.py b/python/paddle_serving_server_gpu/monitor.py
index 3f1ff6436917b8ae7ff4ea06fcae1f55bd65e887..84146039c40794436030a8c5c6ba9d18ccbfda06 100644
--- a/python/paddle_serving_server_gpu/monitor.py
+++ b/python/paddle_serving_server_gpu/monitor.py
@@ -20,7 +20,7 @@ Usage:
 import os
 import time
 import argparse
-import commands
+import subprocess
 import datetime
 import shutil
 import tarfile
@@ -209,7 +209,7 @@ class HadoopMonitor(Monitor):
         remote_filepath = os.path.join(path, filename)
         cmd = '{} -ls {} 2>/dev/null'.format(self._cmd_prefix, remote_filepath)
         _LOGGER.debug('check cmd: {}'.format(cmd))
-        [status, output] = commands.getstatusoutput(cmd)
+        [status, output] = subprocess.getstatusoutput(cmd)
         _LOGGER.debug('resp: {}'.format(output))
         if status == 0:
             [_, _, _, _, _, mdate, mtime, _] = output.split('\n')[-1].split()
diff --git a/python/setup.py.server.in b/python/setup.py.server.in
index 97f02078806b20f41e917e0c385983a767a4df8c..a7190ecf36c194e7d486f96e1bf8e219a7600dba 100644
--- a/python/setup.py.server.in
+++ b/python/setup.py.server.in
@@ -38,12 +38,9 @@ max_version, mid_version, min_version = python_version()
 
 REQUIRED_PACKAGES = [
     'six >= 1.10.0', 'protobuf >= 3.1.0',
-    'paddle_serving_client', 'flask >= 1.1.1'
+    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
 ]
 
-if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
-    REQUIRED_PACKAGES.append("paddlepaddle")
-
 packages=['paddle_serving_server',
           'paddle_serving_server.proto']
 
diff --git a/python/setup.py.server_gpu.in b/python/setup.py.server_gpu.in
index 6a651053391b30afb71996c5073d21a5620d3320..90db7addbcd8b1929342a893c8213a48f3c8e9e3 100644
--- a/python/setup.py.server_gpu.in
+++ b/python/setup.py.server_gpu.in
@@ -38,11 +38,9 @@ max_version, mid_version, min_version = python_version()
 
 REQUIRED_PACKAGES = [
     'six >= 1.10.0', 'protobuf >= 3.1.0',
-    'paddle_serving_client', 'flask >= 1.1.1'
+    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
 ]
 
-if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
-    REQUIRED_PACKAGES.append("paddlepaddle")
 
 packages=['paddle_serving_server_gpu',
           'paddle_serving_server_gpu.proto']
diff --git a/tools/Dockerfile b/tools/Dockerfile
index dc39adf01288f092143803557b322a0c8fbcb2b4..3c701725400350247153f828410d06cec69856f5 100644
--- a/tools/Dockerfile
+++ b/tools/Dockerfile
@@ -9,4 +9,6 @@ RUN yum -y install wget && \
     yum -y install python3 python3-devel && \
     yum clean all && \
     curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
-    python get-pip.py && rm get-pip.py
+    python get-pip.py && rm get-pip.py && \
+    localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
+    echo "export LANG=en_US.utf8" >> /root/.bashrc
diff --git a/tools/Dockerfile.centos6.devel b/tools/Dockerfile.centos6.devel
index 5223693d846bdbc90bdefe58c26db29d6a81359d..83981dcc4731252dfc75270b5ce6fc623a0266a8 100644
--- a/tools/Dockerfile.centos6.devel
+++ b/tools/Dockerfile.centos6.devel
@@ -44,4 +44,6 @@ RUN yum -y install wget && \
     cd .. && rm -rf Python-3.6.8* && \
     pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
     yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \
-    yum clean all
+    yum clean all && \
+    localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
+    echo "export LANG=en_US.utf8" >> /root/.bashrc
diff --git a/tools/Dockerfile.centos6.gpu.devel b/tools/Dockerfile.centos6.gpu.devel
index 1432d49abe9a4aec3b558d855c9cfcf30efef461..9ee3591b9a1e2ea5881106cf7e67ca28b24c1890 100644
--- a/tools/Dockerfile.centos6.gpu.devel
+++ b/tools/Dockerfile.centos6.gpu.devel
@@ -44,4 +44,5 @@ RUN yum -y install wget && \
     cd .. && rm -rf Python-3.6.8* && \
     pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
     yum -y install epel-release && yum -y install patchelf libXext libSM libXrender && \
-    yum clean all
+    yum clean all && \
+    echo "export LANG=en_US.utf8" >> /root/.bashrc
diff --git a/tools/Dockerfile.devel b/tools/Dockerfile.devel
index 385e568273eab54f7dfa51a20bb7dcd89cfa98a8..e4bcd33534cb9e887f49fcba5029619aaa1dea4c 100644
--- a/tools/Dockerfile.devel
+++ b/tools/Dockerfile.devel
@@ -21,4 +21,6 @@ RUN yum -y install wget >/dev/null \
     && yum install -y python3 python3-devel \
     && pip3 install google protobuf setuptools wheel flask \
     && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all
+    && yum clean all \
+    && localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
+    && echo "export LANG=en_US.utf8" >> /root/.bashrc
diff --git a/tools/Dockerfile.gpu b/tools/Dockerfile.gpu
index bf05080ca72e90b2179f6a717f6f4e86e7aefe29..2f38a3a3cd1c8987d34a81259ec9ad6ba67156a7 100644
--- a/tools/Dockerfile.gpu
+++ b/tools/Dockerfile.gpu
@@ -15,6 +15,7 @@ RUN yum -y install wget && \
     echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \
     ln -s /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so && \
     echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
+    echo "export LANG=en_US.utf8" >> /root/.bashrc && \
     mkdir -p /usr/local/cuda/extras
 
 COPY --from=builder /usr/local/cuda/extras/CUPTI /usr/local/cuda/extras/CUPTI
diff --git a/tools/Dockerfile.gpu.devel b/tools/Dockerfile.gpu.devel
index 2ffbe4601e1f7e9b05c87f9562b3e0ffc4b967ff..057201cefa1f8de7a105ea9b7f93e7ca9e342777 100644
--- a/tools/Dockerfile.gpu.devel
+++ b/tools/Dockerfile.gpu.devel
@@ -22,4 +22,5 @@ RUN yum -y install wget >/dev/null \
     && yum install -y python3 python3-devel \
     && pip3 install google protobuf setuptools wheel flask \
     && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all
+    && yum clean all \
+    && echo "export LANG=en_US.utf8" >> /root/.bashrc
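
Review note on the two `monitor.py` hunks in this patch: `commands.getstatusoutput` exists only on Python 2 and `subprocess.getstatusoutput` only on Python 3, so replacing `import commands` with `import subprocess` makes the monitor Python 3 only. If Python 2 support were still wanted, a version-tolerant import is one possible approach. The sketch below is an assumption about how that could look, not code from this patch; `check_remote_file` and its arguments are hypothetical names chosen for illustration.

```python
# Hedged sketch: use whichever getstatusoutput the running interpreter provides.
try:
    from subprocess import getstatusoutput  # available on Python 3
except ImportError:
    from commands import getstatusoutput  # Python 2 only


def check_remote_file(cmd_prefix, remote_filepath):
    """Hypothetical helper mirroring the `-ls` check in HadoopMonitor.

    Returns (exists, raw_output); `exists` is True when the command exits 0.
    """
    cmd = '{} -ls {} 2>/dev/null'.format(cmd_prefix, remote_filepath)
    status, output = getstatusoutput(cmd)
    return status == 0, output
```

With this shape the call site keeps the `(status, output)` contract on both interpreters, and only the import line differs from what the patch does.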