Unverified · Commit 4d6feecc · Authored by: T TeslaZhao · Committed by: GitHub

Merge pull request #5 from PaddlePaddle/develop

Sync latest code from PaddlePaddle/Serving
# Using the gRPC Interface

- [1. Comparison with the bRPC Interface](#1-comparison-with-the-brpc-interface)
  - [1.1 Server Side](#11-server-side)
  - [1.2 Client Side](#12-client-side)
  - [1.3 Other](#13-other)
- [2. Example: Linear Regression Prediction Service](#2-example-linear-regression-prediction-service)
  - [Get the Data](#get-the-data)
  - [Start the gRPC Server](#start-the-grpc-server)
  - [Client Prediction](#client-prediction)
    - [Synchronous Prediction](#synchronous-prediction)
    - [Asynchronous Prediction](#asynchronous-prediction)
    - [Batch Prediction](#batch-prediction)
    - [Generic pb Prediction](#generic-pb-prediction)
    - [Prediction Timeout](#prediction-timeout)
    - [List Input](#list-input)
- [3. More Examples](#3-more-examples)
With the gRPC interface, clients written in different languages can call the service on Windows, Linux, and macOS. The gRPC implementation is structured as follows:

![](https://github.com/PaddlePaddle/Serving/blob/develop/doc/grpc_impl.png)

## 1. Comparison with the bRPC Interface

#### 1.1 Server Side

* The gRPC server's `load_model_config` function adds a `client_config_path` parameter:
```python
def load_model_config(self, server_config_paths, client_config_path=None)
```
In some examples the bRPC server and the bRPC client use different configuration files (for instance, in the cube local example the client's data is first handed to cube, and only after cube processing is it passed to the inference library). The gRPC server therefore also needs the gRPC client's configuration, and, to spare the gRPC client from loading a configuration file manually, the gRPC server loads both files; the client-side one is set through `client_config_path`.

**`client_config_path` defaults to `<server_config_path>/serving_server_conf.prototxt`.**
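For context, a minimal server-side sketch, assuming the `MultiLangServer` API used by the fit_a_line gRPC example; the `uci_housing_client/...` path is a placeholder for a separate client config (as in the cube local case):

```python
# Minimal sketch only; follows the fit_a_line gRPC example layout (assumed API).
from paddle_serving_server import OpMaker, OpSeqMaker, MultiLangServer

op_maker = OpMaker()
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(op_maker.create('general_reader'))
op_seq_maker.add_op(op_maker.create('general_infer'))
op_seq_maker.add_op(op_maker.create('general_response'))

server = MultiLangServer()
server.set_op_sequence(op_seq_maker.get_op_sequence())
# Pass the client-side prototxt explicitly only when it differs from the server one;
# otherwise it defaults to <server_config_path>/serving_server_conf.prototxt.
server.load_model_config("uci_housing_model",
                         client_config_path="uci_housing_client/serving_client_conf.prototxt")
server.prepare_server(workdir="work_dir", port=9393, device="cpu")
server.run_server()
```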
#### 1.2 Client Side

* The gRPC client drops the `load_client_config` step:

  The `connect` step fetches the corresponding prototxt over RPC (any endpoint will do).

* The gRPC client sets the timeout over RPC (the call form matches the bRPC client).

  Because a bRPC client cannot change its timeout after `connect`, the gRPC server re-creates its internal bRPC client instance whenever it receives a timeout-change request, and the gRPC client sets the gRPC deadline accordingly.

  **Note: the timeout setter and the inference call must not be invoked at the same time (they are not thread safe); for performance reasons no lock is taken for now.** A sketch of the timeout call is given under [Prediction Timeout](#prediction-timeout) below.

* The gRPC client's `predict` function adds the `asyn` and `is_python` parameters:
```python
def predict(self, feed, fetch, need_variant_tag=False, asyn=False, is_python=True)
```
1. `asyn` selects asynchronous calling. With `asyn=True` the call is asynchronous and returns a `MultiLangPredictFuture` object; calling `MultiLangPredictFuture.result()` blocks until the prediction is available. With `asyn=False` the call is synchronous.
2. `is_python` selects the proto format. With `is_python=True`, data is transferred as numpy bytes, which currently only works from Python; with `is_python=False`, a plain and more general format is used. Transferring numpy bytes is considerably faster than the plain format (see [#654](https://github.com/PaddlePaddle/Serving/pull/654)).
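A minimal client-side sketch of the synchronous and asynchronous styles, assuming the `MultiLangClient` class and the uci_housing feed/fetch names (`x`, `price`) from the fit_a_line example:

```python
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
# No load_client_config: connect() fetches the prototxt over RPC.
client.connect(["127.0.0.1:9393"])

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]

# Synchronous call (asyn=False, is_python=True by default).
fetch_map = client.predict(feed={"x": x}, fetch=["price"])

# Asynchronous call: returns a MultiLangPredictFuture.
future = client.predict(feed={"x": x}, fetch=["price"], asyn=True)
fetch_map = future.result()  # blocks until the prediction is available
print(fetch_map)
```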
#### 1.3 Other

* Error handling: when the bRPC client inside the gRPC server fails to predict (returns `None`), the gRPC client also returns `None`. Other gRPC exceptions are caught inside the client, and a "status_code" field is added to the returned fetch_map to indicate whether the prediction succeeded (see the timeout example).
* Because gRPC only supports the pick_first and round_robin load-balancing policies, the ABTEST feature is not fully supported yet.
* System compatibility:
* [x] CentOS
* [x] macOS
* [x] Windows
* Client languages already supported:
- Python
- Java
- Go
## 2. Example: Linear Regression Prediction Service

The following linear regression prediction example is implemented with the gRPC interface; the full code is available at this [link](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example/fit_a_line).

#### Get the Data
```shell
sh get_data.sh
```
#### Start the gRPC Server
``` shell
python test_server.py uci_housing_model/
```
Alternatively, the default gRPC service can be started with a single command:
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang
```
Note: the `--use_multilang` flag enables the multi-language (gRPC) client.
### Client Prediction

#### Synchronous Prediction
``` shell
python test_sync_client.py
```
#### Asynchronous Prediction
``` shell
python test_asyn_client.py
```
#### Batch Prediction
``` shell
python test_batch_client.py
```
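Roughly what the batch client exercises — a hedged sketch that assumes a batch is passed as a list of per-sample feed dicts (see test_batch_client.py for the authoritative form):

```python
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
batch_size = 2

# Assumption: a batch is expressed as a list of feed dicts, one per sample.
batch_feed = [{"x": x} for _ in range(batch_size)]
fetch_map = client.predict(feed=batch_feed, fetch=["price"])
print(fetch_map)
```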
#### Generic pb Prediction
``` shell
python test_general_pb_client.py
```
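The generic-pb client differs only in the `is_python` flag; a one-line sketch, reusing `client` and `x` from the synchronous example above:

```python
# Plain (language-agnostic) proto format instead of numpy-bytes transfer.
fetch_map = client.predict(feed={"x": x}, fetch=["price"], is_python=False)
```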
#### Prediction Timeout
``` shell
python test_timeout_client.py
```
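Roughly what the timeout client exercises — a sketch assuming a `set_rpc_timeout_ms` setter (mirroring the bRPC client, as described in section 1.2) and the "status_code" field mentioned in section 1.3:

```python
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])
client.set_rpc_timeout_ms(1)  # deliberately tiny timeout; setter name assumed

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])

# A gRPC deadline-exceeded error is surfaced via "status_code" (0 assumed to mean success).
if fetch_map is None or fetch_map.get("status_code", 0) != 0:
    print("prediction failed or timed out")
else:
    print(fetch_map)
```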
#### List Input
``` shell
python test_list_input_client.py
```
## 3. More Examples

See the example code under [`python/examples/grpc_impl_example`](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example).
......@@ -24,13 +24,13 @@ inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir)
inference_model_dir, serving_server_dir, serving_client_dir)
```
if your model file and params file are both standalone, please use the following api.
```
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir,
inference_model_dir, serving_server_dir, serving_client_dir,
model_filename="model", params_filename="params")
```
......@@ -23,11 +23,11 @@ inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir)
inference_model_dir, serving_server_dir, serving_client_dir)
```
If the model has a model description file `model_filename` and a model parameters file `params_filename`, use:
```
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir,
inference_model_dir, serving_server_dir, serving_client_dir,
model_filename="model", params_filename="params")
```
......@@ -75,7 +75,7 @@
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
<dependency>
......
......@@ -33,5 +33,5 @@ for line in sys.stdin:
for key in feed_dict.keys():
feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
#print(feed_dict)
result = client.predict(feed=feed_dict, fetch=fetch, batch=True)
result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
print(result)
......@@ -29,7 +29,7 @@ class BertService(WebService):
def preprocess(self, feed=[], fetch=[]):
feed_res = []
is_batch = True
is_batch = False
for ins in feed:
feed_dict = self.reader.process(ins["words"].encode("utf-8"))
for key in feed_dict.keys():
......
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from __future__ import unicode_literals, absolute_import
import os
import sys
import time
import json
import requests
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import ChineseBertReader
from paddle_serving_app.reader import *
import numpy as np
args = benchmark_args()
def single_func(idx, resource):
img="./000000570688.jpg"
profile_flags = False
latency_flags = False
if os.getenv("FLAGS_profile_client"):
profile_flags = True
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
if args.request == "rpc":
preprocess = Sequential([
File2Image(), BGR2RGB(), Div(255.0),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
Resize(640, 640), Transpose((2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output")
client = Client()
client.load_client_config(args.model)
client.connect([resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time()
for i in range(turns):
if args.batch_size >= 1:
l_start = time.time()
feed_batch = []
b_start = time.time()
im = preprocess(img)
for bi in range(args.batch_size):
print("1111batch")
print(bi)
feed_batch.append({"image": im,
"im_info": np.array(list(im.shape[1:]) + [1.0]),
"im_shape": np.array(list(im.shape[1:]) + [1.0])})
# im = preprocess(img)
b_end = time.time()
if profile_flags:
sys.stderr.write(
"PROFILE\tpid:{}\tbert_pre_0:{} bert_pre_1:{}\n".format(
os.getpid(),
int(round(b_start * 1000000)),
int(round(b_end * 1000000))))
#result = client.predict(feed=feed_batch, fetch=fetch)
fetch_map = client.predict(
feed=feed_batch,
fetch=["multiclass_nms"])
fetch_map["image"] = img
postprocess(fetch_map)
l_end = time.time()
if latency_flags:
latency_list.append(l_end * 1000 - l_start * 1000)
else:
print("unsupport batch size {}".format(args.batch_size))
else:
raise ValueError("not implemented {} request".format(args.request))
end = time.time()
if latency_flags:
return [[end - start], latency_list]
else:
return [[end - start]]
if __name__ == '__main__':
multi_thread_runner = MultiThreadRunner()
endpoint_list = [
"127.0.0.1:7777"
]
turns = 10
start = time.time()
result = multi_thread_runner.run(
single_func, args.thread, {"endpoint": endpoint_list,"turns": turns})
end = time.time()
total_cost = end - start
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("total cost: {}s".format(total_cost))
print("each thread cost: {}s. ".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[1])
rm profile_log*
export CUDA_VISIBLE_DEVICES=0
export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1
gpu_id=0
#save cpu and gpu utilization log
if [ -d utilization ];then
rm -rf utilization
else
mkdir utilization
fi
#start server
$PYTHONROOT/bin/python3 -m paddle_serving_server_gpu.serve --model $1 --port 7777 --thread 4 --gpu_ids 0 --ir_optim > elog 2>&1 &
sleep 5
#warm up
$PYTHONROOT/bin/python3 benchmark.py --thread 4 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo -e "import psutil\ncpu_utilization=psutil.cpu_percent(1,False)\nprint('CPU_UTILIZATION:', cpu_utilization)\n" > cpu_utilization.py
for thread_num in 1 4 8 16
do
for batch_size in 1
do
job_bt=`date '+%Y%m%d%H%M%S'`
nvidia-smi --id=0 --query-compute-apps=used_memory --format=csv -lms 100 > gpu_use.log 2>&1 &
nvidia-smi --id=0 --query-gpu=utilization.gpu --format=csv -lms 100 > gpu_utilization.log 2>&1 &
gpu_memory_pid=$!
$PYTHONROOT/bin/python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
kill ${gpu_memory_pid}
kill `ps -ef|grep used_memory|awk '{print $2}'`
echo "model_name:" $1
echo "thread_num:" $thread_num
echo "batch_size:" $batch_size
echo "=================Done===================="
echo "model_name:$1" >> profile_log_$1
echo "batch_size:$batch_size" >> profile_log_$1
$PYTHONROOT/bin/python3 cpu_utilization.py >> profile_log_$1
job_et=`date '+%Y%m%d%H%M%S'`
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "MAX_GPU_MEMORY:", max}' gpu_use.log >> profile_log_$1
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "GPU_UTILIZATION:", max}' gpu_utilization.log >> profile_log_$1
rm -rf gpu_use.log gpu_utilization.log
$PYTHONROOT/bin/python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
tail -n 8 profile >> profile_log_$1
echo "" >> profile_log_$1
done
done
#Divided log
awk 'BEGIN{RS="\n\n"}{i++}{print > "bert_log_"i}' profile_log_$1
mkdir bert_log && mv bert_log_* bert_log
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
......@@ -38,7 +38,8 @@ start = time.time()
image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"
for i in range(10):
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
fetch_map = client.predict(
feed={"image": img}, fetch=["score"], batch=False)
prob = max(fetch_map["score"][0])
label = label_dict[fetch_map["score"][0].tolist().index(prob)].strip(
).replace(",", "")
......
......@@ -13,7 +13,7 @@
# limitations under the License.
import sys
from paddle_serving_client import Client
import numpy as np
from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize, Base64ToImage
if len(sys.argv) != 4:
......@@ -44,12 +44,13 @@ class ImageService(WebService):
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
is_batch = True
for ins in feed:
if "image" not in ins:
raise ("feed data error!")
img = self.seq(ins["image"])
feed_batch.append({"image": img[np.newaxis, :]})
return feed_batch, fetch
return feed_batch, fetch, is_batch
def postprocess(self, feed=[], fetch=[], fetch_map={}):
score_list = fetch_map["score"]
......
......@@ -29,13 +29,14 @@ class IMDBService(WebService):
def preprocess(self, feed={}, fetch=[]):
feed_batch = []
words_lod = [0]
is_batch = True
for ins in feed:
words = self.dataset.get_words_only(ins["words"])
words = np.array(words).reshape(len(words), 1)
words_lod.append(words_lod[-1] + len(words))
feed_batch.append(words)
feed = {"words": np.concatenate(feed_batch), "words.lod": words_lod}
return feed, fetch
return feed, fetch, is_batch
imdb_service = IMDBService(name="imdb")
......
......@@ -15,6 +15,7 @@
from paddle_serving_server.web_service import WebService
import sys
from paddle_serving_app.reader import LACReader
import numpy as np
class LACService(WebService):
......@@ -23,13 +24,21 @@ class LACService(WebService):
def preprocess(self, feed={}, fetch=[]):
feed_batch = []
fetch = ["crf_decode"]
lod_info = [0]
is_batch = True
for ins in feed:
if "words" not in ins:
raise ("feed data error!")
feed_data = self.reader.process(ins["words"])
feed_batch.append({"words": feed_data})
fetch = ["crf_decode"]
return feed_batch, fetch
feed_batch.append(np.array(feed_data).reshape(len(feed_data), 1))
lod_info.append(lod_info[-1] + len(feed_data))
feed_dict = {
"words": np.concatenate(
feed_batch, axis=0),
"words.lod": lod_info
}
return feed_dict, fetch, is_batch
def postprocess(self, feed={}, fetch=[], fetch_map={}):
batch_ret = []
......
# Imagenet Pipeline WebService
This document takes the Imagenet service as an example to introduce how to use Pipeline WebService.
## Get model
```
sh get_model.sh
```
## Start server
```
python resnet50_web_service.py &>log.txt &
```
## RPC test
```
python pipeline_rpc_client.py
```
# Imagenet Pipeline WebService
The Uci service is used here as an example to show how to use Pipeline WebService.
## Get the model
```
sh get_data.sh
```
## Start the server
```
python web_service.py &>log.txt &
```
## Test
```
curl -X POST -k http://localhost:18082/uci/prediction -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
```
#worker_num: maximum concurrency. With build_dag_each_worker=True the framework creates worker_num processes, each building its own gRPC server and DAG.
##With build_dag_each_worker=False the framework sets max_workers=worker_num for the main thread's gRPC thread pool.
worker_num: 1
#HTTP port. rpc_port and http_port must not both be empty; when rpc_port is usable and http_port is empty, no http_port is generated automatically.
http_port: 18082
rpc_port: 9999
dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    imagenet:
        #when the op config has no server_endpoints, the local service config is read from local_service_conf
        local_service_conf:
            #concurrency: threads when is_thread_op=True, processes otherwise
            concurrency: 2
            #model path
            model_config: ResNet50_vd_model
            #compute device IDs: "" or unset means CPU inference; "0" or "0,1,2" selects the GPU cards to use
            devices: "0" # "0,1"
            #client type: one of brpc, grpc, local_predictor; local_predictor runs inference in-process without starting a Serving service
            client_type: local_predictor
            #list of fetched outputs, named by the alias_name of fetch_var in client_config
            fetch_list: ["score"]
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz
tar -xzvf ResNet50_vd.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/image_data.tar.gz
tar -xzvf image_data.tar.gz
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server_gpu.pipeline import PipelineClient
import numpy as np
import requests
import json
import cv2
import base64
import os
client = PipelineClient()
client.connect(['127.0.0.1:9999'])
def cv2_to_base64(image):
return base64.b64encode(image).decode('utf8')
with open("daisy.jpg", 'rb') as file:
image_data = file.read()
image = cv2_to_base64(image_data)
for i in range(1):
ret = client.predict(feed_dict={"image": image}, fetch=["label", "prob"])
print(ret)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize, Base64ToImage
try:
from paddle_serving_server_gpu.web_service import WebService, Op
except ImportError:
from paddle_serving_server.web_service import WebService, Op
import logging
import numpy as np
import base64, cv2
class ImagenetOp(Op):
def init_op(self):
self.seq = Sequential([
Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
True)
])
self.label_dict = {}
label_idx = 0
with open("imagenet.label") as fin:
for line in fin:
self.label_dict[label_idx] = line.strip()
label_idx += 1
def preprocess(self, input_dicts, data_id, log_id):
(_, input_dict), = input_dicts.items()
data = base64.b64decode(input_dict["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
# Note: class variables(self.var) can only be used in process op mode
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img = self.seq(im)
return {"image": img[np.newaxis, :].copy()}, False, None, ""
def postprocess(self, input_dicts, fetch_dict, log_id):
print(fetch_dict)
score_list = fetch_dict["score"]
result = {"label": [], "prob": []}
for score in score_list:
score = score.tolist()
max_score = max(score)
result["label"].append(self.label_dict[score.index(max_score)]
.strip().replace(",", ""))
result["prob"].append(max_score)
result["label"] = str(result["label"])
result["prob"] = str(result["prob"])
return result, None, ""
class ImageService(WebService):
def get_pipeline_response(self, read_op):
image_op = ImagenetOp(name="imagenet", input_ops=[read_op])
return image_op
uci_service = ImageService(name="imagenet")
uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service()
......@@ -37,6 +37,7 @@ class SentaService(WebService):
# Preprocessing for the senta model prediction service. Call order: lac reader -> lac model inference -> postprocess results -> senta reader
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
is_batch = True
words_lod = [0]
for ins in feed:
if "words" not in ins:
......@@ -64,14 +65,13 @@ class SentaService(WebService):
return {
"words": np.concatenate(feed_batch),
"words.lod": words_lod
}, fetch
}, fetch, is_batch
senta_service = SentaService(name="senta")
senta_service.load_model_config("senta_bilstm_model")
senta_service.prepare_server(workdir="workdir")
senta_service.init_lac_client(
lac_port=9300,
lac_client_config="lac/lac_model/serving_server_conf.prototxt")
lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt")
senta_service.run_rpc_service()
senta_service.run_web_service()
# UNET_BENCHMARK Usage

## Function
* benchmark testing

## Notes
* Put the sample images (one or more) in the img_data directory; jpg and jpeg are supported.
* The number of images should be greater than or equal to the concurrency.

## TODO
* http benchmark
#!/bin/bash
python unet_benchmark.py --thread 1 --batch_size 1 --model ../unet_client/serving_client_conf.prototxt
# thread/batch can be modified as you wish
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
unet bench mark script
20201130 first edition by cg82616424
"""
from __future__ import unicode_literals, absolute_import
import os
import time
import json
import requests
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import Sequential, File2Image, Resize, Transpose, BGR2RGB, SegPostprocess
args = benchmark_args()
def get_img_names(path):
"""
Brief:
get img files(jpg) under this path
if any exception happened return None
Args:
path (string): image file path
Returns:
list: images names under this folder
"""
if not os.path.exists(path):
return None
if not os.path.isdir(path):
return None
list_name = []
for f_handler in os.listdir(path):
file_path = os.path.join(path, f_handler)
if os.path.isdir(file_path):
continue
else:
if not file_path.endswith(".jpeg") and not file_path.endswith(
".jpg"):
continue
list_name.append(file_path)
return list_name
def preprocess_img(img_list):
"""
Brief:
prepare img data for benchmark
Args:
img_list(list): list for img file path
Returns:
image content binary list after preprocess
"""
preprocess = Sequential([File2Image(), Resize((512, 512))])
result_list = []
for img in img_list:
img_tmp = preprocess(img)
result_list.append(img_tmp)
return result_list
def benckmark_worker(idx, resource):
"""
Brief:
benchmark single worker for unet
Args:
idx(int): worker idx ,use idx to select backend unet service
resource(dict): unet serving endpoint dict
Returns:
latency
TODO:
http benckmarks
"""
profile_flags = False
latency_flags = False
postprocess = SegPostprocess(2)
if os.getenv("FLAGS_profile_client"):
profile_flags = True
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
client_handler = Client()
client_handler.load_client_config(args.model)
client_handler.connect(
[resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time()
turns = resource["turns"]
img_list = resource["img_list"]
for i in range(turns):
if args.batch_size >= 1:
l_start = time.time()
feed_batch = []
b_start = time.time()
for bi in range(args.batch_size):
feed_batch.append({"image": img_list[bi]})
b_end = time.time()
if profile_flags:
sys.stderr.write(
"PROFILE\tpid:{}\tunt_pre_0:{} unet_pre_1:{}\n".format(
os.getpid(),
int(round(b_start * 1000000)),
int(round(b_end * 1000000))))
result = client_handler.predict(
feed={"image": img_list[bi]}, fetch=["output"])
#result["filename"] = "./img_data/N0060.jpg" % (os.getpid(), idx, time.time())
#postprocess(result) # if you want to measure post process time, you have to uncomment this line
l_end = time.time()
if latency_flags:
latency_list.append(l_end * 1000 - l_start * 1000)
else:
print("unsupport batch size {}".format(args.batch_size))
end = time.time()
if latency_flags:
return [[end - start], latency_list]
else:
return [[end - start]]
if __name__ == '__main__':
"""
usage:
"""
img_file_list = get_img_names("./img_data")
img_content_list = preprocess_img(img_file_list)
multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:9494"]
turns = 1
start = time.time()
result = multi_thread_runner.run(benckmark_worker, args.thread, {
"endpoint": endpoint_list,
"turns": turns,
"img_list": img_content_list
})
end = time.time()
total_cost = end - start
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("total cost: {}s".format(total_cost))
print("each thread cost: {}s. ".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[1])
......@@ -23,13 +23,13 @@ import paddle_serving_server as paddle_serving_server
from .version import serving_server_version
from contextlib import closing
import collections
import fcntl
import shutil
import numpy as np
import grpc
from .proto import multi_lang_general_model_service_pb2
import sys
if sys.platform.startswith('win') is False:
import fcntl
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
......
......@@ -52,6 +52,20 @@ class WebService(object):
def load_model_config(self, model_config):
print("This API will be deprecated later. Please do not use it")
self.model_config = model_config
import os
from .proto import general_model_config_pb2 as m_config
import google.protobuf.text_format
if os.path.isdir(model_config):
client_config = "{}/serving_server_conf.prototxt".format(
model_config)
elif os.path.isfile(model_config):
client_config = model_config
model_conf = m_config.GeneralModelConfig()
f = open(client_config, 'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names = [var.alias_name for var in model_conf.feed_var]
self.fetch_names = [var.alias_name for var in model_conf.fetch_var]
def _launch_rpc_service(self):
op_maker = OpMaker()
......@@ -179,10 +193,7 @@ class WebService(object):
def run_web_service(self):
print("This API will be deprecated later. Please do not use it")
self.app_instance.run(host="0.0.0.0",
port=self.port,
threaded=False,
processes=1)
self.app_instance.run(host="0.0.0.0", port=self.port, threaded=True)
def get_app_instance(self):
return self.app_instance
......
......@@ -58,6 +58,20 @@ class WebService(object):
def load_model_config(self, model_config):
print("This API will be deprecated later. Please do not use it")
self.model_config = model_config
import os
from .proto import general_model_config_pb2 as m_config
import google.protobuf.text_format
if os.path.isdir(model_config):
client_config = "{}/serving_server_conf.prototxt".format(
model_config)
elif os.path.isfile(model_config):
client_config = model_config
model_conf = m_config.GeneralModelConfig()
f = open(client_config, 'r')
model_conf = google.protobuf.text_format.Merge(
str(f.read()), model_conf)
self.feed_names = [var.alias_name for var in model_conf.feed_var]
self.fetch_names = [var.alias_name for var in model_conf.fetch_var]
def set_gpus(self, gpus):
print("This API will be deprecated later. Please do not use it")
......@@ -240,10 +254,7 @@ class WebService(object):
def run_web_service(self):
print("This API will be deprecated later. Please do not use it")
self.app_instance.run(host="0.0.0.0",
port=self.port,
threaded=False,
processes=4)
self.app_instance.run(host="0.0.0.0", port=self.port, threaded=True)
def get_app_instance(self):
return self.app_instance
......
......@@ -1343,7 +1343,7 @@ class ResponseOp(Op):
type(var)))
_LOGGER.error("(logid={}) Failed to pack RPC "
"response package: {}".format(
channeldata.id, resp.error_info))
channeldata.id, resp.err_msg))
break
resp.value.append(var)
resp.key.append(name)
......
......@@ -23,7 +23,7 @@ import socket
from .channel import ChannelDataErrcode
from .proto import pipeline_service_pb2
from .proto import pipeline_service_pb2_grpc
import six
_LOGGER = logging.getLogger(__name__)
......@@ -53,7 +53,10 @@ class PipelineClient(object):
if logid is None:
req.logid = 0
else:
req.logid = long(logid)
if six.PY2:
req.logid = long(logid)
elif six.PY3:
req.logid = int(logid)
feed_dict.pop("logid")
clientip = feed_dict.get("clientip")
......
......@@ -32,8 +32,8 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [
'six >= 1.10.0', 'sentencepiece', 'opencv-python<=4.2.0.32', 'pillow',
'shapely<=1.6.1', 'pyclipper'
'six >= 1.10.0', 'sentencepiece<=0.1.92', 'opencv-python<=4.2.0.32', 'pillow',
'pyclipper'
]
packages=['paddle_serving_app',
......
sphinx==2.1.0
mistune
sphinx_rtd_theme
paddlepaddle>=1.6
paddlepaddle>=1.8.4
shapely
FROM nvidia/cuda:10.1-cudnn7-devel-centos7
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" \
&& yum -y install wget >/dev/null \
&& yum -y install gcc gcc-c++ make glibc-static which \
&& yum -y install git openssl-devel curl-devel bzip2-devel python-devel \
&& yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false \
&& yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false \
&& yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" && \
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.2/protobuf-all-3.11.2.tar.gz && \
tar zxf protobuf-all-3.11.2.tar.gz && \
cd protobuf-3.11.2 && \
./configure && make -j4 && make install && \
make clean && \
cd .. && rm -rf protobuf-*
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" && \
wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
&& tar xzf cmake-3.2.0-Linux-x86_64.tar.gz \
&& mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 \
&& echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc \
&& rm cmake-3.2.0-Linux-x86_64.tar.gz
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" && \
wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
&& tar xzf go1.14.linux-amd64.tar.gz \
&& mv go /usr/local/go \
&& echo 'export GOROOT=/usr/local/go' >> /root/.bashrc \
&& echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc \
&& rm go1.14.linux-amd64.tar.gz
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" && \
yum -y install python-devel sqlite-devel \
&& curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
&& python get-pip.py >/dev/null \
&& rm get-pip.py
RUN export http_proxy="http://172.19.56.199:3128" \
&& export https_proxy="http://172.19.56.199:3128" && \
yum install -y python3 python3-devel \
&& yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
&& yum clean all
RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
&& echo "export LANG=en_US.utf8" >> /root/.bashrc \
&& echo "export LANGUAGE=en_US.utf8" >> /root/.bashrc
RUN wget https://paddle-serving.bj.bcebos.com/tools/TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz \
&& tar -xzf TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz \
&& mv TensorRT-6.0.1.5 /usr/local/ \
&& rm TensorRT-6.0.1.5.CentOS-7.6.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz \
&& echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-6.0.1.5/lib/' >> /root/.bashrc