Commit 9fe65331 authored by HexToString

Merge branch 'v0.6.0' of https://github.com/PaddlePaddle/Serving into v0.6.0

......@@ -17,7 +17,7 @@ python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and precision mode
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Request the serving service with the client
```
......@@ -27,7 +27,7 @@ from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
......@@ -37,11 +37,11 @@ seq = Sequential([
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
## Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model Using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model Using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
\ No newline at end of file
* [Deploy the quantized model Using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -16,7 +16,7 @@ python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and the deployment precision
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Send requests with the client
```
......@@ -43,4 +43,4 @@ print(fetch_map["score"].reshape(-1))
## References
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html) on deploying quantized models on Intel CPU
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) on deploying quantized models on NVIDIA GPU
\ No newline at end of file
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) on deploying quantized models on NVIDIA GPU
# ResNet50 int8 example
(English|[简体中文](./README_CN.md))
## Obtain the quantized model through the PaddleSlim tool
To train low-precision models, please refer to [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
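As a rough orientation only, post-training (offline) quantization of an inference model with PaddleSlim can look like the sketch below. Everything here is an assumption rather than the documented recipe: it presumes PaddleSlim 2.x with the `quant_post_static` API, a float inference model under `./ResNet50_infer` (a hypothetical path), and a real calibration reader in place of the random placeholder; follow the PaddleSlim link above for the authoritative steps.
```
import numpy as np
import paddle
import paddleslim

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())

def calib_reader():
    # Placeholder calibration data; replace with real preprocessed images.
    for _ in range(32):
        yield [np.random.random((3, 224, 224)).astype("float32")]

# Calibrate the float model and write an int8 (quantized) inference model.
paddleslim.quant.quant_post_static(
    executor=exe,
    model_dir="./ResNet50_infer",            # hypothetical float model directory
    quantize_model_path="./ResNet50_quant",  # output directory for the quantized model
    sample_generator=calib_reader,
    batch_nums=8)
```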
## Deploy the quantized model from PaddleSlim using Paddle Serving with Nvidia TensorRT int8 mode
First, download the [ResNet50 int8 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to Paddle Serving's saved model format.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and precision mode
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Request the serving service with the client
```
python resnet50_client.py
```
## Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model Using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model Using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
# ResNet50 int8 example
(Simplified Chinese|[English](./README.md))
## Generate a low-precision model through PaddleSlim quantization
For details, see [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
## Deploy the PaddleSlim int8 quantized model with TensorRT int8
First, download the ResNet50 [PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to the deployment model format supported by Paddle Serving.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id and the deployment precision
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Send requests with the client
```
python resnet50_client.py
```
## References
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html) on deploying quantized models on Intel CPU
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) on deploying quantized models on NVIDIA GPU
......@@ -13,30 +13,20 @@
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
import paddle
import numpy as np
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
def single_func(idx, resource):
client = Client()
client.load_client_config(
"./uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
0.6283, 0.4919, 0.1856, 0.0795, -0.0332
]
x = np.array(x)
for i in range(1000):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map is None:
return [[None]]
return [[0]]
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
multi_thread_runner = MultiThreadRunner()
thread_num = 4
result = multi_thread_runner.run(single_func, thread_num, {})
if None in result[0]:
exit(1)
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.cdn.bcebos.com/PaddleLite/models_and_data_for_unittests/bert_base_chinese.tar.gz
tar zxvf bert_base_chinese.tar.gz
```
### convert model
```
python -m paddle_serving_client.convert --dirname infer_bert-base-chinese_ft_model_4000.pdparams
python3 -m paddle_serving_client.convert --dirname bert_base_chinese --model_filename bert_base_chinese/model.pdmodel --params_filename bert_base_chinese/model.pdiparams
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/bert.tar.gz
tar zxvf bert.tar.gz
```
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.
## RPC Service
### Start Service
```
pytyon bert_web_service.py serving_server 7703
python3 bert_web_service.py serving_server 7703
```
### Client Prediction
```
python bert_client.py
head data-c.txt | python3 bert_client.py --model serving_client/serving_client_conf.prototxt
```
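For reference, a stripped-down sketch of such a stdin-driven client is shown below. It is an illustration under assumptions, not the shipped script: the endpoint, the 128 sequence length, and the `pooled_output` fetch name are placeholders, and the real fetch variable should be taken from serving_client/serving_client_conf.prototxt.
```
import sys
import numpy as np
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader

client = Client()
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:7703"])  # assumed RPC endpoint; match your service

reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]  # placeholder; use the fetch var from the client prototxt

for line in sys.stdin:
    feed_dict = reader.process(line.strip())
    for key in feed_dict:
        # seq128-style inputs are fed as (max_seq_len, 1) arrays.
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    fetch_map = client.predict(feed=feed_dict, fetch=fetch, batch=False)
    print(fetch_map)
```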
wget https://paddle-serving.bj.bcebos.com/bert_example/data-c.txt --no-check-certificate
wget https://paddle-serving.bj.bcebos.com/bert_example/vocab.txt --no-check-certificate
This diff has been collapsed.
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.cdn.bcebos.com/PaddleLite/models_and_data_for_unittests/ernie.tar.gz
tar zxvf ernie.tar.gz
```
### convert model
```
python3 -m paddle_serving_client.convert --dirname erine
python3 -m paddle_serving_client.convert --dirname ernie
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/ernie.tar.gz
tar zxvf ernie.tar.gz
```
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.
## RPC Service
......
......@@ -23,7 +23,7 @@ args = benchmark_args()
reader = ChineseErnieReader({"max_seq_len": 128})
fetch = ["save_infer_model/scale_0"]
endpoint_list = ['127.0.0.1:7704']
endpoint_list = ['127.0.0.1:12000']
client = Client()
client.load_client_config(args.model)
client.connect(endpoint_list)
......
wget https://paddle-serving.bj.bcebos.com/bert_example/data-c.txt --no-check-certificate
wget https://paddle-serving.bj.bcebos.com/bert_example/vocab.txt --no-check-certificate
This diff has been collapsed.
......@@ -8,15 +8,10 @@
sh get_data.sh
```
## RPC service
### Start server
``` shell
python test_server.py uci_housing_model/
```
You can alse use the following code to start the RPC service
You can use the following code to start the RPC service
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
```
......@@ -29,19 +24,17 @@ The `paddlepaddle` package is used in `test_client.py`, and you may need to down
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
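A minimal sketch of such a client is given below; it sends one hard-coded sample (the same values as the HTTP example further down), whereas the shipped test_client.py iterates over the UCI housing test set via the `paddlepaddle` package.
```
import sys
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config(sys.argv[1])  # e.g. uci_housing_client/serving_client_conf.prototxt
client.connect(["127.0.0.1:9393"])

x = np.array([0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
              -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], dtype="float32")
fetch_map = client.predict(feed={"x": x.reshape(1, 1, 13)}, fetch=["price"], batch=True)
print(fetch_map["price"])
```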
## HTTP service
### Start server
Start a web service with default web service hosting modules:
``` shell
python test_server.py
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim --name uci
```
### Client prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
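The same request can be issued from Python; a small equivalent of the curl line above (assuming the `requests` package is installed) might look like:
```
import json
import requests

payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
                    -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9393/uci/prediction",
                     headers={"Content-Type": "application/json"},
                     data=json.dumps(payload))
print(resp.json())
```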
......@@ -32,8 +32,6 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## HTTP service
### Start the server
......@@ -41,11 +39,11 @@ python test_client.py uci_housing_client/serving_client_conf.prototxt
Start the default web service with the following line:
``` shell
python test_server.py
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim --name uci
```
### Client prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server.web_service import WebService
import numpy as np
class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Pack each request's 13-dim feature vector into one (batch, 1, 13) float32 array.
        is_batch = True
        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
        for i, ins in enumerate(feed):
            nums = np.array(ins["x"]).reshape(1, 1, 13)
            new_data[i] = nums
        feed = {"x": new_data}
        return feed, fetch, is_batch


uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(
    workdir="workdir", port=9393, use_lite=True, use_xpu=True, ir_optim=True)
uci_service.run_rpc_service()
uci_service.run_web_service()
## Prepare
### download model and extract
```
wget https://paddle-inference-dist.bj.bcebos.com/PaddleLite/models_and_data_for_unittests/VGG19.tar.gz
tar zxvf VGG19.tar.gz
```
### convert model
```
python -m paddle_serving_client.convert --dirname VGG19
python3 -m paddle_serving_client.convert --dirname VGG19
```
### or, you can get the serving saved model directly
```
wget https://paddle-serving.bj.bcebos.com/models/xpu/vgg19.tar.gz
tar zxvf vgg19.tar.gz
```
## RPC Service
......@@ -10,7 +20,7 @@ python -m paddle_serving_client.convert --dirname VGG19
### Start Service
```
python -m paddle_serving_server.serve --model serving_server --port 7702 --use_lite --use_xpu --ir_optim
python3 -m paddle_serving_server.serve --model serving_server --port 7702 --use_lite --use_xpu --ir_optim
```
### Client Prediction
......
......@@ -386,8 +386,6 @@ class Server(object):
return
if not os.path.exists(self.server_path):
os.system("touch {}/{}.is_download".format(self.module_path,
folder_name))
print('First time run, downloading PaddleServing components ...')
r = os.system('wget ' + bin_url + ' --no-check-certificate')
......@@ -403,9 +401,10 @@ class Server(object):
tar = tarfile.open(tar_name)
tar.extractall()
tar.close()
open(download_flag, "a").close()
except:
if os.path.exists(exe_path):
os.remove(exe_path)
if os.path.exists(self.server_path):
os.remove(self.server_path)
raise SystemExit(
'Decompressing failed, please check your permission of {} or disk space left.'
.format(self.module_path))
......
......@@ -148,7 +148,7 @@ function before_hook() {
setproxy
unsetproxy
cd ${build_path}/python
python3.6 -m pip install --upgrade pip
python3.6 -m pip install --upgrade pip==20.0.1
python3.6 -m pip install requests
python3.6 -m pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple
python3.6 -m pip install numpy==1.16.4
......@@ -323,7 +323,7 @@ function bert_rpc_cpu() {
link_data ${data_dir}
sed -i 's/8860/8861/g' bert_client.py
python3.6 -m paddle_serving_server.serve --model bert_seq128_model/ --port 8861 > ${dir}server_log.txt 2>&1 &
check_result server 3
check_result server 5
cp data-c.txt.1 data-c.txt
head data-c.txt | python3.6 bert_client.py --model bert_seq128_client/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "bert_CPU_RPC server test completed"
......@@ -338,7 +338,7 @@ function pipeline_imagenet() {
data_dir=${data}imagenet/
link_data ${data_dir}
python3.6 resnet50_web_service.py > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 pipeline_rpc_client.py > ${dir}client_log.txt 2>&1
check_result client "pipeline_imagenet_GPU_RPC server test completed"
......@@ -355,7 +355,7 @@ function ResNet50_rpc() {
link_data ${data_dir}
sed -i 's/9696/8863/g' resnet50_rpc_client.py
python3.6 -m paddle_serving_server.serve --model ResNet50_vd_model --port 8863 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "ResNet50_GPU_RPC server test completed"
......@@ -372,7 +372,7 @@ function ResNet101_rpc() {
link_data ${data_dir}
sed -i "22cclient.connect(['127.0.0.1:8864'])" image_rpc_client.py
python3.6 -m paddle_serving_server.serve --model ResNet101_vd_model --port 8864 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 image_rpc_client.py ResNet101_vd_client_config/serving_client_conf.prototxt > ${dir}client_log.txt 2>&1
check_result client "ResNet101_GPU_RPC server test completed"
......@@ -482,7 +482,7 @@ function cascade_rcnn_rpc() {
link_data ${data_dir}
sed -i "s/9292/8879/g" test_client.py
python3.6 -m paddle_serving_server.serve --model serving_server --port 8879 --gpu_ids 0 --thread 2 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 test_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -499,7 +499,7 @@ function deeplabv3_rpc() {
link_data ${data_dir}
sed -i "s/9494/8880/g" deeplabv3_client.py
python3.6 -m paddle_serving_server.serve --model deeplabv3_server --gpu_ids 0 --port 8880 --thread 2 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 10
nvidia-smi
python3.6 deeplabv3_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -516,7 +516,7 @@ function mobilenet_rpc() {
tar xf mobilenet_v2_imagenet.tar.gz
sed -i "s/9393/8881/g" mobilenet_tutorial.py
python3.6 -m paddle_serving_server.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 8881 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 mobilenet_tutorial.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -533,7 +533,7 @@ function unet_rpc() {
link_data ${data_dir}
sed -i "s/9494/8882/g" seg_client.py
python3.6 -m paddle_serving_server.serve --model unet_model --gpu_ids 0 --port 8882 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 seg_client.py > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -599,7 +599,7 @@ function criteo_ctr_rpc_gpu() {
link_data ${data_dir}
sed -i "s/8885/8886/g" test_client.py
python3.6 -m paddle_serving_server.serve --model ctr_serving_model/ --port 8886 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
nvidia-smi
python3.6 test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0 > ${dir}client_log.txt 2>&1
nvidia-smi
......@@ -617,7 +617,7 @@ function yolov4_rpc_gpu() {
sed -i "s/9393/8887/g" test_client.py
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 8887 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
nvidia-smi
check_result server 5
check_result server 8
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
nvidia-smi
check_result client "yolov4_GPU_RPC server test completed"
......@@ -634,7 +634,7 @@ function senta_rpc_cpu() {
sed -i "s/9393/8887/g" test_client.py
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 8887 --gpu_ids 0 > ${dir}server_log.txt 2>&1 &
nvidia-smi
check_result server 5
check_result server 8
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
nvidia-smi
check_result client "senta_GPU_RPC server test completed"
......@@ -724,7 +724,7 @@ function bert_http() {
cp vocab.txt.1 vocab.txt
export CUDA_VISIBLE_DEVICES=0
python3.6 bert_web_service.py bert_seq128_model/ 8878 > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 8
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:8878/bert/prediction > ${dir}client_log.txt 2>&1
check_result client "bert_GPU_HTTP server test completed"
kill_server_process
......@@ -762,7 +762,7 @@ function grpc_yolov4() {
link_data ${data_dir}
echo -e "${GREEN_COLOR}grpc_impl_example_yolov4_GPU_gRPC server started${RES}"
python3.6 -m paddle_serving_server.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang > ${dir}server_log.txt 2>&1 &
check_result server 5
check_result server 10
echo -e "${GREEN_COLOR}grpc_impl_example_yolov4_GPU_gRPC client started${RES}"
python3.6 test_client.py 000000570688.jpg > ${dir}client_log.txt 2>&1
check_result client "grpc_yolov4_GPU_GRPC server test completed"
......