Unverified · Commit eb0eb46c · Author: TeslaZhao · Committer: GitHub

Merge pull request #1766 from TeslaZhao/develop

Update examples of pipeline low precision
# Imagenet Pipeline WebService
# Low-precision examples of Python Pipeline
This document takes the Imagenet service as an example to introduce how to use Pipeline WebService.
Here we take the ResNet50 quantized model as an example to introduce the low-precision deployment case of Python Pipeline.
## Get model
## 1.Get model
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
```
## Start server
## 2.Save model var for serving
```
python3 -m paddle_serving_client.convert --dirname ResNet50_quant --serving_server serving_server --serving_client serving_client
```
## 3.Start server
```
python3 resnet50_web_service.py &>log.txt &
```
## RPC test
## 4.Test
```
python3 pipeline_rpc_client.py
python3 pipeline_http_client.py
```
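The test step runs `pipeline_http_client.py`, whose contents are not part of this commit view. As a rough sketch of what such an HTTP client might look like, the snippet below builds a JSON body and posts it with the standard library. The URL path `/imagenet/prediction`, the `key`/`value` body layout and the helper names `build_payload` and `send` are assumptions for illustration; only port 18080 comes from the `http_port` setting shown in the config.

```python
import base64
import json
import urllib.request

# Assumed endpoint: port 18080 is http_port from the config; the path is a guess
# based on the op name and may differ in the actual service.
URL = "http://127.0.0.1:18080/imagenet/prediction"


def build_payload(image_bytes):
    """Wrap one image as a base64 string in a key/value JSON body (assumed layout)."""
    encoded = base64.b64encode(image_bytes).decode("utf8")
    return {"key": ["image"], "value": [encoded]}


def send(image_bytes, url=URL):
    """POST the payload and return the decoded JSON reply; needs a running server."""
    body = json.dumps(build_payload(image_bytes)).encode("utf8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf8"))


# Building the payload needs no server, so it can be inspected offline.
print(build_payload(b"abc"))
```

Calling `send(...)` with the bytes of an image file only works once the server from step 3 is running.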
# Imagenet Pipeline WebService
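# Python Pipeline low-precision deployment example
Here the Imagenet service is used as an example to introduce how to use Pipeline WebService.
Here the ResNet50 quantized model is used as an example to introduce low-precision quantized model deployment with Python Pipeline.
## Get model
## 1.Get model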
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
```
## Start server
## 2.Save model parameters
```
python3 -m paddle_serving_client.convert --dirname ResNet50_quant --serving_server serving_server --serving_client serving_client
```
## 3.Start server
```
python3 resnet50_web_service.py &>log.txt &
```
## Test
## 4.Test
```
python3 pipeline_rpc_client.py
python3 pipeline_http_client.py
```
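#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each of which builds its own gRPC server and DAG
##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread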
worker_num: 1
worker_num: 10
#HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
http_port: 18080
......@@ -21,7 +21,7 @@ op:
model_config: serving_server/
#Compute hardware type: when omitted, it is determined by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 1
device_type: 2
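#Compute hardware ID: when devices is "" or omitted, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on GPU and the value lists the GPU cards to use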
devices: "0" # "0,1"
......@@ -30,15 +30,15 @@ op:
client_type: local_predictor
#Fetch result list, based on the alias_name of fetch_var in client_config
fetch_list: ["score"]
fetch_list: ["save_infer_model/scale_0.tmp_0"]
#precision: inference precision. Lowering the precision can speed up inference
#GPU supports: "fp32"(default), "fp16", "int8";
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); not supported: "int8"
precision: "fp32"
precision: "int8"
#Enable TensorRT calibration
use_calib: True
#Enable TensorRT calibration. Quantized models must set use_calib: False; non-quantized models that generate int8 offline need use_calib: True
use_calib: False
#Enable ir_optim
ir_optim: True
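The config comments above describe which precisions each device type accepts and when `use_calib` should be set. A minimal sketch of those rules as plain Python checks follows. The function names `check_precision` and `expected_use_calib` are hypothetical, and treating `device_type: 2` (TensorRT) like a GPU target is an assumption; the comments only list GPU and CPU support explicitly.

```python
# Supported precisions as listed in the config comments.
GPU_PRECISIONS = {"fp32", "fp16", "int8"}
CPU_PRECISIONS = {"fp32", "fp16", "bf16"}


def check_precision(device_type, precision):
    """Return True/False for documented combinations, None when the comments say nothing."""
    if device_type in (1, 2):  # 1=gpu, 2=tensorRT (assumed to follow the GPU list)
        return precision in GPU_PRECISIONS
    if device_type == 0:  # 0=cpu
        return precision in CPU_PRECISIONS
    return None  # 3=arm cpu, 4=kunlun xpu: not covered by the comments


def expected_use_calib(model_is_quantized):
    """Quantized models set use_calib: False; offline int8 from a non-quantized model needs True."""
    return not model_is_quantized


print(check_precision(2, "int8"))   # the combination this commit switches to
print(check_precision(0, "int8"))   # int8 is listed as unsupported on CPU
print(expected_use_calib(True))     # ResNet50_quant is already quantized
```

This is a reading of the comments only; it does not reflect how the framework validates the file.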
......@@ -47,7 +47,7 @@ class ImagenetOp(Op):
return {"image": input_imgs}, False, None, ""
def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
score_list = fetch_dict["score"]
score_list = fetch_dict["save_infer_model/scale_0.tmp_0"]
result = {"label": [], "prob": []}
for score in score_list:
score = score.tolist()
......
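The hunk is cut off after `score = score.tolist()`, so the rest of `postprocess` is not visible here. Purely as an illustration of how a top-1 label and probability could be derived from such score lists, here is a self-contained sketch. The helper `top1` and the `label_dict` mapping are hypothetical and are not taken from the truncated code.

```python
def top1(score_list, label_dict):
    """Pick the highest score per sample and look up its label (illustrative only)."""
    result = {"label": [], "prob": []}
    for score in score_list:
        # The op calls score.tolist() on each fetched array; plain sequences work here too.
        values = score.tolist() if hasattr(score, "tolist") else list(score)
        best = max(values)
        result["label"].append(label_dict[values.index(best)])
        result["prob"].append(best)
    return result


# Hypothetical three-class label table and one sample of scores.
labels = {0: "cat", 1: "dog", 2: "bird"}
print(top1([[0.1, 0.7, 0.2]], labels))
```

Note that after this commit the scores are read from the fetch variable `save_infer_model/scale_0.tmp_0` instead of `score`, so `fetch_list` in the config and the key used in `postprocess` have to change together.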