Unverified    Commit 456b9fed authored by Guanghua Yu, committed by GitHub

update act paddle inference demo (#1432)

Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Parent fe33833a
@@ -7,8 +7,7 @@
  - [3.1 Environment Setup](#31-准备环境)
  - [3.2 Prepare the Dataset](#32-准备数据集)
  - [3.3 Prepare the Inference Model](#33-准备预测模型)
  - [3.4 Run Auto Compression and Export the Model](#34-自动压缩并产出模型)
- [4. Inference Deployment](#4预测部署)
- [5. FAQ](#5FAQ)
@@ -110,23 +109,52 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -
    --config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
## 4. Inference Deployment

#### 4.1 Benchmarking with Paddle Inference

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following arguments configure inference:

| Argument | Meaning |
|:------:|:------:|
| model_path | Directory of the inference model; it must contain the files model.pdmodel and model.pdiparams |
| reader_config | Path of the reader config file used for evaluation |
| image_file | To test a single image only, pass its path via image_file |
| device | Run inference on GPU or CPU, one of CPU/GPU |
| use_trt | Whether to use the TensorRT inference engine |
| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```use_mkldnn``` is ignored and ```GPU``` inference is used |
| cpu_threads | Number of CPU threads used for CPU inference, default 10 |
| precision | Inference precision, one of `fp32/fp16/int8` |

- TensorRT inference:

Environment: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

```shell
python paddle_inference_eval.py \
      --model_path=models/ppyoloe_crn_l_300e_coco_quant \
      --reader_config=configs/yoloe_reader.yml \
      --use_trt=True \
      --precision=int8
```

- MKLDNN inference:

```shell
python paddle_inference_eval.py \
      --model_path=models/ppyoloe_crn_l_300e_coco_quant \
      --reader_config=configs/yoloe_reader.yml \
      --device=CPU \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --precision=int8
```
- For a PPYOLOE model exported without NMS, speed can also be measured with the C++ inference demo:

Go into the [cpp_infer](./cpp_infer_ppyoloe) folder, follow the [C++ TensorRT Benchmark tutorial](./cpp_infer_ppyoloe/README.md) to prepare the environment and compile, then run the test:

```shell
@@ -136,14 +164,6 @@ python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
./build/trt_run --model_file ppyoloe_s_quant/model.pdmodel --params_file ppyoloe_s_quant/model.pdiparams --run_mode=trt_int8
```
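For readers who want to see how the flags above translate into a Paddle Inference configuration, here is a minimal Python sketch of a TensorRT INT8 predictor. It is an illustration only, not the `paddle_inference_eval.py` script itself; the model directory, input resolution, and input names are assumptions for the example.

```python
import numpy as np
import paddle.inference as paddle_infer

# Assumed model directory, as in the command above.
model_dir = "models/ppyoloe_crn_l_300e_coco_quant"
config = paddle_infer.Config(model_dir + "/model.pdmodel",
                             model_dir + "/model.pdiparams")
config.enable_use_gpu(1000, 0)  # GPU memory pool (MB), device id
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=30,
    precision_mode=paddle_infer.PrecisionType.Int8,  # INT8 for the quantized model
    use_static=True,
    use_calib_mode=False)
predictor = paddle_infer.create_predictor(config)

# Feed dummy data just to exercise the engine; real evaluation uses the COCO reader.
# Input names depend on how the model was exported (often "image" and "scale_factor").
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    if name == "scale_factor":
        handle.copy_from_cpu(np.ones((1, 2), dtype="float32"))
    else:
        handle.copy_from_cpu(np.random.rand(1, 3, 640, 640).astype("float32"))
predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(output.shape)
```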
## 5. FAQ

- For post-training (offline) quantization instead, see the [Detection model post-training quantization example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/detection).
@@ -113,48 +113,56 @@ python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' -
Note that ```learning rate``` scales linearly with ```batch size```. Here the single-card ```batch size``` is 32 and the corresponding ```learning rate``` is 0.015, so if ```batch size``` is cut 4x to 8, the ```learning rate``` must also be divided by 4; with multiple cards at ```batch size``` 32 per card, the ```learning rate``` must be multiplied by the number of cards. In short, whenever ```batch size``` or the number of training cards changes, adjust the ```learning rate``` accordingly, as in the small example below.
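As a quick check of this linear scaling rule, a tiny sketch using the example numbers from the note above (0.015 and 32 are taken from the text, everything else is illustrative):

```python
base_lr, base_batch_size = 0.015, 32  # single-card reference values from the note above

def scaled_lr(batch_size, num_cards=1):
    # learning rate scales linearly with the total (global) batch size
    return base_lr * (batch_size * num_cards) / base_batch_size

print(scaled_lr(8))      # 0.00375 -> batch size cut 4x, so lr divided by 4
print(scaled_lr(32, 4))  # 0.06    -> 4 cards at batch size 32, so lr multiplied by 4
```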
## 4. Inference Deployment

#### 4.1 Benchmarking with Paddle Inference

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following arguments configure inference:

| Argument | Meaning |
|:------:|:------:|
| model_path | Directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | Name of the model file inside model_path |
| params_filename | Name of the params file inside model_path |
| data_path | Dataset path |
| batch_size | Batch size for inference |
| image_size | Input image size |
| use_gpu | Whether to run inference on GPU |
| use_trt | Whether to use the TensorRT inference engine |
| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```use_mkldnn``` is ignored and ```GPU``` inference is used |
| cpu_num_threads | Number of CPU threads used for CPU inference, default 10 |
| use_fp16 | Whether to enable ```FP16``` when using TensorRT |
| use_int8 | Whether to enable ```INT8``` |
Note:
- Pay attention to the model's input size: InceptionV3, for example, expects 299, so for some models the ```image_size``` argument must be changed.
- TensorRT inference:

Environment: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

```shell
python paddle_inference_eval.py \
      --model_path=models/ResNet50_vd_QAT \
      --use_trt=True \
      --use_int8=True \
      --use_gpu=True \
      --data_path=./dataset/ILSVRC2012/
```

- MKLDNN inference:

```shell
python paddle_inference_eval.py \
      --model_path=models/ResNet50_vd_QAT \
      --data_path=./dataset/ILSVRC2012/ \
      --cpu_num_threads=10 \
      --use_mkldnn=True \
      --use_int8=True
```
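As a rough illustration of what these flags configure under the hood, the sketch below builds a CPU predictor with MKLDNN INT8 enabled and times a few dummy batches. It is a simplified example, not the full `paddle_inference_eval.py` script; the model directory and file names are the defaults assumed above, and `enable_mkldnn_int8` requires a sufficiently recent Paddle build.

```python
import time
import numpy as np
import paddle.inference as paddle_infer

model_dir = "models/ResNet50_vd_QAT"  # assumed, as in the command above
config = paddle_infer.Config(model_dir + "/inference.pdmodel",
                             model_dir + "/inference.pdiparams")
config.disable_gpu()
config.set_cpu_math_library_num_threads(10)
config.enable_mkldnn()
# Run these op types with MKLDNN INT8 kernels.
config.enable_mkldnn_int8({"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
predictor = paddle_infer.create_predictor(config)

inp = predictor.get_input_handle(predictor.get_input_names()[0])
out = predictor.get_output_handle(predictor.get_output_names()[0])

# Time a few dummy 224x224 batches; accuracy numbers require the real ImageNet val set.
x = np.random.rand(1, 3, 224, 224).astype("float32")
start = time.time()
for _ in range(20):
    inp.copy_from_cpu(x)
    predictor.run()
    _ = out.copy_to_cpu()
print("avg latency (ms):", (time.time() - start) / 20 * 1000)
```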
#### 4.2 On-Device Deployment with PaddleLite
......
@@ -13,76 +13,72 @@
# limitations under the License.
import os
import time
import sys
import argparse
import numpy as np
import cv2
import yaml
import paddle
from paddle.inference import create_predictor
from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset
def argsparser():
    """
    argsparser func
    """
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--model_path",
        type=str,
        default="./MobileNetV1_infer",
        help="model directory")
    parser.add_argument(
        "--model_filename",
        type=str,
        default="inference.pdmodel",
        help="model file name")
    parser.add_argument(
        "--params_filename",
        type=str,
        default="inference.pdiparams",
        help="params file name")
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--img_size", type=int, default=224)
    parser.add_argument("--resize_size", type=int, default=256)
    parser.add_argument(
        "--data_path", type=str, default="./dataset/ILSVRC2012/")
    parser.add_argument(
        "--use_gpu", type=bool, default=False, help="Whether to use gpu")
    parser.add_argument(
        "--use_trt", type=bool, default=False, help="Whether to use tensorrt")
    parser.add_argument(
        "--use_mkldnn", type=bool, default=False, help="Whether to use mkldnn")
    parser.add_argument(
        "--cpu_num_threads", type=int, default=10, help="Number of cpu threads")
    parser.add_argument(
        "--use_fp16", type=bool, default=False, help="Whether to use fp16")
    parser.add_argument(
        "--use_int8", type=bool, default=False, help="Whether to use int8")
    parser.add_argument("--gpu_mem", type=int, default=8000, help="GPU memory")
    parser.add_argument("--ir_optim", type=bool, default=True)
    parser.add_argument(
        "--use_dynamic_shape",
        type=bool,
        default=True,
        help="Whether use dynamic shape or not.")
    return parser


def eval_reader(data_dir, batch_size, crop_size, resize_size):
    """
    eval reader func
    """
    val_reader = ImageNetDataset(
        mode="val",
        data_dir=data_dir,
        crop_size=crop_size,
        resize_size=resize_size)
@@ -96,14 +92,17 @@ def eval_reader(data_dir, batch_size, crop_size, resize_size):
class Predictor(object):
    """
    Paddle Inference Predictor class
    """

    def __init__(self):
        # HALF precision predict only works when using tensorrt
        if args.use_fp16 is True:
            assert args.use_trt is True
        self.rerun_flag = False
        self.paddle_predictor = self._create_paddle_predictor()
        input_names = self.paddle_predictor.get_input_names()
        self.input_tensor = self.paddle_predictor.get_input_handle(input_names[
            0])
@@ -112,121 +111,140 @@ class Predictor(object):
        self.output_tensor = self.paddle_predictor.get_output_handle(
            output_names[0])
    def _create_paddle_predictor(self):
        inference_model_dir = args.model_path
        model_file = os.path.join(inference_model_dir, args.model_filename)
        params_file = os.path.join(inference_model_dir, args.params_filename)
        config = paddle.inference.Config(model_file, params_file)
        precision = paddle.inference.Config.Precision.Float32
        if args.use_int8:
            precision = paddle.inference.Config.Precision.Int8
        elif args.use_fp16:
            precision = paddle.inference.Config.Precision.Half

        if args.use_gpu:
            config.enable_use_gpu(args.gpu_mem, 0)
        else:
            config.disable_gpu()
            config.set_cpu_math_library_num_threads(args.cpu_num_threads)
            config.switch_ir_optim()
            if args.use_mkldnn:
                config.enable_mkldnn()
                if args.use_int8:
                    config.enable_mkldnn_int8(
                        {"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})

        config.switch_ir_optim(args.ir_optim)  # default true
        if args.use_trt:
            config.enable_tensorrt_engine(
                precision_mode=precision,
                max_batch_size=args.batch_size,
                workspace_size=1 << 30,
                min_subgraph_size=30,
                use_static=True,
                use_calib_mode=False, )

            if args.use_dynamic_shape:
                dynamic_shape_file = os.path.join(inference_model_dir,
                                                  "dynamic_shape.txt")
                if os.path.exists(dynamic_shape_file):
                    config.enable_tuned_tensorrt_dynamic_shape(
                        dynamic_shape_file, True)
                    print("trt set dynamic shape done!")
                else:
                    config.collect_shape_range_info(dynamic_shape_file)
                    print("Start collect dynamic shape...")
                    self.rerun_flag = True

        config.enable_memory_optim()

        predictor = create_predictor(config)

        return predictor
    def eval(self):
        """
        eval func
        """
        if os.path.exists(args.data_path):
            val_loader = eval_reader(
                args.data_path,
                batch_size=args.batch_size,
                crop_size=args.img_size,
                resize_size=args.resize_size)
        else:
            image = np.ones(
                (1, 3, args.img_size, args.img_size)).astype(np.float32)
            label = None
            val_loader = [[image, label]]
        results = []
        input_names = self.paddle_predictor.get_input_names()
        input_tensor = self.paddle_predictor.get_input_handle(input_names[0])
        output_names = self.paddle_predictor.get_output_names()
        output_tensor = self.paddle_predictor.get_output_handle(output_names[0])
        predict_time = 0.0
        time_min = float("inf")
        time_max = float("-inf")
        sample_nums = len(val_loader)
        for batch_id, (image, label) in enumerate(val_loader):
            image = np.array(image)
            input_tensor.copy_from_cpu(image)
            start_time = time.time()
            self.paddle_predictor.run()
            batch_output = output_tensor.copy_to_cpu()
            end_time = time.time()
            timed = end_time - start_time
            time_min = min(time_min, timed)
            time_max = max(time_max, timed)
            predict_time += timed
            if self.rerun_flag:
                return
            sort_array = batch_output.argsort(axis=1)
            top_1_pred = sort_array[:, -1:][:, ::-1]
            if label is None:
                results.append(top_1_pred)
                break
            label = np.array(label)
            top_1 = np.mean(label == top_1_pred)
            top_5_pred = sort_array[:, -5:][:, ::-1]
            acc_num = 0
            for i, _ in enumerate(label):
                if label[i][0] in top_5_pred[i]:
                    acc_num += 1
            top_5 = float(acc_num) / len(label)
            results.append([top_1, top_5])
            if batch_id % 100 == 0:
                print("Eval iter:", batch_id)
                sys.stdout.flush()

        result = np.mean(np.array(results), axis=0)
        fp_message = "FP16" if args.use_fp16 else "FP32"
        fp_message = "INT8" if args.use_int8 else fp_message
        print_msg = "Paddle"
        if args.use_trt:
            print_msg = "using TensorRT"
        elif args.use_mkldnn:
            print_msg = "using MKLDNN"
        time_avg = predict_time / sample_nums
        print(
            "[Benchmark]{}\t{}\tbatch size: {}. Inference time(ms): min={}, max={}, avg={}".
            format(
                print_msg,
                fp_message,
                args.batch_size,
                round(time_min * 1000, 2),
                round(time_max * 1000, 2),
                round(time_avg * 1000, 2), ))
        print("[Benchmark] Evaluation acc result: {}".format(result[0]))
        sys.stdout.flush()


if __name__ == "__main__":
    parser = argsparser()
    global args
    args = parser.parse_args()
    predictor = Predictor()
    predictor.eval()
    if predictor.rerun_flag:
        print(
            "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
        )
@@ -194,25 +194,41 @@ Quantization:
## 5. Inference Deployment

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

- TensorRT inference:

Environment: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

First download the quantized model:
```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/save_ppminilm_afqmc_new_calib.tar
tar -xf save_ppminilm_afqmc_new_calib.tar
```

```shell
python paddle_inference_eval.py \
      --model_path=save_ernie3_afqmc_new_cablib \
      --model_filename=infer.pdmodel \
      --params_filename=infer.pdiparams \
      --task_name='afqmc' \
      --use_trt \
      --precision=int8
```

- MKLDNN inference:

```shell
python paddle_inference_eval.py \
      --model_path=save_ernie3_afqmc_new_cablib \
      --model_filename=infer.pdmodel \
      --params_filename=infer.pdiparams \
      --task_name='afqmc' \
      --device=cpu \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --precision=int8
```
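To make it concrete how `paddle_inference_eval.py` feeds a CLUE/AFQMC example into the predictor, here is a minimal hedged sketch. The sentence pair is a placeholder, and it assumes the exported model's first two inputs are `input_ids` and `token_type_ids`, in the same order as the script's `predict_batch`.

```python
import numpy as np
import paddle.inference as paddle_infer
from paddlenlp.transformers import AutoTokenizer

model_dir = "save_ernie3_afqmc_new_cablib"  # the directory used in the commands above
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# One AFQMC-style sentence pair (placeholder text).
enc = tokenizer("花呗如何还款", text_pair="花呗怎么还钱", max_seq_len=128)
input_ids = np.array([enc["input_ids"]], dtype="int64")
token_type_ids = np.array([enc["token_type_ids"]], dtype="int64")

config = paddle_infer.Config(model_dir + "/infer.pdmodel",
                             model_dir + "/infer.pdiparams")
predictor = paddle_infer.create_predictor(config)
names = predictor.get_input_names()
predictor.get_input_handle(names[0]).copy_from_cpu(input_ids)
predictor.get_input_handle(names[1]).copy_from_cpu(token_type_ids)
predictor.run()
logits = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(logits)  # class scores for the sentence-pair similarity task
```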
## 6. FAQ
...@@ -45,96 +45,42 @@ METRIC_CLASSES = { ...@@ -45,96 +45,42 @@ METRIC_CLASSES = {
} }
def convert_example(example, dataset, tokenizer, label_list,
max_seq_length=512):
assert dataset in ['glue', 'clue'
], "This demo only supports for dataset glue or clue"
"""Convert a glue example into necessary features."""
if dataset == 'glue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
label = example['labels']
label = np.array([label], dtype=label_dtype)
# Convert raw text to feature
example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
return example['input_ids'], example['token_type_ids'], label
else: #if dataset == 'clue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
example['label'] = np.array(
example["label"], dtype="int64").reshape((-1, 1))
label = example['label']
# Convert raw text to feature
if 'keyword' in example: # CSL
sentence1 = " ".join(example['keyword'])
example = {
'sentence1': sentence1,
'sentence2': example['abst'],
'label': example['label']
}
elif 'target' in example: # wsc
text, query, pronoun, query_idx, pronoun_idx = example[
'text'], example['target']['span1_text'], example['target'][
'span2_text'], example['target']['span1_index'], example[
'target']['span2_index']
text_list = list(text)
assert text[pronoun_idx:(pronoun_idx + len(
pronoun))] == pronoun, "pronoun: {}".format(pronoun)
assert text[query_idx:(query_idx + len(query)
)] == query, "query: {}".format(query)
if pronoun_idx > query_idx:
text_list.insert(query_idx, "_")
text_list.insert(query_idx + len(query) + 1, "_")
text_list.insert(pronoun_idx + 2, "[")
text_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
else:
text_list.insert(pronoun_idx, "[")
text_list.insert(pronoun_idx + len(pronoun) + 1, "]")
text_list.insert(query_idx + 2, "_")
text_list.insert(query_idx + len(query) + 2 + 1, "_")
text = "".join(text_list)
example['sentence'] = text
if tokenizer is None:
return example
if 'sentence' in example:
example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
elif 'sentence1' in example:
example = tokenizer(
example['sentence1'],
text_pair=example['sentence2'],
max_seq_len=max_seq_length)
return example['input_ids'], example['token_type_ids'], label
def parse_args(): def parse_args():
"""
parse_args func
"""
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument(
# Required parameters "--model_path",
default="./afqmc",
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--model_filename",
type=str,
default="inference.pdmodel",
help="model file name")
parser.add_argument(
"--params_filename",
type=str,
default="inference.pdiparams",
help="params file name")
parser.add_argument( parser.add_argument(
"--task_name", "--task_name",
default='afqmc', default="afqmc",
type=str, type=str,
help="The name of the task to perform predict, selected in the list: " + help="The name of the task to perform predict, selected in the list: " +
", ".join(METRIC_CLASSES.keys()), ) ", ".join(METRIC_CLASSES.keys()), )
parser.add_argument( parser.add_argument(
"--dataset", "--dataset",
default='clue', default="clue",
type=str, type=str,
help="The dataset of model.", ) help="The dataset of model.", )
parser.add_argument(
"--model_path",
default='./quant_models/model',
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument( parser.add_argument(
"--device", "--device",
default="gpu", default="gpu",
choices=["gpu", "cpu", "xpu"], choices=["gpu", "cpu"],
help="Device selected for inference.", ) help="Device selected for inference.", )
parser.add_argument( parser.add_argument(
"--batch_size", "--batch_size",
...@@ -154,25 +100,101 @@ def parse_args(): ...@@ -154,25 +100,101 @@ def parse_args():
help="Warmup steps for performance test.", ) help="Warmup steps for performance test.", )
parser.add_argument( parser.add_argument(
"--use_trt", "--use_trt",
action='store_true', action="store_true",
help="Whether to use inference engin TensorRT.", ) help="Whether to use inference engin TensorRT.", )
parser.add_argument( parser.add_argument(
"--perf", "--precision",
action='store_true', type=str,
help="Whether to test performance.", ) default="fp32",
choices=["fp32", "fp16", "int8"],
help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.",
)
parser.add_argument( parser.add_argument(
"--int8", "--use_mkldnn",
action='store_true', type=bool,
help="Whether to use int8 inference.", ) default=False,
help="Whether use mkldnn or not.")
parser.add_argument( parser.add_argument(
"--fp16", "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
action='store_true',
help="Whether to use float16 inference.", )
args = parser.parse_args() args = parser.parse_args()
return args return args
def _convert_example(example,
dataset,
tokenizer,
label_list,
max_seq_length=512):
assert dataset in ["glue", "clue"
], "This demo only supports for dataset glue or clue"
"""Convert a glue example into necessary features."""
if dataset == "glue":
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
label = example["labels"]
label = np.array([label], dtype=label_dtype)
# Convert raw text to feature
example = tokenizer(example["sentence"], max_seq_len=max_seq_length)
return example["input_ids"], example["token_type_ids"], label
else: # if dataset == 'clue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
example["label"] = np.array(
example["label"], dtype="int64").reshape((-1, 1))
label = example["label"]
# Convert raw text to feature
if "keyword" in example: # CSL
sentence1 = " ".join(example["keyword"])
example = {
"sentence1": sentence1,
"sentence2": example["abst"],
"label": example["label"]
}
elif "target" in example: # wsc
text, query, pronoun, query_idx, pronoun_idx = (
example["text"],
example["target"]["span1_text"],
example["target"]["span2_text"],
example["target"]["span1_index"],
example["target"]["span2_index"], )
text_list = list(text)
assert text[pronoun_idx:(pronoun_idx + len(
pronoun))] == pronoun, "pronoun: {}".format(pronoun)
assert text[query_idx:(query_idx + len(query)
)] == query, "query: {}".format(query)
if pronoun_idx > query_idx:
text_list.insert(query_idx, "_")
text_list.insert(query_idx + len(query) + 1, "_")
text_list.insert(pronoun_idx + 2, "[")
text_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
else:
text_list.insert(pronoun_idx, "[")
text_list.insert(pronoun_idx + len(pronoun) + 1, "]")
text_list.insert(query_idx + 2, "_")
text_list.insert(query_idx + len(query) + 2 + 1, "_")
text = "".join(text_list)
example["sentence"] = text
if tokenizer is None:
return example
if "sentence" in example:
example = tokenizer(example["sentence"], max_seq_len=max_seq_length)
elif "sentence1" in example:
example = tokenizer(
example["sentence1"],
text_pair=example["sentence2"],
max_seq_len=max_seq_length)
return example["input_ids"], example["token_type_ids"], label
class Predictor(object): class Predictor(object):
"""
Inference Predictor class
"""
def __init__(self, predictor, input_handles, output_handles): def __init__(self, predictor, input_handles, output_handles):
self.predictor = predictor self.predictor = predictor
self.input_handles = input_handles self.input_handles = input_handles
...@@ -180,60 +202,50 @@ class Predictor(object): ...@@ -180,60 +202,50 @@ class Predictor(object):
@classmethod @classmethod
def create_predictor(cls, args): def create_predictor(cls, args):
config = paddle.inference.Config(args.model_path + "infer.pdmodel", """
args.model_path + "infer.pdiparams") create_predictor func
"""
cls.rerun_flag = False
config = paddle.inference.Config(
os.path.join(args.model_path, args.model_filename),
os.path.join(args.model_path, args.params_filename))
if args.device == "gpu": if args.device == "gpu":
# set GPU configs accordingly # set GPU configs accordingly
config.enable_use_gpu(100, 0) config.enable_use_gpu(100, 0)
cls.device = paddle.set_device("gpu") cls.device = paddle.set_device("gpu")
elif args.device == "cpu": else:
# set CPU configs accordingly,
# such as enable_mkldnn, set_cpu_math_library_num_threads
config.disable_gpu() config.disable_gpu()
cls.device = paddle.set_device("cpu") config.set_cpu_math_library_num_threads(args.cpu_threads)
elif args.device == "xpu": config.switch_ir_optim()
# set XPU configs accordingly if args.use_mkldnn:
config.enable_xpu(100) config.enable_mkldnn()
if args.use_trt: if args.precision == "int8":
if args.int8: config.enable_mkldnn_int8()
config.enable_tensorrt_engine(
workspace_size=1 << 30, precision_map = {
precision_mode=inference.PrecisionType.Int8, "int8": inference.PrecisionType.Int8,
max_batch_size=args.batch_size, "fp32": inference.PrecisionType.Float32,
min_subgraph_size=5, "fp16": inference.PrecisionType.Half,
use_static=False, }
use_calib_mode=False) if args.precision in precision_map.keys() and args.use_trt:
elif args.fp16: config.enable_tensorrt_engine(
config.enable_tensorrt_engine( workspace_size=1 << 30,
workspace_size=1 << 30, max_batch_size=args.batch_size,
precision_mode=inference.PrecisionType.Half, min_subgraph_size=5,
max_batch_size=args.batch_size, precision_mode=precision_map[args.precision],
min_subgraph_size=5, use_static=True,
use_static=False, use_calib_mode=False, )
use_calib_mode=False)
else:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Float32,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
print("Enable TensorRT is: {}".format(
config.tensorrt_engine_enabled()))
dynamic_shape_file = os.path.join(args.model_path, dynamic_shape_file = os.path.join(args.model_path,
'dynamic_shape.txt') "dynamic_shape.txt")
if os.path.exists(dynamic_shape_file): if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file, config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True) True)
print('trt set dynamic shape done!') print("trt set dynamic shape done!")
else: else:
config.collect_shape_range_info(dynamic_shape_file) config.collect_shape_range_info(dynamic_shape_file)
print( print("Start collect dynamic shape...")
'Start collect dynamic shape... Please eval again to get real result in TensorRT' cls.rerun_flag = True
)
sys.exit()
predictor = paddle.inference.create_predictor(config) predictor = paddle.inference.create_predictor(config)
...@@ -249,6 +261,9 @@ class Predictor(object): ...@@ -249,6 +261,9 @@ class Predictor(object):
return cls(predictor, input_handles, output_handles) return cls(predictor, input_handles, output_handles)
def predict_batch(self, data): def predict_batch(self, data):
"""
predict from batch func
"""
for input_field, input_handle in zip(data, self.input_handles): for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field) input_handle.copy_from_cpu(input_field)
self.predictor.run() self.predictor.run()
...@@ -257,11 +272,11 @@ class Predictor(object): ...@@ -257,11 +272,11 @@ class Predictor(object):
] ]
return output return output
def convert_predict_batch(self, args, data, tokenizer, batchify_fn, def _convert_predict_batch(self, args, data, tokenizer, batchify_fn,
label_list): label_list):
examples = [] examples = []
for example in data: for example in data:
example = convert_example( example = _convert_example(
example, example,
args.dataset, args.dataset,
tokenizer, tokenizer,
...@@ -272,64 +287,82 @@ class Predictor(object): ...@@ -272,64 +287,82 @@ class Predictor(object):
return examples return examples
def predict(self, dataset, tokenizer, batchify_fn, args): def predict(self, dataset, tokenizer, batchify_fn, args):
"""
predict func
"""
batches = [ batches = [
dataset[idx:idx + args.batch_size] dataset[idx:idx + args.batch_size]
for idx in range(0, len(dataset), args.batch_size) for idx in range(0, len(dataset), args.batch_size)
] ]
if args.perf:
for i, batch in enumerate(batches):
examples = self.convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, label = batchify_fn(examples)
output = self.predict_batch([input_ids, segment_ids])
if i > args.perf_warmup_steps:
break
start_time = time.time()
for i, batch in enumerate(batches):
examples = self.convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, _ = batchify_fn(examples)
output = self.predict_batch([input_ids, segment_ids])
for i, batch in enumerate(batches):
examples = self._convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, label = batchify_fn(examples)
output = self.predict_batch([input_ids, segment_ids])
if i > args.perf_warmup_steps:
break
if self.rerun_flag:
return
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
predict_time = 0.0
for i, batch in enumerate(batches):
examples = self._convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, label = batchify_fn(examples)
start_time = time.time()
output = self.predict_batch([input_ids, segment_ids])
end_time = time.time() end_time = time.time()
sequences_num = i * args.batch_size predict_time += end_time - start_time
print("task name: %s, time: %s qps/s, " % correct = metric.compute(
(args.task_name, sequences_num / (end_time - start_time))) paddle.to_tensor(output),
paddle.to_tensor(np.array(label).flatten()))
metric.update(correct)
else: sequences_num = i * args.batch_size
metric = METRIC_CLASSES[args.task_name]() print(
metric.reset() "[benchmark]task name: {}, batch size: {} Inference time per batch: {}ms, qps: {}.".
for i, batch in enumerate(batches): format(
examples = self.convert_predict_batch( args.task_name,
args, batch, tokenizer, batchify_fn, dataset.label_list) args.batch_size,
input_ids, segment_ids, label = batchify_fn(examples) round(predict_time * 1000 / i, 2),
output = self.predict_batch([input_ids, segment_ids]) round(sequences_num / predict_time, 2), ))
correct = metric.compute( res = metric.accumulate()
paddle.to_tensor(output), print(
paddle.to_tensor(np.array(label).flatten())) "[benchmark]task name: %s, acc: %s. \n" % (args.task_name, res),
metric.update(correct) end="")
sys.stdout.flush()
res = metric.accumulate()
print("task name: %s, acc: %s, \n" % (args.task_name, res), end='')
def main(): def main():
"""
main func
"""
paddle.seed(42) paddle.seed(42)
args = parse_args() args = parse_args()
args.task_name = args.task_name.lower() args.task_name = args.task_name.lower()
if args.use_mkldnn:
paddle.set_device("cpu")
predictor = Predictor.create_predictor(args) predictor = Predictor.create_predictor(args)
dev_ds = load_dataset('clue', args.task_name, splits='dev') dev_ds = load_dataset("clue", args.task_name, splits="dev")
tokenizer = AutoTokenizer.from_pretrained(args.model_path) tokenizer = AutoTokenizer.from_pretrained(args.model_path)
batchify_fn = lambda samples, fn=Tuple( batchify_fn = lambda samples, fn=Tuple(
Pad(axis=0, pad_val=tokenizer.pad_token_id), # input Pad(axis=0, pad_val=tokenizer.pad_token_id), # input
Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment
Stack(dtype="int64" if dev_ds.label_list else "float32") # label Stack(dtype="int64" if dev_ds.label_list else "float32"), # label
): fn(samples) ): fn(samples)
outputs = predictor.predict(dev_ds, tokenizer, batchify_fn, args) predictor.predict(dev_ds, tokenizer, batchify_fn, args)
if predictor.rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__": if __name__ == "__main__":
paddle.set_device("cpu")
main() main()
@@ -14,12 +14,9 @@
## 1. Introduction

The Paddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, PaddleSlim's auto compression (ACT) can conveniently be applied to inference models from these other frameworks.

This example takes an NLP model from the [Pytorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress models from other frameworks. It uses the [huggingface](https://github.com/huggingface/transformers) transformers library to convert the Pytorch model into a Paddle model, and then applies ACT auto compression. The compression strategy used here is pruning with distillation plus quantization-aware training.

## 2. Benchmark

[BERT](https://arxiv.org/abs/1810.04805) (```Bidirectional Encoder Representations from Transformers```) uses the Transformer encoder as its basic building block and is pre-trained on large unlabeled corpora with the ```Masked Language Model``` and ```Next Sentence Prediction``` tasks, yielding a general-purpose semantic representation that fuses bidirectional context. Fine-tuning this pre-trained representation with a simple task-specific output layer transfers it to downstream NLP tasks, usually outperforming models trained directly on the downstream task. BERT previously achieved SOTA results on the [GLUE](https://gluebenchmark.com/tasks) benchmark.
@@ -192,41 +189,38 @@ python run.py --config_path=./configs/cola.yaml --eval True
## 4. Inference Deployment

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

- TensorRT inference:

Environment: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

First download the quantized model:
```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/x2paddle_cola_new_calib.tar
tar -xf x2paddle_cola_new_calib.tar
```

```shell
python paddle_inference_eval.py \
      --model_path=x2paddle_cola_new_calib \
      --use_trt \
      --precision=int8 \
      --batch_size=1
```

- MKLDNN inference:

```shell
python paddle_inference_eval.py \
      --model_path=x2paddle_cola_new_calib \
      --device=cpu \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --batch_size=1 \
      --precision=int8
```
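Because BERT inputs have variable sequence length, `paddle_inference_eval.py` runs TensorRT with tuned dynamic shapes: the first run collects shape ranges and asks you to rerun, and the second run loads them. The sketch below illustrates that two-phase flow with Paddle Inference APIs; the paths and settings are assumptions matching the commands above, not the script itself.

```python
import os
import paddle.inference as paddle_infer

model_dir = "x2paddle_cola_new_calib"  # assumed, as downloaded above
config = paddle_infer.Config(model_dir + "/model.pdmodel",
                             model_dir + "/model.pdiparams")
config.enable_use_gpu(100, 0)
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=5,
    precision_mode=paddle_infer.PrecisionType.Int8,
    use_static=True,
    use_calib_mode=False)

shape_file = os.path.join(model_dir, "dynamic_shape.txt")
if os.path.exists(shape_file):
    # Second run: reuse the recorded min/max/opt shapes so TensorRT engines can be built.
    config.enable_tuned_tensorrt_dynamic_shape(shape_file, True)
else:
    # First run: record the shape ranges seen during evaluation, then rerun the program.
    config.collect_shape_range_info(shape_file)

predictor = paddle_infer.create_predictor(config)
```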
......
...@@ -22,9 +22,9 @@ import numpy as np ...@@ -22,9 +22,9 @@ import numpy as np
import paddle import paddle
from paddle import inference from paddle import inference
from paddle.metric import Metric, Accuracy, Precision, Recall
from paddlenlp.datasets import load_dataset from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad from paddlenlp.data import Stack, Tuple, Pad
from paddle.metric import Metric, Accuracy, Precision, Recall
from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer
...@@ -53,35 +53,46 @@ task_to_keys = { ...@@ -53,35 +53,46 @@ task_to_keys = {
def parse_args(): def parse_args():
"""
parse_args func
"""
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
parser.add_argument(
# Required parameters "--model_path",
default="./x2paddle_cola",
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--model_filename",
type=str,
default="model.pdmodel",
help="model file name")
parser.add_argument(
"--params_filename",
type=str,
default="model.pdiparams",
help="params file name")
parser.add_argument( parser.add_argument(
"--task_name", "--task_name",
default='cola', default="cola",
type=str, type=str,
help="The name of the task to perform predict, selected in the list: " + help="The name of the task to perform predict, selected in the list: " +
", ".join(METRIC_CLASSES.keys()), ) ", ".join(METRIC_CLASSES.keys()), )
parser.add_argument( parser.add_argument(
"--model_type", "--model_type",
default='bert-base-cased', default="bert-base-cased",
type=str, type=str,
help="Model type selected in bert.") help="Model type selected in bert.")
parser.add_argument( parser.add_argument(
"--model_name_or_path", "--model_name_or_path",
default='bert-base-cased', default="bert-base-cased",
type=str, type=str,
help="The directory or name of model.", ) help="The directory or name of model.", )
parser.add_argument(
"--model_path",
default='./quant_models/model',
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument( parser.add_argument(
"--device", "--device",
default="gpu", default="gpu",
choices=["gpu", "cpu", "xpu"], choices=["gpu", "cpu"],
help="Device selected for inference.", ) help="Device selected for inference.", )
parser.add_argument( parser.add_argument(
"--batch_size", "--batch_size",
...@@ -101,42 +112,45 @@ def parse_args(): ...@@ -101,42 +112,45 @@ def parse_args():
help="Warmup steps for performance test.", ) help="Warmup steps for performance test.", )
parser.add_argument( parser.add_argument(
"--use_trt", "--use_trt",
action='store_true', action="store_true",
help="Whether to use inference engin TensorRT.", ) help="Whether to use inference engin TensorRT.", )
parser.add_argument( parser.add_argument(
"--perf", "--precision",
action='store_true', type=str,
help="Whether to test performance.", ) default="fp32",
choices=["fp32", "fp16", "int8"],
help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.",
)
parser.add_argument( parser.add_argument(
"--int8", "--use_mkldnn",
action='store_true', type=bool,
help="Whether to use int8 inference.", ) default=False,
help="Whether use mkldnn or not.")
parser.add_argument( parser.add_argument(
"--fp16", "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
action='store_true',
help="Whether to use float16 inference.", )
args = parser.parse_args() args = parser.parse_args()
return args return args
def convert_example(example, def _convert_example(
tokenizer, example,
label_list, tokenizer,
max_seq_length=512, label_list,
task_name=None, max_seq_length=512,
is_test=False, task_name=None,
padding='max_length', is_test=False,
return_attention_mask=True): padding="max_length",
return_attention_mask=True, ):
if not is_test: if not is_test:
# `label_list == None` is for regression task # `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32" label_dtype = "int64" if label_list else "float32"
# Get the label # Get the label
label = example['labels'] label = example["labels"]
label = np.array([label], dtype=label_dtype) label = np.array([label], dtype=label_dtype)
# Convert raw text to feature # Convert raw text to feature
sentence1_key, sentence2_key = task_to_keys[task_name] sentence1_key, sentence2_key = task_to_keys[task_name]
texts = ((example[sentence1_key], ) if sentence2_key is None else texts = (example[sentence1_key], ) if sentence2_key is None else (
(example[sentence1_key], example[sentence2_key])) example[sentence1_key], example[sentence2_key])
example = tokenizer( example = tokenizer(
*texts, *texts,
max_seq_len=max_seq_length, max_seq_len=max_seq_length,
...@@ -144,19 +158,23 @@ def convert_example(example, ...@@ -144,19 +158,23 @@ def convert_example(example,
return_attention_mask=return_attention_mask) return_attention_mask=return_attention_mask)
if not is_test: if not is_test:
if return_attention_mask: if return_attention_mask:
return example['input_ids'], example['attention_mask'], example[ return example["input_ids"], example["attention_mask"], example[
'token_type_ids'], label "token_type_ids"], label
else: else:
return example['input_ids'], example['token_type_ids'], label return example["input_ids"], example["token_type_ids"], label
else: else:
if return_attention_mask: if return_attention_mask:
return example['input_ids'], example['attention_mask'], example[ return example["input_ids"], example["attention_mask"], example[
'token_type_ids'] "token_type_ids"]
else: else:
return example['input_ids'], example['token_type_ids'] return example["input_ids"], example["token_type_ids"]
class Predictor(object): class Predictor(object):
"""
Inference Predictor class
"""
def __init__(self, predictor, input_handles, output_handles): def __init__(self, predictor, input_handles, output_handles):
self.predictor = predictor self.predictor = predictor
self.input_handles = input_handles self.input_handles = input_handles
...@@ -164,60 +182,51 @@ class Predictor(object): ...@@ -164,60 +182,51 @@ class Predictor(object):
@classmethod @classmethod
def create_predictor(cls, args): def create_predictor(cls, args):
config = paddle.inference.Config(args.model_path + ".pdmodel", """
args.model_path + ".pdiparams") create_predictor func
"""
cls.rerun_flag = False
config = paddle.inference.Config(
os.path.join(args.model_path, args.model_filename),
os.path.join(args.model_path, args.params_filename))
if args.device == "gpu": if args.device == "gpu":
# set GPU configs accordingly # set GPU configs accordingly
config.enable_use_gpu(100, 0) config.enable_use_gpu(100, 0)
cls.device = paddle.set_device("gpu") cls.device = paddle.set_device("gpu")
elif args.device == "cpu": else:
# set CPU configs accordingly,
# such as enable_mkldnn, set_cpu_math_library_num_threads
config.disable_gpu() config.disable_gpu()
cls.device = paddle.set_device("cpu") config.set_cpu_math_library_num_threads(args.cpu_threads)
elif args.device == "xpu": config.switch_ir_optim()
# set XPU configs accordingly if args.use_mkldnn:
config.enable_xpu(100) config.enable_mkldnn()
if args.use_trt: if args.precision == "int8":
if args.int8: config.enable_mkldnn_int8(
config.enable_tensorrt_engine( {"fc", "reshape2", "transpose2", "slice"})
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Int8, precision_map = {
max_batch_size=args.batch_size, "int8": inference.PrecisionType.Int8,
min_subgraph_size=5, "fp32": inference.PrecisionType.Float32,
use_static=False, "fp16": inference.PrecisionType.Half,
use_calib_mode=False) }
elif args.fp16: if args.precision in precision_map.keys() and args.use_trt:
config.enable_tensorrt_engine( config.enable_tensorrt_engine(
workspace_size=1 << 30, workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Half, max_batch_size=args.batch_size,
max_batch_size=args.batch_size, min_subgraph_size=5,
min_subgraph_size=5, precision_mode=precision_map[args.precision],
use_static=False, use_static=True,
use_calib_mode=False) use_calib_mode=False, )
else:
config.enable_tensorrt_engine( dynamic_shape_file = os.path.join(args.model_path,
workspace_size=1 << 30, "dynamic_shape.txt")
precision_mode=inference.PrecisionType.Float32,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
print("Enable TensorRT is: {}".format(
config.tensorrt_engine_enabled()))
model_dir = os.path.dirname(args.model_path)
dynamic_shape_file = os.path.join(model_dir, 'dynamic_shape.txt')
if os.path.exists(dynamic_shape_file): if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file, config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True) True)
print('trt set dynamic shape done!') print("trt set dynamic shape done!")
else: else:
config.collect_shape_range_info(dynamic_shape_file) config.collect_shape_range_info(dynamic_shape_file)
print( print("Start collect dynamic shape...")
'Start collect dynamic shape... Please eval again to get real result in TensorRT' cls.rerun_flag = True
)
sys.exit()
predictor = paddle.inference.create_predictor(config) predictor = paddle.inference.create_predictor(config)
...@@ -233,6 +242,9 @@ class Predictor(object): ...@@ -233,6 +242,9 @@ class Predictor(object):
return cls(predictor, input_handles, output_handles) return cls(predictor, input_handles, output_handles)
def predict(self, dataset, collate_fn, args): def predict(self, dataset, collate_fn, args):
"""
predict func
"""
batch_sampler = paddle.io.BatchSampler( batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=args.batch_size, shuffle=False) dataset, batch_size=args.batch_size, shuffle=False)
data_loader = paddle.io.DataLoader( data_loader = paddle.io.DataLoader(
...@@ -241,94 +253,92 @@ class Predictor(object): ...@@ -241,94 +253,92 @@ class Predictor(object):
collate_fn=collate_fn, collate_fn=collate_fn,
num_workers=0, num_workers=0,
return_list=True) return_list=True)
end_time = 0
if args.perf:
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy(
) if isinstance(input_field, paddle.Tensor) else
input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
if i > args.perf_warmup_steps:
break
time1 = time.time()
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy(
) if isinstance(input_field, paddle.Tensor) else
input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
sequences_num = i * args.batch_size
print("task name: %s, time: %s qps/s, " %
(args.task_name, sequences_num / (time.time() - time1)))
else:
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy(
) if isinstance(input_field, paddle.Tensor) else
input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
label = data[-1]
correct = metric.compute(
paddle.to_tensor(output[0]),
paddle.to_tensor(np.array(label).flatten()))
print(correct)
metric.update(correct)
res = metric.accumulate() for i, data in enumerate(data_loader):
print("task name: %s, acc: %s, \n" % (args.task_name, res), end='') for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy() if isinstance(
input_field, paddle.Tensor) else input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
if i > args.perf_warmup_steps:
break
if self.rerun_flag:
return
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
predict_time = 0.0
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy() if isinstance(
input_field, paddle.Tensor) else input_field)
start_time = time.time()
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
end_time = time.time()
predict_time += end_time - start_time
label = data[-1]
correct = metric.compute(
paddle.to_tensor(output[0]),
paddle.to_tensor(np.array(label).flatten()))
metric.update(correct)
sequences_num = i * args.batch_size
print(
"[benchmark]task name: {}, batch size: {} Inference time per batch: {}ms, qps: {}.".
format(
args.task_name,
args.batch_size,
round(predict_time * 1000 / i, 2),
round(sequences_num / predict_time, 2), ))
res = metric.accumulate()
print(
"[benchmark]task name: %s, acc: %s. \n" % (args.task_name, res),
end="")
sys.stdout.flush()
def main(): def main():
"""
main func
"""
paddle.seed(42) paddle.seed(42)
args = parse_args() args = parse_args()
if args.use_mkldnn:
paddle.set_device("cpu")
predictor = Predictor.create_predictor(args) predictor = Predictor.create_predictor(args)
args.task_name = args.task_name.lower() args.task_name = args.task_name.lower()
args.model_type = args.model_type.lower() args.model_type = args.model_type.lower()
dev_ds = load_dataset('glue', args.task_name, splits='dev') dev_ds = load_dataset("glue", args.task_name, splits="dev")
tokenizer = BertTokenizer.from_pretrained(args.model_name_or_path) tokenizer = BertTokenizer.from_pretrained(args.model_name_or_path)
trans_func = partial( trans_func = partial(
convert_example, _convert_example,
tokenizer=tokenizer, tokenizer=tokenizer,
label_list=dev_ds.label_list, label_list=dev_ds.label_list,
max_seq_length=args.max_seq_length, max_seq_length=args.max_seq_length,
task_name=args.task_name, task_name=args.task_name,
return_attention_mask=True) return_attention_mask=True, )
dev_ds = dev_ds.map(trans_func) dev_ds = dev_ds.map(trans_func)
batchify_fn = lambda samples, fn=Tuple( batchify_fn = lambda samples, fn=Tuple(
Pad(axis=0, pad_val=tokenizer.pad_token_id), # input Pad(axis=0, pad_val=tokenizer.pad_token_id), # input
Pad(axis=0, pad_val=0), Pad(axis=0, pad_val=0),
Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment
Stack(dtype="int64" if dev_ds.label_list else "float32") # label Stack(dtype="int64" if dev_ds.label_list else "float32"), # label
): fn(samples) ): fn(samples)
predictor.predict(dev_ds, batchify_fn, args) predictor.predict(dev_ds, batchify_fn, args)
if __name__ == "__main__": if __name__ == "__main__":
paddle.set_device("cpu")
main() main()
@@ -8,7 +8,6 @@
  - [3.2 Prepare the Dataset](#32-准备数据集)
  - [3.3 Prepare the Inference Model](#33-准备预测模型)
  - [3.4 Run Auto Compression and Export the Model](#34-自动压缩并产出模型)
- [4. Inference Deployment](#4预测部署)
- [5. FAQ](#5FAQ)
@@ -149,14 +148,6 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -
    --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
## 4. Inference Deployment
@@ -164,31 +155,60 @@ python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
```shell ```shell
├── model.pdiparams # Paddle预测模型权重 ├── model.pdiparams # Paddle预测模型权重
├── model.pdmodel # Paddle预测模型文件 ├── model.pdmodel # Paddle预测模型文件
├── calibration_table.txt # Paddle量化后校准表
├── ONNX ├── ONNX
│ ├── quant_model.onnx # 量化后转出的ONNX模型 │ ├── quant_model.onnx # 量化后转出的ONNX模型
│ ├── calibration.cache # TensorRT可以直接加载的校准表 │ ├── calibration.cache # TensorRT可以直接加载的校准表
``` ```
#### 导出至ONNX使用TensorRT部署 #### Paddle Inference部署测试
加载`quant_model.onnx``calibration.cache`,可以直接使用TensorRT测试脚本进行验证,详细代码可参考[TensorRT部署](./TensorRT) 量化模型在GPU上可以使用TensorRT进行加速,在CPU上可以使用MKLDNN进行加速。
以下字段用于配置预测参数:
| 参数名 | 含义 |
|:------:|:------:|
| model_path | inference 模型文件所在目录,该目录下需要有文件 model.pdmodel 和 model.pdiparams 两个文件 |
| dataset_dir | eval时数据验证集路径, 默认`dataset/coco` |
| image_file | 如果只测试单张图片效果,直接根据image_file指定图片路径 |
| device | 使用GPU或者CPU预测,可选CPU/GPU |
| use_trt | 是否使用 TesorRT 预测引擎 |
| use_mkldnn | 是否启用```MKL-DNN```加速库,注意```use_mkldnn``````use_gpu```同时为```True```时,将忽略```enable_mkldnn```,而使用```GPU```预测 |
| cpu_threads | CPU预测时,使用CPU线程数量,默认10 |
| precision | 预测精度,包括`fp32/fp16/int8` |
TensorRT Python部署:
首先安装带有TensorRT的[Paddle安装包](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
然后使用[paddle_inference_eval.py](./paddle_inference_eval.py)进行部署:
- python测试:
```shell ```shell
cd TensorRT python paddle_inference_eval.py \
python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \ --model_path=output \
--calibration_file=output/ONNX/calibration.cache \ --reader_config=configs/yoloe_reader.yml \
--image_file=../images/000000570688.jpg \ --use_trt=True \
--precision_mode=int8 --precision=int8
``` ```
- 速度测试 - MKLDNN预测:
```shell ```shell
trtexec --onnx=output/ONNX/quant_model.onnx --avgRuns=1000 --workspace=1024 --calib=output/ONNX/calibration.cache --int8 python paddle_inference_eval.py \
--model_path=output \
--reader_config=configs/yoloe_reader.yml \
--device=CPU \
--use_mkldnn=True \
--cpu_threads=10 \
--precision=int8
```
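As a rough reference for how these flags are consumed, the sketch below shows how a Paddle Inference `Config` is typically assembled for GPU + TensorRT int8 versus CPU + MKLDNN int8. The calls mirror the ones used in [paddle_inference_eval.py](./paddle_inference_eval.py); the helper name `build_config` and the hard-coded memory/batch values are only illustrative.

```python
# Simplified sketch, assuming the exported files are model.pdmodel / model.pdiparams.
from paddle.inference import Config, create_predictor


def build_config(model_dir, device="GPU", use_trt=False, use_mkldnn=False,
                 precision="int8", cpu_threads=10, batch_size=1):
    config = Config(model_dir + "/model.pdmodel", model_dir + "/model.pdiparams")
    if device == "GPU":
        config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id
        if use_trt:
            precision_map = {
                "int8": Config.Precision.Int8,
                "fp16": Config.Precision.Half,
                "fp32": Config.Precision.Float32,
            }
            config.enable_tensorrt_engine(
                workspace_size=(1 << 25) * batch_size,
                max_batch_size=batch_size,
                min_subgraph_size=3,
                precision_mode=precision_map[precision],
                use_static=True,
                use_calib_mode=False)
    else:
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
        if use_mkldnn:
            config.enable_mkldnn()
            if precision == "int8":
                # run a subset of ops with MKL-DNN int8 kernels
                config.enable_mkldnn_int8({"conv2d", "transpose2", "pool2d"})
    config.enable_memory_optim()
    return create_predictor(config)
```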
- Test a single image

```shell
python paddle_inference_eval.py --model_path=output --image_file=images/000000570688.jpg --use_trt=True --precision=int8
```

#### Paddle-TensorRT deployment

- C++ deployment

Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build it following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:

```shell
bash compile.sh
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```

#### Export to ONNX and deploy with TensorRT

Load `quant_model.onnx` together with `calibration.cache` and validate them directly with the TensorRT test script; see [TensorRT deployment](./TensorRT) for the full code.

- Python test:

```shell
cd TensorRT
python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \
    --calibration_file=output/ONNX/calibration.cache \
    --image_file=../images/000000570688.jpg \
    --precision_mode=int8
```

- Speed test

```shell
trtexec --onnx=output/ONNX/quant_model.onnx --avgRuns=1000 --workspace=1024 --calib=output/ONNX/calibration.cache --int8
```

## 5. FAQ

...
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import sys
import argparse
import cv2
import numpy as np
from tqdm import tqdm
import pkg_resources as pkg
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from dataset import COCOValDataset
from post_process import YOLOPostProcess, coco_metric
def argsparser():
"""
argsparser func
"""
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_path", type=str, help="inference model filepath")
parser.add_argument(
"--image_file",
type=str,
default=None,
help="image path, if set image_file, it will not eval coco.")
parser.add_argument(
"--dataset_dir",
type=str,
default="dataset/coco",
help="COCO dataset dir.")
parser.add_argument(
"--val_image_dir",
type=str,
default="val2017",
help="COCO dataset val image dir.")
parser.add_argument(
"--val_anno_path",
type=str,
default="annotations/instances_val2017.json",
help="COCO dataset anno path.")
parser.add_argument(
"--benchmark",
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
"--use_dynamic_shape",
type=bool,
default=True,
help="Whether use dynamic shape or not.")
parser.add_argument(
"--use_trt",
type=bool,
default=False,
help="Whether use TensorRT or not.")
parser.add_argument(
"--precision",
type=str,
default="paddle",
help="mode of running(fp32/fp16/int8)")
parser.add_argument(
"--device",
type=str,
default="GPU",
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU",
)
parser.add_argument(
"--arch", type=str, default="YOLOv5", help="architectures name.")
parser.add_argument("--img_shape", type=int, default=640, help="input_size")
parser.add_argument(
"--batch_size", type=int, default=1, help="Batch size of model input.")
parser.add_argument(
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--cpu_threads", type=int, default=1, help="Num of cpu threads.")
return parser
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
"""
image preprocess func
"""
if len(image.shape) == 3:
padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
else:
padded_img = np.ones(input_size) * 114.0
img = np.array(image)
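    # letterbox resize: scale by r to keep the aspect ratio; the area not covered
    # by the resized image keeps the gray (114) padding filled in above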
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
padded_img = padded_img[:, :, ::-1]
padded_img /= 255.0
if mean is not None:
padded_img -= mean
if std is not None:
padded_img /= std
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
def get_color_map_list(num_classes):
"""
get_color_map_list func
"""
color_map = num_classes * [0, 0, 0]
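    # spread the bits of each class id across the R/G/B channels to build a
    # deterministic, VOC-style color map with visually distinct class colors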
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= ((lab >> 0) & 1) << (7 - j)
color_map[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j)
color_map[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j)
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
"""
draw_box func
"""
color_list = get_color_map_list(len(class_names))
for i, _ in enumerate(boxes):
box = boxes[i]
cls_id = int(cls_ids[i])
color = tuple(color_list[cls_id])
score = scores[i]
if score < conf:
continue
x0 = int(box[0])
y0 = int(box[1])
x1 = int(box[2])
y1 = int(box[3])
text = "{}:{:.1f}%".format(class_names[cls_id], score * 100)
font = cv2.FONT_HERSHEY_SIMPLEX
txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
cv2.rectangle(img, (x0, y0 + 1), (
x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])), color, -1)
cv2.putText(
img,
text, (x0, y0 + txt_size[1]),
font,
0.8, (0, 255, 0),
thickness=2)
return img
def get_current_memory_mb():
"""
    Obtain the CPU and GPU memory usage of the current process while the program is running.
    Note that calling this function is itself time-consuming.
"""
import pynvml
import psutil
import GPUtil
gpu_id = int(os.environ.get("CUDA_VISIBLE_DEVICES", 0))
pid = os.getpid()
p = psutil.Process(pid)
info = p.memory_full_info()
cpu_mem = info.uss / 1024.0 / 1024.0
gpu_mem = 0
gpu_percent = 0
gpus = GPUtil.getGPUs()
if gpu_id is not None and len(gpus) > 0:
gpu_percent = gpus[gpu_id].load
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
gpu_mem = meminfo.used / 1024.0 / 1024.0
return round(cpu_mem, 4), round(gpu_mem, 4)
def load_predictor(
model_dir,
precision="fp32",
use_trt=False,
use_mkldnn=False,
batch_size=1,
device="CPU",
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
cpu_threads=1, ):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
precision (str): mode of running(fp32/fp16/int8)
use_trt (bool): whether use TensorRT or not.
use_mkldnn (bool): whether use MKLDNN or not in CPU.
device (str): Choose the device you want to run, it can be: CPU/GPU, default is CPU
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
rerun_flag = False
if device != "GPU" and use_trt:
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
format(precision, device))
config = Config(
os.path.join(model_dir, "model.pdmodel"),
os.path.join(model_dir, "model.pdiparams"))
if device == "GPU":
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
config.switch_ir_optim()
if use_mkldnn:
config.enable_mkldnn()
if precision == "int8":
config.enable_mkldnn_int8({"conv2d", "transpose2", "pool2d"})
precision_map = {
"int8": Config.Precision.Int8,
"fp32": Config.Precision.Float32,
"fp16": Config.Precision.Half,
}
if precision in precision_map.keys() and use_trt:
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[precision],
use_static=True,
use_calib_mode=False, )
if use_dynamic_shape:
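            # two-pass scheme: reuse a previously tuned shape file if present,
            # otherwise collect shape ranges now and ask the caller to rerun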
dynamic_shape_file = os.path.join(FLAGS.model_path,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print("Start collect dynamic shape...")
rerun_flag = True
# enable shared memory
config.enable_memory_optim()
predictor = create_predictor(config)
return predictor, rerun_flag
def eval(predictor, val_loader, anno_file, rerun_flag=False):
"""
eval main func
"""
bboxes_list, bbox_nums_list, image_id_list = [], [], []
cpu_mems, gpu_mems = 0, 0
sample_nums = len(val_loader)
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
input_names = predictor.get_input_names()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
for batch_id, data in enumerate(val_loader):
data_all = {k: np.array(v) for k, v in data.items()}
inputs = {}
if FLAGS.arch == "YOLOv6":
inputs["x2paddle_image_arrays"] = data_all["image"]
else:
inputs["x2paddle_images"] = data_all["image"]
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
start_time = time.time()
predictor.run()
outs = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
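        # when only collecting dynamic shapes, one forward pass is enough;
        # skip post-processing/metrics so the program can be rerun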
if rerun_flag:
return
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs), data_all["scale_factor"])
bboxes_list.append(res["bbox"])
bbox_nums_list.append(res["bbox_num"])
image_id_list.append(np.array(data_all["im_id"]))
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / sample_nums, gpu_mems / sample_nums))
time_avg = predict_time / sample_nums
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
print("[Benchmark] COCO mAP: {}".format(map_res[0]))
sys.stdout.flush()
def infer(predictor):
"""
infer image main func
"""
warmup, repeats = 1, 1
if FLAGS.benchmark:
warmup, repeats = 50, 100
origin_img = cv2.imread(FLAGS.image_file)
input_image, scale_factor = preprocess(origin_img,
[FLAGS.img_shape, FLAGS.img_shape])
input_image = np.expand_dims(input_image, axis=0)
scale_factor = np.array([[scale_factor, scale_factor]])
inputs = {}
if FLAGS.arch == "YOLOv6":
inputs["x2paddle_image_arrays"] = input_image
else:
inputs["x2paddle_images"] = input_image
input_names = predictor.get_input_names()
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
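    # warm-up runs keep one-off costs (e.g. building the TensorRT engine on the
    # first run) out of the timed loop below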
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
cpu_mems, gpu_mems = 0, 0
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / repeats, gpu_mems / repeats))
time_avg = predict_time / repeats
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np_boxes, scale_factor)
# Draw rectangles and labels on the original image
dets = res["bbox"]
if dets is not None:
        final_boxes, final_scores, final_class = dets[:, 2:], dets[:, 1], dets[:, 0]
res_img = draw_box(
origin_img,
final_boxes,
final_scores,
final_class,
conf=0.5,
class_names=CLASS_LABEL)
cv2.imwrite("output.jpg", res_img)
print("The prediction results are saved in output.jpg.")
def main():
"""
main func
"""
predictor, rerun_flag = load_predictor(
FLAGS.model_path,
device=FLAGS.device,
use_trt=FLAGS.use_trt,
use_mkldnn=FLAGS.use_mkldnn,
precision=FLAGS.precision,
use_dynamic_shape=FLAGS.use_dynamic_shape,
cpu_threads=FLAGS.cpu_threads, )
if FLAGS.image_file:
infer(predictor)
else:
dataset = COCOValDataset(
dataset_dir=FLAGS.dataset_dir,
image_dir=FLAGS.val_image_dir,
anno_path=FLAGS.val_anno_path)
anno_file = dataset.ann_file
val_loader = paddle.io.DataLoader(
dataset, batch_size=FLAGS.batch_size, drop_last=True)
eval(predictor, val_loader, anno_file, rerun_flag=rerun_flag)
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__":
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
# DataLoader need run on cpu
paddle.set_device("cpu")
main()
@@ -8,8 +8,7 @@
- [3.2 Prepare the Dataset](#32-准备数据集)
- [3.3 Prepare the Inference Model](#33-准备预测模型)
- [3.4 Auto Compression and Model Export](#34-自动压缩并产出模型)
- [4. Inference Deployment](#4预测部署)
- [5. FAQ](5FAQ)

## 1. Introduction

@@ -156,104 +155,68 @@
After compression finishes, the compressed inference model is written to `save_dir` and can be deployed directly.

## 4. Inference Deployment

#### 4.1 Verify performance with Paddle Inference

A quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure inference:

| Parameter | Description |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | name of the model file under the inference model directory |
| params_filename | name of the params file under the inference model directory |
| dataset | dataset type, one of `human`, `cityscape` |
| dataset_config | dataset config file used for evaluation |
| image_file | path of a single test image; when image_file is set, dataset_config is ignored |
| device | inference device, `CPU` or `GPU` |
| use_trt | whether to use the TensorRT inference engine; effective only when device is `GPU` |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; effective only when device is `CPU` |
| cpu_threads | number of CPU threads used for CPU inference, default 10 |
| precision | inference precision, one of `fp32`, `fp16`, `int8` |

- TensorRT inference:

Environment: to use the TensorRT engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

Prepare the inference model, point the dataset path in dataset_config to the correct location, and start the test:

```shell
python paddle_inference_eval.py \
      --model_path=pp_liteseg_qat \
      --dataset='cityscape' \
      --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
      --use_trt=True \
      --precision=int8
```

- MKLDNN inference:

```shell
python paddle_inference_eval.py \
      --model_path=pp_liteseg_qat \
      --dataset='cityscape' \
      --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
      --device=CPU \
      --use_mkldnn=True \
      --precision=int8 \
      --cpu_threads=10
```
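Note: with `--use_trt=True`, the first run of the commands above only collects the TensorRT dynamic-shape ranges into a `dynamic_shape.txt` file next to the model and then asks you to rerun; running the same command a second time loads the tuned shapes and reports the actual int8 performance. Below is a minimal sketch of this two-pass logic, with the file name taken from `paddle_inference_eval.py` in this directory and the helper name purely illustrative.

```python
# Minimal sketch of the two-pass dynamic-shape handling used at load time.
import os

from paddle.inference import Config as PredictConfig


def setup_dynamic_shape(pred_cfg: PredictConfig, model_dir: str) -> bool:
    """Return True when the program must be rerun after shape collection."""
    shape_file = os.path.join(model_dir, "dynamic_shape.txt")
    if os.path.exists(shape_file):
        # second run: reuse the tuned shape ranges for the TensorRT engine
        pred_cfg.enable_tuned_tensorrt_dynamic_shape(shape_file, True)
        return False
    # first run: record the shape ranges seen during inference, then rerun
    pred_cfg.collect_shape_range_info(shape_file)
    return True
```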
#### 4.2 Test a single image with Paddle Inference

Test a single image with the human segmentation model:

```shell
python paddle_inference_eval.py \
      --model_path=pp_humanseg_qat \
      --dataset='human' \
      --image_file=./data/human_demo.jpg \
      --use_trt=True \
      --precision=int8
```

<table><tbody>
@@ -287,17 +250,11 @@ Int8 inference result
</tbody></table>

### 4.3 More deployment tutorials

- [Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/python_inference.md)
- [Paddle Inference C++ deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/cpp_inference.md)
- [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/lite/lite.md)

## 5. FAQ
@@ -12,11 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import time
import os
import sys
import cv2
import numpy as np
import paddle
import paddleseg.transforms as T
from paddleseg.cvlibs import Config as PaddleSegDataConfig
@@ -38,62 +39,72 @@ def _transforms(dataset):
    return transforms
def load_predictor(args):
    """
    load predictor func
    """
    rerun_flag = False
    model_file = os.path.join(args.model_path, args.model_filename)
    params_file = os.path.join(args.model_path, args.params_filename)
    pred_cfg = PredictConfig(model_file, params_file)
    pred_cfg.disable_glog_info()
    pred_cfg.enable_memory_optim()
    pred_cfg.switch_ir_optim(True)
    if args.device == "GPU":
        pred_cfg.enable_use_gpu(100, 0)
    else:
        pred_cfg.disable_gpu()
        pred_cfg.set_cpu_math_library_num_threads(args.cpu_threads)
        if args.use_mkldnn:
            pred_cfg.enable_mkldnn()
            if args.precision == "int8":
                pred_cfg.enable_mkldnn_int8({
                    "conv2d", "depthwise_conv2d", "pool2d", "elementwise_mul"
                })

    if args.use_trt:
        # To collect the dynamic shapes of inputs for TensorRT engine
        dynamic_shape_file = os.path.join(args.model_path, "dynamic_shape.txt")
        if os.path.exists(dynamic_shape_file):
            pred_cfg.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
                                                         True)
            print("trt set dynamic shape done!")
            precision_map = {
                "fp16": PrecisionType.Half,
                "fp32": PrecisionType.Float32,
                "int8": PrecisionType.Int8
            }
            pred_cfg.enable_tensorrt_engine(
                workspace_size=1 << 30,
                max_batch_size=1,
                min_subgraph_size=4,
                precision_mode=precision_map[args.precision],
                use_static=True,
                use_calib_mode=False, )
        else:
            pred_cfg.disable_gpu()
            pred_cfg.set_cpu_math_library_num_threads(10)
            pred_cfg.collect_shape_range_info(dynamic_shape_file)
            print("Start collect dynamic shape...")
            rerun_flag = True

    predictor = create_predictor(pred_cfg)
    return predictor, rerun_flag
def predict_image(args):
    """
    predict image func
    """
    transforms = _transforms(args.dataset)
    transform = T.Compose(transforms)

    # Step1: Load image and preprocess
    im = cv2.imread(args.image_file).astype("float32")
    data, _ = transform(im)
    data = np.array(data)[np.newaxis, :]

    # Step2: Prepare predictor
    predictor, rerun_flag = load_predictor(args)

    # Step3: Inference
    input_names = predictor.get_input_names()
@@ -114,14 +125,21 @@ def predict_image(args):
    for i in range(repeats):
        predictor.run()
        results = output_handle.copy_to_cpu()
    if rerun_flag:
        print(
            "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
        )
        return
    total_time = time.time() - start_time
    avg_time = float(total_time) / repeats
    print(
        f"[Benchmark]Average inference time: \033[91m{round(avg_time*1000, 2)}ms\033[0m"
    )

    # Step4: Post process
    if args.dataset == "human":
        results = reverse_transform(
            paddle.to_tensor(results), im.shape, transforms, mode="bilinear")
    results = np.argmax(results, axis=1)
    result = get_pseudo_color_map(results[0])
@@ -132,8 +150,11 @@ def predict_image(args):


def eval(args):
    """
    eval mIoU func
    """
    # DataLoader need run on cpu
    paddle.set_device("cpu")
    data_cfg = PaddleSegDataConfig(args.dataset_config)
    eval_dataset = data_cfg.val_dataset
@@ -142,48 +163,56 @@ def eval(args):
    loader = paddle.io.DataLoader(
        eval_dataset,
        batch_sampler=batch_sampler,
        num_workers=0,
        return_list=True)
    predictor, rerun_flag = load_predictor(args)
    intersect_area_all = 0
    pred_area_all = 0
    label_area_all = 0
    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])
    total_samples = len(eval_dataset)
    sample_nums = len(loader)
    batch_size = int(total_samples / sample_nums)
    predict_time = 0.0
    time_min = float("inf")
    time_max = float("-inf")
    print("Start evaluating (total_samples: {}, total_iters: {}).".format(
        total_samples, sample_nums))
    for batch_id, data in enumerate(loader):
        image = np.array(data[0])
        label = np.array(data[1]).astype("int64")
        ori_shape = np.array(label).shape[-2:]
        input_handle.reshape(image.shape)
        input_handle.copy_from_cpu(image)
        start_time = time.time()
        predictor.run()
        results = output_handle.copy_to_cpu()
        end_time = time.time()
        timed = end_time - start_time
        time_min = min(time_min, timed)
        time_max = max(time_max, timed)
        predict_time += timed
        if rerun_flag:
            print(
                "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
            )
            return
        logit = reverse_transform(
            paddle.to_tensor(results),
            ori_shape,
            eval_dataset.transforms.transforms,
            mode="bilinear")
        pred = paddle.to_tensor(logit)
        if len(
                pred.shape
        ) == 4:  # for humanseg model whose prediction is distribution but not class id
            pred = paddle.argmax(pred, axis=1, keepdim=True, dtype="int32")

        intersect_area, pred_area, label_area = metrics.calculate_area(
            pred,
@@ -193,71 +222,95 @@ def eval(args):
        intersect_area_all = intersect_area_all + intersect_area
        pred_area_all = pred_area_all + pred_area
        label_area_all = label_area_all + label_area
        if batch_id % 100 == 0:
            print("Eval iter:", batch_id)
            sys.stdout.flush()

    _, miou = metrics.mean_iou(intersect_area_all, pred_area_all,
                               label_area_all)
    _, acc = metrics.accuracy(intersect_area_all, pred_area_all)
    kappa = metrics.kappa(intersect_area_all, pred_area_all, label_area_all)
    _, mdice = metrics.dice(intersect_area_all, pred_area_all, label_area_all)
    time_avg = predict_time / sample_nums
    print(
        "[Benchmark]Batch size: {}, Inference time(ms): min={}, max={}, avg={}".
        format(batch_size,
               round(time_min * 1000, 2),
               round(time_max * 1000, 1), round(time_avg * 1000, 1)))
    infor = "[Benchmark] #Images: {} mIoU: {:.4f} Acc: {:.4f} Kappa: {:.4f} Dice: {:.4f}".format(
        total_samples, miou, acc, kappa, mdice)
    print(infor)
    sys.stdout.flush()
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path", type=str, help="inference model filepath")
    parser.add_argument(
        "--model_filename",
        type=str,
        default="model.pdmodel",
        help="model file name")
    parser.add_argument(
        "--params_filename",
        type=str,
        default="model.pdiparams",
        help="params file name")
    parser.add_argument(
        "--image_file",
        type=str,
        default=None,
        help="Image path to be processed.")
    parser.add_argument(
        "--save_file",
        type=str,
        default=None,
        help="The path to save the processed image.")
    parser.add_argument(
        "--dataset",
        type=str,
        default="human",
        choices=["human", "cityscape"],
        help="The type of given image which can be 'human' or 'cityscape'.", )
    parser.add_argument(
        "--dataset_config",
        type=str,
        default=None,
        help="path of dataset config.")
    parser.add_argument(
        "--benchmark",
        type=bool,
        default=False,
        help="Whether to run benchmark or not.")
    parser.add_argument(
        "--use_trt",
        type=bool,
        default=False,
        help="Whether to use tensorrt engine or not.")
    parser.add_argument(
        "--device",
        type=str,
        default="GPU",
        choices=["CPU", "GPU"],
        help="Choose the device you want to run, it can be: CPU/GPU, default is GPU",
    )
    parser.add_argument(
        "--precision",
        type=str,
        default="fp32",
        choices=["fp32", "fp16", "int8"],
        help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp32'.",
    )
    parser.add_argument(
        "--use_mkldnn",
        type=bool,
        default=False,
        help="Whether use mkldnn or not.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
    args = parser.parse_args()
    if args.image_file:
        predict_image(args)
...
@@ -230,7 +230,6 @@ def export_onnx(model_dir,
         opset_version=opset_version,
         enable_onnx_checker=True,
         deploy_backend=deploy_backend,
-        scale_file=os.path.join(model_dir, 'calibration_table.txt'),
         calibration_file=os.path.join(
             save_file_path.rstrip(os.path.split(save_file_path)[-1]),
             'calibration.cache'))
...