Unverified commit 456b9fed, authored by Guanghua Yu, committed by GitHub

update act paddle inference demo (#1432)

Co-authored-by: Nceci3 <ceci3@users.noreply.github.com>
Parent fe33833a
......@@ -7,8 +7,7 @@
  - [3.1 Prepare the Environment](#31-准备环境)
  - [3.2 Prepare the Dataset](#32-准备数据集)
  - [3.3 Prepare the Prediction Model](#33-准备预测模型)
  - [3.4 Test Model Accuracy](#34-测试模型精度)
  - [3.5 Auto Compression and Model Export](#35-自动压缩并产出模型)
  - [3.4 Auto Compression and Model Export](#34-自动压缩并产出模型)
- [4. Prediction Deployment](#4预测部署)
- [5. FAQ](5FAQ)
......@@ -110,23 +109,52 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -
--config_path=./configs/ppyoloe_l_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Test Model Accuracy
Use the eval.py script to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
```
## 4. Prediction Deployment
**Note**:
- The path of the model to be tested can be changed via the `model_dir` field in the configuration file.
#### 4.1 Verify Performance with Paddle Inference
## 4. Prediction Deployment
Quantized models can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
The following fields configure the prediction parameters:
- If the model contains NMS, refer to the [PaddleDetection deployment tutorial](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy) and deploy the quantized model on GPU with TensorRT enabled in trt_int8 mode.
| Parameter | Description |
|:------:|:------:|
| model_path | Directory of the inference model; it must contain the files model.pdmodel and model.pdiparams |
| reader_config | Path to the reader config file used for evaluation |
| image_file | To test a single image only, specify its path via image_file |
| device | Device used for inference, CPU or GPU |
| use_trt | Whether to use the TensorRT inference engine |
| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```use_mkldnn``` is ignored and ```GPU``` prediction is used |
| cpu_threads | Number of CPU threads for CPU inference, default 10 |
| precision | Inference precision, one of `fp32/fp16/int8` |
- For PPYOLOE models that do not include NMS, use the following prediction demos for deployment:
- Paddle-TensorRT C++ deployment
- TensorRT prediction:
Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
```shell
python paddle_inference_eval.py \
--model_path=models/ppyoloe_crn_l_300e_coco_quant \
--reader_config=configs/yoloe_reader.yml \
--use_trt=True \
--precision=int8
```
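**Note**: when running with `use_trt=True` and dynamic shapes enabled, the first run of `paddle_inference_eval.py` only collects tensor shape ranges into `dynamic_shape.txt` under `model_path` and then asks you to rerun; run the same command a second time to get the actual accuracy and speed numbers (see the `rerun_flag` logic in the script).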
- MKLDNN prediction:
```shell
python paddle_inference_eval.py \
--model_path=models/ppyoloe_crn_l_300e_coco_quant \
--reader_config=configs/yoloe_reader.yml \
--device=CPU \
--use_mkldnn=True \
--cpu_threads=10 \
--precision=int8
```
- For PPYOLOE models that do not include NMS, the C++ prediction demo can be used for speed testing:
Enter the [cpp_infer](./cpp_infer_ppyoloe) folder, prepare the environment and build following the [C++ TensorRT Benchmark tutorial](./cpp_infer_ppyoloe/README.md), then run the test:
```shell
......@@ -136,14 +164,6 @@ python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
./build/trt_run --model_file ppyoloe_s_quant/model.pdmodel --params_file ppyoloe_s_quant/model.pdiparams --run_mode=trt_int8
```
- Paddle-TensorRT Python deployment:
First install the [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT, then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5. FAQ
- For post-training (offline) quantization, see the [Detection post-training quantization example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/detection).
......@@ -13,16 +13,81 @@
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
import sys
import cv2
import numpy as np
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from ppdet.core.workspace import load_config, create
from ppdet.metrics import COCOMetric
from post_process import PPYOLOEPostProcess
def argsparser():
"""
argsparser func
"""
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_path", type=str, help="inference model filepath")
parser.add_argument(
"--image_file",
type=str,
default=None,
help="image path, if set image_file, it will not eval coco.")
parser.add_argument(
"--reader_config",
type=str,
default=None,
help="path of datset and reader config.")
parser.add_argument(
"--benchmark",
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
"--use_trt",
type=bool,
default=False,
help="Whether use TensorRT or not.")
parser.add_argument(
"--precision",
type=str,
default="paddle",
help="mode of running(fp32/fp16/int8)")
parser.add_argument(
"--device",
type=str,
default="GPU",
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU",
)
parser.add_argument(
"--use_dynamic_shape",
type=bool,
default=True,
help="Whether use dynamic shape or not.")
parser.add_argument(
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--cpu_threads", type=int, default=10, help="Num of cpu threads.")
parser.add_argument("--img_shape", type=int, default=640, help="input_size")
parser.add_argument(
'--include_nms',
type=bool,
default=True,
help="Whether include nms or not.")
return parser
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
......@@ -67,6 +132,9 @@ def generate_scale(im, target_shape, keep_ratio=True):
def image_preprocess(img_path, target_shape):
"""
image_preprocess func
"""
img = cv2.imread(img_path)
im_scale_y, im_scale_x = generate_scale(img, target_shape, keep_ratio=False)
img = cv2.resize(
......@@ -84,14 +152,17 @@ def image_preprocess(img_path, target_shape):
def get_color_map_list(num_classes):
"""
get_color_map_list func
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
color_map[i * 3] |= ((lab >> 0) & 1) << (7 - j)
color_map[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j)
color_map[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j)
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
......@@ -99,6 +170,9 @@ def get_color_map_list(num_classes):
def draw_box(image_file, results, class_label, threshold=0.5):
"""
draw_box func
"""
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
......@@ -114,115 +188,142 @@ def draw_box(image_file, results, class_label, threshold=0.5):
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
print(class_label[classid] + ": " + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
class_label[classid] + ":" + str(round(conf, 3)),
(xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
0.8,
(0, 255, 0),
thickness=2, )
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
def load_predictor(
model_dir,
precision="fp32",
use_trt=False,
use_mkldnn=False,
batch_size=1,
device='CPU',
device="CPU",
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=3,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
cpu_threads=1, ):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
precision (str): mode of running(fp32/fp16/int8)
use_trt (bool): whether use TensorRT or not.
use_mkldnn (bool): whether use MKLDNN or not in CPU.
device (str): Choose the device you want to run, it can be: CPU/GPU, default is CPU
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
rerun_flag = False
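    # rerun_flag is flipped to True when TensorRT dynamic-shape ranges still
    # have to be collected; the script must then be run a second time.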
if device != "GPU" and use_trt:
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
format(precision, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
os.path.join(model_dir, "model.pdmodel"),
os.path.join(model_dir, "model.pdiparams"))
if device == "GPU":
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.switch_ir_optim()
if use_mkldnn:
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
if precision == "int8":
config.enable_mkldnn_int8(
{"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
"int8": Config.Precision.Int8,
"fp32": Config.Precision.Float32,
"fp16": Config.Precision.Half,
}
if run_mode in precision_map.keys():
if precision in precision_map.keys() and use_trt:
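        # Build the TensorRT subgraph engine at the requested precision.
        # use_static=True lets Paddle Inference serialize the optimized engine
        # so later runs can skip the costly engine build step.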
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
precision_mode=precision_map[precision],
use_static=True,
use_calib_mode=False, )
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
dynamic_shape_file = os.path.join(FLAGS.model_path,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print("Start collect dynamic shape...")
rerun_flag = True
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
return predictor, rerun_flag
def get_current_memory_mb():
"""
    Obtain the CPU and GPU memory usage (in MB) of the current process while
    the program is running. Note that this function itself is time-consuming.
"""
    # pkg_resources is used to check whether the profiling dependencies are
    # installed; install them on demand if they are missing.
    import pkg_resources as pkg
    try:
        pkg.require('pynvml')
    except:
        from pip._internal import main
        main(['install', 'pynvml'])
    try:
        pkg.require('psutil')
    except:
        from pip._internal import main
        main(['install', 'psutil'])
    try:
        pkg.require('GPUtil')
    except:
        from pip._internal import main
        main(['install', 'GPUtil'])
import pynvml
import psutil
import GPUtil
gpu_id = int(os.environ.get("CUDA_VISIBLE_DEVICES", 0))
pid = os.getpid()
p = psutil.Process(pid)
info = p.memory_full_info()
cpu_mem = info.uss / 1024.0 / 1024.0
gpu_mem = 0
gpu_percent = 0
gpus = GPUtil.getGPUs()
if gpu_id is not None and len(gpus) > 0:
gpu_percent = gpus[gpu_id].load
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
gpu_mem = meminfo.used / 1024.0 / 1024.0
return round(cpu_mem, 4), round(gpu_mem, 4)
def predict_image(predictor,
......@@ -230,15 +331,17 @@ def predict_image(predictor,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
include_nms=True):
threshold=0.5):
"""
predict image main func
"""
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
inputs['image'] = img
inputs["image"] = img
    if FLAGS.include_nms:
inputs['scale_factor'] = scale_factor
input_names = predictor.get_input_names()
for i in range(len(input_names)):
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
......@@ -246,16 +349,17 @@ def predict_image(predictor,
predictor.run()
np_boxes, np_boxes_num = None, None
predict_time = 0.
cpu_mems, gpu_mems = 0, 0
predict_time = 0.0
time_min = float("inf")
time_max = float('-inf')
time_max = float("-inf")
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
if include_nms:
if FLAGS.include_nms:
boxes_num = predictor.get_output_handle(output_names[1])
np_boxes_num = boxes_num.copy_to_cpu()
end_time = time.time()
......@@ -263,61 +367,132 @@ def predict_image(predictor,
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / repeats, gpu_mems / repeats))
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
if not include_nms:
if not FLAGS.include_nms:
postprocess = PPYOLOEPostProcess(score_threshold=0.3, nms_threshold=0.6)
res = postprocess(np_boxes, scale_factor)
else:
res = {'bbox': np_boxes, 'bbox_num': np_boxes_num}
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
image_file, res["bbox"], CLASS_LABEL, threshold=threshold)
cv2.imwrite("result.jpg", res_img)
if __name__ == '__main__':
def eval(predictor, val_loader, metric, rerun_flag=False):
"""
eval main func
"""
cpu_mems, gpu_mems = 0, 0
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
sample_nums = len(val_loader)
input_names = predictor.get_input_names()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
boxes_num = predictor.get_output_handle(output_names[1])
for batch_id, data in enumerate(val_loader):
data_all = {k: np.array(v) for k, v in data.items()}
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(data_all[input_names[i]])
start_time = time.time()
predictor.run()
np_boxes = boxes_tensor.copy_to_cpu()
if FLAGS.include_nms:
np_boxes_num = boxes_num.copy_to_cpu()
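        # During the shape-collection pass a single forward run is enough;
        # return early and rerun the script to get real benchmark numbers.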
if rerun_flag:
return
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
if not FLAGS.include_nms:
postprocess = PPYOLOEPostProcess(
score_threshold=0.3, nms_threshold=0.6)
res = postprocess(np_boxes, data_all['scale_factor'])
else:
res = {'bbox': np_boxes, 'bbox_num': np_boxes_num}
metric.update(data_all, res)
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
time_avg = predict_time / sample_nums
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / sample_nums, gpu_mems / sample_nums))
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
print("[Benchmark] COCO mAP: {}".format(map_res["bbox"][0]))
sys.stdout.flush()
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
parser.add_argument(
'--include_nms',
type=bool,
default=True,
help="Whether include nms or not.")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
def main():
"""
main func
"""
predictor, rerun_flag = load_predictor(
FLAGS.model_path,
device=FLAGS.device,
use_trt=FLAGS.use_trt,
use_mkldnn=FLAGS.use_mkldnn,
precision=FLAGS.precision,
use_dynamic_shape=FLAGS.use_dynamic_shape,
cpu_threads=FLAGS.cpu_threads)
if FLAGS.image_file:
warmup, repeats = 1, 1
if args.benchmark:
if FLAGS.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
FLAGS.image_file,
image_shape=[FLAGS.img_shape, FLAGS.img_shape],
warmup=warmup,
repeats=repeats,
include_nms=args.include_nms)
repeats=repeats)
else:
reader_cfg = load_config(FLAGS.reader_config)
dataset = reader_cfg["EvalDataset"]
global val_loader
val_loader = create("EvalReader")(reader_cfg["EvalDataset"],
reader_cfg["worker_num"],
return_list=True)
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType="bbox")
eval(predictor, val_loader, metric, rerun_flag=rerun_flag)
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__":
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
# DataLoader need run on cpu
paddle.set_device("cpu")
main()
......@@ -113,48 +113,56 @@ python -m paddle.distributed.launch run.py --save_dir='./save_quant_mobilev1/' -
Note that the ```learning rate``` scales linearly with the ```batch size```. Here the single-card ```batch size``` is 32 and the corresponding ```learning rate``` is 0.015; if the ```batch size``` is reduced 4x to 8, the ```learning rate``` must also be divided by 4, and with multiple cards at ```batch size``` 32 per card, the ```learning rate``` must be multiplied by the number of cards. In short, changing the ```batch size``` or the number of training cards requires adjusting the ```learning rate``` accordingly (see the sketch below).
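A minimal sketch of this linear scaling rule (illustrative only; the reference values 0.015 and 32 are the ones quoted above):
```python
# Linear learning-rate scaling (illustrative sketch, not part of the example scripts).
BASE_LR, BASE_BS = 0.015, 32  # single-card reference setting quoted above

def scaled_lr(batch_size, num_cards=1):
    """Scale the reference learning rate by the total effective batch size."""
    return BASE_LR * (batch_size * num_cards) / BASE_BS

print(scaled_lr(8))                # batch size 8 on one card -> 0.00375
print(scaled_lr(32, num_cards=4))  # 4 cards x batch size 32  -> 0.06
```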
**Verify Accuracy**
The validation accuracy can be read from the training log. To validate again, set the folder path and the model/parameter file names of the model to be validated (```model_dir, model_filename, params_filename```) in ```./configs/MobileNetV1/qat_dis.yaml```, then run:
```shell
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path='./configs/MobileNetV1/qat_dis.yaml'
```
## 4. Prediction Deployment
#### 4.1 Python Inference
Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
#### 4.1 Verify Performance with Paddle Inference
Quantized models can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
The following fields configure the prediction parameters:
- ```inference_model_dir```: directory of the inference model; it must contain the .pdmodel and .pdiparams files
- ```model_filename```: name of the model file under inference_model_dir
- ```params_filename```: name of the params file under inference_model_dir
- ```batch_size```: batch size used for prediction
- ```image_size```: input image size
- ```use_tensorrt```: whether to use the TensorRT inference engine
- ```use_gpu```: whether to use GPU for prediction
- ```enable_mkldnn```: whether to enable the ```MKL-DNN``` acceleration library; note that when ```enable_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and ```GPU``` prediction is used
- ```use_fp16```: whether to enable ```FP16```
- ```use_int8```: whether to enable ```INT8```
| Parameter | Description |
|:------:|:------:|
| model_path | Directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | Name of the model file under the inference model directory |
| params_filename | Name of the params file under the inference model directory |
| data_path | Path of the dataset |
| batch_size | Batch size used for prediction |
| image_size | Input image size |
| use_gpu | Whether to use GPU for prediction |
| use_trt | Whether to use the TensorRT inference engine |
| use_mkldnn | Whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```use_mkldnn``` is ignored and ```GPU``` prediction is used |
| cpu_num_threads | Number of CPU threads for CPU inference, default 10 |
| use_fp16 | Whether to enable ```FP16``` when using TensorRT |
| use_int8 | Whether to enable ```INT8``` |
Note:
- Pay attention to the model's input size; for example InceptionV3 expects 299, so ```image_size``` needs to be adjusted for some models.
- To speed up evaluation, enable ```TensorRT``` when evaluating on ```GPU``` and ```MKL-DNN``` when evaluating on ```CPU```.
After the inference model is ready, run prediction with the following commands:
- TensorRT prediction:
Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
```shell
python paddle_inference_eval.py \
--model_path=models/ResNet50_vd_QAT \
--use_trt=True \
--use_int8=True \
--use_gpu=True \
--data_path=./dataset/ILSVRC2012/
```
- MKLDNN prediction:
```shell
python infer.py --model_dir='MobileNetV1_infer' \
--model_filename='inference.pdmodel' \
        --params_filename='inference.pdiparams' \
--eval=True \
--use_gpu=True \
--enable_mkldnn=True \
--use_int8=True
python paddle_inference_eval.py \
--model_path=models/ResNet50_vd_QAT \
--data_path=./dataset/ILSVRC2012/ \
--cpu_num_threads=10 \
--use_mkldnn=True \
--use_int8=True
```
#### 4.2 PaddleLite Deployment on Edge Devices
......
......@@ -13,76 +13,72 @@
# limitations under the License.
import os
import numpy as np
import cv2
import time
import sys
import argparse
import numpy as np
import cv2
import yaml
from tqdm import tqdm
from utils import preprocess, postprocess
import paddle
from paddle.inference import create_predictor
from paddleslim.common import load_config
from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset, process_image
from imagenet_reader import ImageNetDataset
def argsparser():
"""
argsparser func
"""
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--model_dir',
"--model_path",
type=str,
default='./MobileNetV1_infer',
help='model directory')
default="./MobileNetV1_infer",
help="model directory")
parser.add_argument(
'--model_filename',
"--model_filename",
type=str,
default='inference.pdmodel',
help='model file name')
default="inference.pdmodel",
help="model file name")
parser.add_argument(
'--params_filename',
"--params_filename",
type=str,
default='inference.pdiparams',
help='params file name')
parser.add_argument('--batch_size', type=int, default=1)
parser.add_argument('--img_size', type=int, default=224)
parser.add_argument('--resize_size', type=int, default=256)
default="inference.pdiparams",
help="params file name")
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--img_size", type=int, default=224)
parser.add_argument("--resize_size", type=int, default=256)
parser.add_argument(
'--eval', type=bool, default=False, help='Whether to evaluate')
parser.add_argument('--data_path', type=str, default='./ILSVRC2012/')
"--data_path", type=str, default="./dataset/ILSVRC2012/")
parser.add_argument(
'--use_gpu', type=bool, default=False, help='Whether to use gpu')
"--use_gpu", type=bool, default=False, help="Whether to use gpu")
parser.add_argument(
'--enable_mkldnn',
type=bool,
default=False,
help='Whether to use mkldnn')
"--use_trt", type=bool, default=False, help="Whether to use tensorrt")
parser.add_argument(
'--cpu_num_threads', type=int, default=10, help='Number of cpu threads')
"--use_mkldnn", type=bool, default=False, help="Whether to use mkldnn")
parser.add_argument(
'--use_fp16', type=bool, default=False, help='Whether to use fp16')
"--cpu_num_threads", type=int, default=10, help="Number of cpu threads")
parser.add_argument(
'--use_int8', type=bool, default=False, help='Whether to use int8')
"--use_fp16", type=bool, default=False, help="Whether to use fp16")
parser.add_argument(
'--use_tensorrt',
type=bool,
default=True,
help='Whether to use tensorrt')
"--use_int8", type=bool, default=False, help="Whether to use int8")
parser.add_argument("--gpu_mem", type=int, default=8000, help="GPU memory")
parser.add_argument("--ir_optim", type=bool, default=True)
parser.add_argument(
'--enable_profile',
"--use_dynamic_shape",
type=bool,
default=False,
help='Whether to enable profile')
parser.add_argument('--gpu_mem', type=int, default=8000, help='GPU memory')
parser.add_argument('--ir_optim', type=bool, default=True)
default=True,
help="Whether use dynamic shape or not.")
return parser
def eval_reader(data_dir, batch_size, crop_size, resize_size):
"""
eval reader func
"""
val_reader = ImageNetDataset(
mode='val',
mode="val",
data_dir=data_dir,
crop_size=crop_size,
resize_size=resize_size)
......@@ -96,14 +92,17 @@ def eval_reader(data_dir, batch_size, crop_size, resize_size):
class Predictor(object):
def __init__(self, args):
"""
Paddle Inference Predictor class
"""
def __init__(self):
        # FP16 (half precision) prediction only works when TensorRT is used
if args.use_fp16 is True:
assert args.use_tensorrt is True
self.args = args
assert args.use_trt is True
self.paddle_predictor = self.create_paddle_predictor()
self.rerun_flag = False
self.paddle_predictor = self._create_paddle_predictor()
input_names = self.paddle_predictor.get_input_names()
self.input_tensor = self.paddle_predictor.get_input_handle(input_names[
0])
......@@ -112,96 +111,94 @@ class Predictor(object):
self.output_tensor = self.paddle_predictor.get_output_handle(
output_names[0])
def create_paddle_predictor(self):
inference_model_dir = self.args.model_dir
model_file = os.path.join(inference_model_dir, self.args.model_filename)
params_file = os.path.join(inference_model_dir,
self.args.params_filename)
def _create_paddle_predictor(self):
inference_model_dir = args.model_path
model_file = os.path.join(inference_model_dir, args.model_filename)
params_file = os.path.join(inference_model_dir, args.params_filename)
config = paddle.inference.Config(model_file, params_file)
precision = paddle.inference.Config.Precision.Float32
if self.args.use_int8:
if args.use_int8:
precision = paddle.inference.Config.Precision.Int8
elif self.args.use_fp16:
elif args.use_fp16:
precision = paddle.inference.Config.Precision.Half
if self.args.use_gpu:
config.enable_use_gpu(self.args.gpu_mem, 0)
if args.use_gpu:
config.enable_use_gpu(args.gpu_mem, 0)
else:
config.disable_gpu()
if self.args.enable_mkldnn:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.set_cpu_math_library_num_threads(args.cpu_num_threads)
config.switch_ir_optim()
if args.use_mkldnn:
config.enable_mkldnn()
config.set_cpu_math_library_num_threads(self.args.cpu_num_threads)
if args.use_int8:
config.enable_mkldnn_int8(
{"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
if self.args.enable_profile:
config.enable_profile()
config.switch_ir_optim(self.args.ir_optim) # default true
if self.args.use_tensorrt:
config.switch_ir_optim(args.ir_optim) # default true
if args.use_trt:
config.enable_tensorrt_engine(
precision_mode=precision,
max_batch_size=self.args.batch_size,
max_batch_size=args.batch_size,
workspace_size=1 << 30,
min_subgraph_size=30,
use_calib_mode=False)
use_static=True,
use_calib_mode=False, )
if args.use_dynamic_shape:
dynamic_shape_file = os.path.join(inference_model_dir,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(
dynamic_shape_file, True)
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print("Start collect dynamic shape...")
self.rerun_flag = True
config.enable_memory_optim()
# use zero copy
config.switch_use_feed_fetch_ops(False)
predictor = create_predictor(config)
return predictor
def predict(self):
test_num = 1000
test_time = 0.0
for i in range(0, test_num + 10):
inputs = np.random.rand(self.args.batch_size, 3, self.args.img_size,
self.args.img_size).astype(np.float32)
start_time = time.time()
self.input_tensor.copy_from_cpu(inputs)
self.paddle_predictor.run()
batch_output = self.output_tensor.copy_to_cpu().flatten()
if i >= 10:
test_time += time.time() - start_time
time.sleep(0.01) # sleep for T4 GPU
fp_message = "FP16" if self.args.use_fp16 else "FP32"
fp_message = "INT8" if self.args.use_int8 else fp_message
trt_msg = "using tensorrt" if self.args.use_tensorrt else "not using tensorrt"
print("{0}\t{1}\tbatch size: {2}\ttime(ms): {3}".format(
trt_msg, fp_message, args.batch_size, 1000 * test_time / test_num))
def eval(self):
if os.path.exists(self.args.data_path):
"""
eval func
"""
if os.path.exists(args.data_path):
val_loader = eval_reader(
self.args.data_path,
batch_size=self.args.batch_size,
crop_size=self.args.img_size,
resize_size=self.args.resize_size)
args.data_path,
batch_size=args.batch_size,
crop_size=args.img_size,
resize_size=args.resize_size)
else:
image = np.ones((1, 3, self.args.img_size,
self.args.img_size)).astype(np.float32)
image = np.ones(
(1, 3, args.img_size, args.img_size)).astype(np.float32)
label = None
val_loader = [[image, label]]
results = []
with tqdm(
total=len(val_loader),
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
for batch_id, (image, label) in enumerate(val_loader):
input_names = self.paddle_predictor.get_input_names()
input_tensor = self.paddle_predictor.get_input_handle(
input_names[0])
input_tensor = self.paddle_predictor.get_input_handle(input_names[0])
output_names = self.paddle_predictor.get_output_names()
output_tensor = self.paddle_predictor.get_output_handle(
output_names[0])
output_tensor = self.paddle_predictor.get_output_handle(output_names[0])
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
sample_nums = len(val_loader)
for batch_id, (image, label) in enumerate(val_loader):
image = np.array(image)
input_tensor.copy_from_cpu(image)
start_time = time.time()
self.paddle_predictor.run()
batch_output = output_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
if self.rerun_flag:
return
sort_array = batch_output.argsort(axis=1)
top_1_pred = sort_array[:, -1:][:, ::-1]
if label is None:
......@@ -211,22 +208,43 @@ class Predictor(object):
top_1 = np.mean(label == top_1_pred)
top_5_pred = sort_array[:, -5:][:, ::-1]
acc_num = 0
for i in range(len(label)):
for i, _ in enumerate(label):
if label[i][0] in top_5_pred[i]:
acc_num += 1
top_5 = float(acc_num) / len(label)
results.append([top_1, top_5])
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
result = np.mean(np.array(results), axis=0)
t.update()
print('Evaluation result: {}'.format(result[0]))
fp_message = "FP16" if args.use_fp16 else "FP32"
fp_message = "INT8" if args.use_int8 else fp_message
print_msg = "Paddle"
if args.use_trt:
print_msg = "using TensorRT"
elif args.use_mkldnn:
print_msg = "using MKLDNN"
time_avg = predict_time / sample_nums
print(
"[Benchmark]{}\t{}\tbatch size: {}.Inference time(ms): min={}, max={}, avg={}".
format(
print_msg,
fp_message,
args.batch_size,
round(time_min * 1000, 2),
round(time_max * 1000, 1),
round(time_avg * 1000, 1), ))
print("[Benchmark] Evaluation acc result: {}".format(result[0]))
sys.stdout.flush()
if __name__ == "__main__":
parser = argsparser()
global args
args = parser.parse_args()
predictor = Predictor(args)
predictor.predict()
if args.eval:
predictor = Predictor()
predictor.eval()
if predictor.rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
......@@ -194,25 +194,41 @@ Quantization:
## 5. Prediction Deployment
- Python deployment:
Quantized models can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
First install the [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT.
Then deploy with [infer.py](./infer.py):
- TensorRT prediction:
This example uses the ERNIE 3.0-Medium model and the afqmc dataset to show how to test the accuracy and speed of the compressed model with Paddle-TensorRT.
Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
Accuracy test:
First download the quantized model:
```shell
python infer.py --task_name='afqmc' --model_path='./save_ernie3.0_afqmc/' --device='gpu' --use_trt --int8
wget https://bj.bcebos.com/v1/paddle-slim-models/act/save_ppminilm_afqmc_new_calib.tar
tar -xf save_ppminilm_afqmc_new_calib.tar
```
Speed test:
```shell
python infer.py --task_name='afqmc' --model_path='./save_ernie3.0_afqmc/' --device='gpu' --use_trt --int8 --perf
python paddle_inference_eval.py \
--model_path=save_ernie3_afqmc_new_cablib \
--model_filename=infer.pdmodel \
--params_filename=infer.pdiparams \
--task_name='afqmc' \
--use_trt \
--precision=int8
```
- [PP-MiniLM Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/model_compression/pp-minilm)
- [ERNIE-3.0 Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0)
- MKLDNN prediction:
```shell
python paddle_inference_eval.py \
--model_path=save_ernie3_afqmc_new_cablib \
--model_filename=infer.pdmodel \
--params_filename=infer.pdiparams \
--task_name='afqmc' \
--device=cpu \
--use_mkldnn=True \
--cpu_threads=10 \
--precision=int8
```
## 6. FAQ
......@@ -45,96 +45,42 @@ METRIC_CLASSES = {
}
def convert_example(example, dataset, tokenizer, label_list,
max_seq_length=512):
assert dataset in ['glue', 'clue'
], "This demo only supports for dataset glue or clue"
"""Convert a glue example into necessary features."""
if dataset == 'glue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
label = example['labels']
label = np.array([label], dtype=label_dtype)
# Convert raw text to feature
example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
return example['input_ids'], example['token_type_ids'], label
else: #if dataset == 'clue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
example['label'] = np.array(
example["label"], dtype="int64").reshape((-1, 1))
label = example['label']
# Convert raw text to feature
if 'keyword' in example: # CSL
sentence1 = " ".join(example['keyword'])
example = {
'sentence1': sentence1,
'sentence2': example['abst'],
'label': example['label']
}
elif 'target' in example: # wsc
text, query, pronoun, query_idx, pronoun_idx = example[
'text'], example['target']['span1_text'], example['target'][
'span2_text'], example['target']['span1_index'], example[
'target']['span2_index']
text_list = list(text)
assert text[pronoun_idx:(pronoun_idx + len(
pronoun))] == pronoun, "pronoun: {}".format(pronoun)
assert text[query_idx:(query_idx + len(query)
)] == query, "query: {}".format(query)
if pronoun_idx > query_idx:
text_list.insert(query_idx, "_")
text_list.insert(query_idx + len(query) + 1, "_")
text_list.insert(pronoun_idx + 2, "[")
text_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
else:
text_list.insert(pronoun_idx, "[")
text_list.insert(pronoun_idx + len(pronoun) + 1, "]")
text_list.insert(query_idx + 2, "_")
text_list.insert(query_idx + len(query) + 2 + 1, "_")
text = "".join(text_list)
example['sentence'] = text
if tokenizer is None:
return example
if 'sentence' in example:
example = tokenizer(example['sentence'], max_seq_len=max_seq_length)
elif 'sentence1' in example:
example = tokenizer(
example['sentence1'],
text_pair=example['sentence2'],
max_seq_len=max_seq_length)
return example['input_ids'], example['token_type_ids'], label
def parse_args():
"""
parse_args func
"""
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--model_path",
default="./afqmc",
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--model_filename",
type=str,
default="inference.pdmodel",
help="model file name")
parser.add_argument(
"--params_filename",
type=str,
default="inference.pdiparams",
help="params file name")
parser.add_argument(
"--task_name",
default='afqmc',
default="afqmc",
type=str,
help="The name of the task to perform predict, selected in the list: " +
", ".join(METRIC_CLASSES.keys()), )
parser.add_argument(
"--dataset",
default='clue',
default="clue",
type=str,
help="The dataset of model.", )
parser.add_argument(
"--model_path",
default='./quant_models/model',
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--device",
default="gpu",
choices=["gpu", "cpu", "xpu"],
choices=["gpu", "cpu"],
help="Device selected for inference.", )
parser.add_argument(
"--batch_size",
......@@ -154,25 +100,101 @@ def parse_args():
help="Warmup steps for performance test.", )
parser.add_argument(
"--use_trt",
action='store_true',
action="store_true",
help="Whether to use inference engin TensorRT.", )
parser.add_argument(
"--perf",
action='store_true',
help="Whether to test performance.", )
"--precision",
type=str,
default="fp32",
choices=["fp32", "fp16", "int8"],
help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.",
)
parser.add_argument(
"--int8",
action='store_true',
help="Whether to use int8 inference.", )
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--fp16",
action='store_true',
help="Whether to use float16 inference.", )
"--cpu_threads", type=int, default=1, help="Num of cpu threads.")
args = parser.parse_args()
return args
def _convert_example(example,
dataset,
tokenizer,
label_list,
max_seq_length=512):
assert dataset in ["glue", "clue"
], "This demo only supports for dataset glue or clue"
"""Convert a glue example into necessary features."""
if dataset == "glue":
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
label = example["labels"]
label = np.array([label], dtype=label_dtype)
# Convert raw text to feature
example = tokenizer(example["sentence"], max_seq_len=max_seq_length)
return example["input_ids"], example["token_type_ids"], label
else: # if dataset == 'clue':
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
example["label"] = np.array(
example["label"], dtype="int64").reshape((-1, 1))
label = example["label"]
# Convert raw text to feature
if "keyword" in example: # CSL
sentence1 = " ".join(example["keyword"])
example = {
"sentence1": sentence1,
"sentence2": example["abst"],
"label": example["label"]
}
elif "target" in example: # wsc
text, query, pronoun, query_idx, pronoun_idx = (
example["text"],
example["target"]["span1_text"],
example["target"]["span2_text"],
example["target"]["span1_index"],
example["target"]["span2_index"], )
text_list = list(text)
assert text[pronoun_idx:(pronoun_idx + len(
pronoun))] == pronoun, "pronoun: {}".format(pronoun)
assert text[query_idx:(query_idx + len(query)
)] == query, "query: {}".format(query)
if pronoun_idx > query_idx:
text_list.insert(query_idx, "_")
text_list.insert(query_idx + len(query) + 1, "_")
text_list.insert(pronoun_idx + 2, "[")
text_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
else:
text_list.insert(pronoun_idx, "[")
text_list.insert(pronoun_idx + len(pronoun) + 1, "]")
text_list.insert(query_idx + 2, "_")
text_list.insert(query_idx + len(query) + 2 + 1, "_")
text = "".join(text_list)
example["sentence"] = text
if tokenizer is None:
return example
if "sentence" in example:
example = tokenizer(example["sentence"], max_seq_len=max_seq_length)
elif "sentence1" in example:
example = tokenizer(
example["sentence1"],
text_pair=example["sentence2"],
max_seq_len=max_seq_length)
return example["input_ids"], example["token_type_ids"], label
class Predictor(object):
"""
Inference Predictor class
"""
def __init__(self, predictor, input_handles, output_handles):
self.predictor = predictor
self.input_handles = input_handles
......@@ -180,60 +202,50 @@ class Predictor(object):
@classmethod
def create_predictor(cls, args):
config = paddle.inference.Config(args.model_path + "infer.pdmodel",
args.model_path + "infer.pdiparams")
"""
create_predictor func
"""
cls.rerun_flag = False
config = paddle.inference.Config(
os.path.join(args.model_path, args.model_filename),
os.path.join(args.model_path, args.params_filename))
if args.device == "gpu":
# set GPU configs accordingly
config.enable_use_gpu(100, 0)
cls.device = paddle.set_device("gpu")
elif args.device == "cpu":
# set CPU configs accordingly,
# such as enable_mkldnn, set_cpu_math_library_num_threads
config.disable_gpu()
cls.device = paddle.set_device("cpu")
elif args.device == "xpu":
# set XPU configs accordingly
config.enable_xpu(100)
if args.use_trt:
if args.int8:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Int8,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
elif args.fp16:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Half,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(args.cpu_threads)
config.switch_ir_optim()
if args.use_mkldnn:
config.enable_mkldnn()
if args.precision == "int8":
config.enable_mkldnn_int8()
precision_map = {
"int8": inference.PrecisionType.Int8,
"fp32": inference.PrecisionType.Float32,
"fp16": inference.PrecisionType.Half,
}
if args.precision in precision_map.keys() and args.use_trt:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Float32,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
print("Enable TensorRT is: {}".format(
config.tensorrt_engine_enabled()))
precision_mode=precision_map[args.precision],
use_static=True,
use_calib_mode=False, )
dynamic_shape_file = os.path.join(args.model_path,
'dynamic_shape.txt')
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print('trt set dynamic shape done!')
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print(
'Start collect dynamic shape... Please eval again to get real result in TensorRT'
)
sys.exit()
print("Start collect dynamic shape...")
cls.rerun_flag = True
predictor = paddle.inference.create_predictor(config)
......@@ -249,6 +261,9 @@ class Predictor(object):
return cls(predictor, input_handles, output_handles)
def predict_batch(self, data):
"""
predict from batch func
"""
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field)
self.predictor.run()
......@@ -257,11 +272,11 @@ class Predictor(object):
]
return output
def convert_predict_batch(self, args, data, tokenizer, batchify_fn,
def _convert_predict_batch(self, args, data, tokenizer, batchify_fn,
label_list):
examples = []
for example in data:
example = convert_example(
example = _convert_example(
example,
args.dataset,
tokenizer,
......@@ -272,64 +287,82 @@ class Predictor(object):
return examples
def predict(self, dataset, tokenizer, batchify_fn, args):
"""
predict func
"""
batches = [
dataset[idx:idx + args.batch_size]
for idx in range(0, len(dataset), args.batch_size)
]
if args.perf:
for i, batch in enumerate(batches):
examples = self.convert_predict_batch(
examples = self._convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, label = batchify_fn(examples)
output = self.predict_batch([input_ids, segment_ids])
if i > args.perf_warmup_steps:
break
start_time = time.time()
for i, batch in enumerate(batches):
examples = self.convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, _ = batchify_fn(examples)
output = self.predict_batch([input_ids, segment_ids])
end_time = time.time()
sequences_num = i * args.batch_size
print("task name: %s, time: %s qps/s, " %
(args.task_name, sequences_num / (end_time - start_time)))
if self.rerun_flag:
return
else:
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
predict_time = 0.0
for i, batch in enumerate(batches):
examples = self.convert_predict_batch(
examples = self._convert_predict_batch(
args, batch, tokenizer, batchify_fn, dataset.label_list)
input_ids, segment_ids, label = batchify_fn(examples)
start_time = time.time()
output = self.predict_batch([input_ids, segment_ids])
end_time = time.time()
predict_time += end_time - start_time
correct = metric.compute(
paddle.to_tensor(output),
paddle.to_tensor(np.array(label).flatten()))
metric.update(correct)
sequences_num = i * args.batch_size
print(
"[benchmark]task name: {}, batch size: {} Inference time per batch: {}ms, qps: {}.".
format(
args.task_name,
args.batch_size,
round(predict_time * 1000 / i, 2),
round(sequences_num / predict_time, 2), ))
res = metric.accumulate()
print("task name: %s, acc: %s, \n" % (args.task_name, res), end='')
print(
"[benchmark]task name: %s, acc: %s. \n" % (args.task_name, res),
end="")
sys.stdout.flush()
def main():
"""
main func
"""
paddle.seed(42)
args = parse_args()
args.task_name = args.task_name.lower()
if args.use_mkldnn:
paddle.set_device("cpu")
predictor = Predictor.create_predictor(args)
dev_ds = load_dataset('clue', args.task_name, splits='dev')
dev_ds = load_dataset("clue", args.task_name, splits="dev")
tokenizer = AutoTokenizer.from_pretrained(args.model_path)
batchify_fn = lambda samples, fn=Tuple(
Pad(axis=0, pad_val=tokenizer.pad_token_id), # input
Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment
Stack(dtype="int64" if dev_ds.label_list else "float32") # label
Stack(dtype="int64" if dev_ds.label_list else "float32"), # label
): fn(samples)
outputs = predictor.predict(dev_ds, tokenizer, batchify_fn, args)
predictor.predict(dev_ds, tokenizer, batchify_fn, args)
if predictor.rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__":
paddle.set_device("cpu")
main()
......@@ -14,12 +14,9 @@
## 1. Introduction
The Paddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, PaddleSlim's auto compression can easily be applied to inference models from other frameworks.
This example takes a natural language processing model from the [Pytorch](https://github.com/pytorch/pytorch) framework to show how to auto-compress NLP models from other frameworks. It uses the [huggingface](https://github.com/huggingface/transformers) transformers library to convert the Pytorch model into a Paddle model and then compresses it with the ACT auto-compression feature. The compression strategies used here are pruning with distillation and quantization-aware training.
## 2. Benchmark
[BERT](https://arxiv.org/abs/1810.04805) (```Bidirectional Encoder Representations from Transformers```) uses the Transformer encoder as its basic building block. It is pre-trained on large-scale unlabeled corpora with the masked language model (```Masked Language Model```) and next sentence prediction (```Next Sentence Prediction```) tasks, producing a general-purpose semantic representation that fuses bidirectional context. With a simple task-specific output layer added on top, fine-tuning this pre-trained representation usually outperforms models trained directly on the downstream NLP task; BERT previously achieved SOTA results on the [GLUE](https://gluebenchmark.com/tasks) benchmark.
......@@ -192,41 +189,38 @@ python run.py --config_path=./configs/cola.yaml --eval True
## 4. Prediction Deployment
Environment setup: to use the Paddle TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
Quantized models can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
Launch configuration:
- TensorRT prediction:
Besides basic arguments such as the ```task_name``` task name, the ```model_name_or_path``` model name, and ```model_path``` (the directory of the saved inference model), pass the prediction arguments that match your environment:
- ```device```: gpu by default; one of gpu, cpu, xpu
- ```use_trt```: whether to use the TensorRT inference engine
- ```int8```: whether to enable ```INT8```
- ```fp16```: whether to enable ```FP16```
Environment setup: to use the TensorRT inference engine, install a Paddle build with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).
After the inference model is ready, prediction can be run with ```infer.py```; for example, to test the FP32 model with the TensorRT inference engine:
First download the quantized model:
```shell
python -u ./infer.py \
--task_name cola \
--model_name_or_path bert-base-cased \
--model_path ./x2paddle_cola/model \
--batch_size 1 \
--max_seq_length 128 \
--device gpu \
--use_trt
wget https://bj.bcebos.com/v1/paddle-slim-models/act/x2paddle_cola_new_calib.tar
tar -xf x2paddle_cola_new_calib.tar
```
To test the INT8 model with the TensorRT inference engine:
```shell
python -u ./infer.py \
--task_name cola \
--model_name_or_path bert-base-cased \
--model_path ./output/cola/model \
--batch_size 1 \
--max_seq_length 128 \
--device gpu \
python paddle_inference_eval.py \
--model_path=x2paddle_cola_new_calib \
--use_trt \
--int8
--precision=int8 \
--batch_size=1
```
- MKLDNN prediction:
```shell
python paddle_inference_eval.py \
--model_path=x2paddle_cola_new_calib \
--device=cpu \
--use_mkldnn=True \
--cpu_threads=10 \
--batch_size=1 \
--precision=int8
```
......
......@@ -22,9 +22,9 @@ import numpy as np
import paddle
from paddle import inference
from paddle.metric import Metric, Accuracy, Precision, Recall
from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad
from paddle.metric import Metric, Accuracy, Precision, Recall
from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer
......@@ -53,35 +53,46 @@ task_to_keys = {
def parse_args():
"""
parse_args func
"""
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--model_path",
default="./x2paddle_cola",
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--model_filename",
type=str,
default="model.pdmodel",
help="model file name")
parser.add_argument(
"--params_filename",
type=str,
default="model.pdiparams",
help="params file name")
parser.add_argument(
"--task_name",
default='cola',
default="cola",
type=str,
help="The name of the task to perform predict, selected in the list: " +
", ".join(METRIC_CLASSES.keys()), )
parser.add_argument(
"--model_type",
default='bert-base-cased',
default="bert-base-cased",
type=str,
help="Model type selected in bert.")
parser.add_argument(
"--model_name_or_path",
default='bert-base-cased',
default="bert-base-cased",
type=str,
help="The directory or name of model.", )
parser.add_argument(
"--model_path",
default='./quant_models/model',
type=str,
required=True,
help="The path prefix of inference model to be used.", )
parser.add_argument(
"--device",
default="gpu",
choices=["gpu", "cpu", "xpu"],
choices=["gpu", "cpu"],
help="Device selected for inference.", )
parser.add_argument(
"--batch_size",
......@@ -101,42 +112,45 @@ def parse_args():
help="Warmup steps for performance test.", )
parser.add_argument(
"--use_trt",
action='store_true',
action="store_true",
help="Whether to use inference engin TensorRT.", )
parser.add_argument(
"--perf",
action='store_true',
help="Whether to test performance.", )
"--precision",
type=str,
default="fp32",
choices=["fp32", "fp16", "int8"],
help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.",
)
parser.add_argument(
"--int8",
action='store_true',
help="Whether to use int8 inference.", )
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--fp16",
action='store_true',
help="Whether to use float16 inference.", )
"--cpu_threads", type=int, default=1, help="Num of cpu threads.")
args = parser.parse_args()
return args
def convert_example(example,
def _convert_example(
example,
tokenizer,
label_list,
max_seq_length=512,
task_name=None,
is_test=False,
padding='max_length',
return_attention_mask=True):
padding="max_length",
return_attention_mask=True, ):
if not is_test:
# `label_list == None` is for regression task
label_dtype = "int64" if label_list else "float32"
# Get the label
label = example['labels']
label = example["labels"]
label = np.array([label], dtype=label_dtype)
# Convert raw text to feature
sentence1_key, sentence2_key = task_to_keys[task_name]
texts = ((example[sentence1_key], ) if sentence2_key is None else
(example[sentence1_key], example[sentence2_key]))
texts = (example[sentence1_key], ) if sentence2_key is None else (
example[sentence1_key], example[sentence2_key])
example = tokenizer(
*texts,
max_seq_len=max_seq_length,
......@@ -144,19 +158,23 @@ def convert_example(example,
return_attention_mask=return_attention_mask)
if not is_test:
if return_attention_mask:
return example['input_ids'], example['attention_mask'], example[
'token_type_ids'], label
return example["input_ids"], example["attention_mask"], example[
"token_type_ids"], label
else:
return example['input_ids'], example['token_type_ids'], label
return example["input_ids"], example["token_type_ids"], label
else:
if return_attention_mask:
return example['input_ids'], example['attention_mask'], example[
'token_type_ids']
return example["input_ids"], example["attention_mask"], example[
"token_type_ids"]
else:
return example['input_ids'], example['token_type_ids']
return example["input_ids"], example["token_type_ids"]
class Predictor(object):
"""
Inference Predictor class
"""
def __init__(self, predictor, input_handles, output_handles):
self.predictor = predictor
self.input_handles = input_handles
......@@ -164,60 +182,51 @@ class Predictor(object):
@classmethod
def create_predictor(cls, args):
config = paddle.inference.Config(args.model_path + ".pdmodel",
args.model_path + ".pdiparams")
"""
create_predictor func
"""
cls.rerun_flag = False
config = paddle.inference.Config(
os.path.join(args.model_path, args.model_filename),
os.path.join(args.model_path, args.params_filename))
if args.device == "gpu":
# set GPU configs accordingly
config.enable_use_gpu(100, 0)
cls.device = paddle.set_device("gpu")
elif args.device == "cpu":
# set CPU configs accordingly,
# such as enable_mkldnn, set_cpu_math_library_num_threads
config.disable_gpu()
cls.device = paddle.set_device("cpu")
elif args.device == "xpu":
# set XPU configs accordingly
config.enable_xpu(100)
if args.use_trt:
if args.int8:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Int8,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
elif args.fp16:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Half,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(args.cpu_threads)
config.switch_ir_optim()
if args.use_mkldnn:
config.enable_mkldnn()
if args.precision == "int8":
config.enable_mkldnn_int8(
{"fc", "reshape2", "transpose2", "slice"})
precision_map = {
"int8": inference.PrecisionType.Int8,
"fp32": inference.PrecisionType.Float32,
"fp16": inference.PrecisionType.Half,
}
if args.precision in precision_map.keys() and args.use_trt:
config.enable_tensorrt_engine(
workspace_size=1 << 30,
precision_mode=inference.PrecisionType.Float32,
max_batch_size=args.batch_size,
min_subgraph_size=5,
use_static=False,
use_calib_mode=False)
print("Enable TensorRT is: {}".format(
config.tensorrt_engine_enabled()))
precision_mode=precision_map[args.precision],
use_static=True,
use_calib_mode=False, )
model_dir = os.path.dirname(args.model_path)
dynamic_shape_file = os.path.join(model_dir, 'dynamic_shape.txt')
dynamic_shape_file = os.path.join(args.model_path,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print('trt set dynamic shape done!')
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print(
'Start collect dynamic shape... Please eval again to get real result in TensorRT'
)
sys.exit()
print("Start collect dynamic shape...")
cls.rerun_flag = True
predictor = paddle.inference.create_predictor(config)
......@@ -233,6 +242,9 @@ class Predictor(object):
return cls(predictor, input_handles, output_handles)
def predict(self, dataset, collate_fn, args):
"""
predict func
"""
batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=args.batch_size, shuffle=False)
data_loader = paddle.io.DataLoader(
......@@ -241,94 +253,92 @@ class Predictor(object):
collate_fn=collate_fn,
num_workers=0,
return_list=True)
end_time = 0
if args.perf:
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy(
) if isinstance(input_field, paddle.Tensor) else
input_field)
input_handle.copy_from_cpu(input_field.numpy() if isinstance(
input_field, paddle.Tensor) else input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
if i > args.perf_warmup_steps:
break
if self.rerun_flag:
return
time1 = time.time()
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
input_handle.copy_from_cpu(input_field.numpy(
) if isinstance(input_field, paddle.Tensor) else
input_field)
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
sequences_num = i * args.batch_size
print("task name: %s, time: %s qps/s, " %
(args.task_name, sequences_num / (time.time() - time1)))
else:
metric = METRIC_CLASSES[args.task_name]()
metric.reset()
predict_time = 0.0
for i, data in enumerate(data_loader):
for input_field, input_handle in zip(data, self.input_handles):
                    input_handle.copy_from_cpu(input_field.numpy() if isinstance(
                        input_field, paddle.Tensor) else input_field)
start_time = time.time()
self.predictor.run()
output = [
output_handle.copy_to_cpu()
for output_handle in self.output_handles
]
end_time = time.time()
predict_time += end_time - start_time
label = data[-1]
correct = metric.compute(
paddle.to_tensor(output[0]),
paddle.to_tensor(np.array(label).flatten()))
print(correct)
metric.update(correct)
sequences_num = i * args.batch_size
print(
"[benchmark]task name: {}, batch size: {} Inference time per batch: {}ms, qps: {}.".
format(
args.task_name,
args.batch_size,
round(predict_time * 1000 / i, 2),
round(sequences_num / predict_time, 2), ))
res = metric.accumulate()
print("task name: %s, acc: %s, \n" % (args.task_name, res), end='')
print(
"[benchmark]task name: %s, acc: %s. \n" % (args.task_name, res),
end="")
sys.stdout.flush()
def main():
"""
main func
"""
paddle.seed(42)
args = parse_args()
if args.use_mkldnn:
paddle.set_device("cpu")
predictor = Predictor.create_predictor(args)
args.task_name = args.task_name.lower()
args.model_type = args.model_type.lower()
    dev_ds = load_dataset("glue", args.task_name, splits="dev")
tokenizer = BertTokenizer.from_pretrained(args.model_name_or_path)
    trans_func = partial(
        _convert_example,
        tokenizer=tokenizer,
        label_list=dev_ds.label_list,
        max_seq_length=args.max_seq_length,
        task_name=args.task_name,
        return_attention_mask=True, )
dev_ds = dev_ds.map(trans_func)
batchify_fn = lambda samples, fn=Tuple(
Pad(axis=0, pad_val=tokenizer.pad_token_id), # input
Pad(axis=0, pad_val=0),
Pad(axis=0, pad_val=tokenizer.pad_token_id), # segment
Stack(dtype="int64" if dev_ds.label_list else "float32") # label
Stack(dtype="int64" if dev_ds.label_list else "float32"), # label
): fn(samples)
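    # Note: batchify_fn collates a list of per-example tuples into batched
    # arrays: the Pad ops pad the token-id style fields to the longest
    # sequence in the batch (using the tokenizer pad id or 0), and Stack
    # turns the per-example labels into a single array.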
predictor.predict(dev_ds, batchify_fn, args)
if __name__ == "__main__":
paddle.set_device("cpu")
main()
......@@ -8,7 +8,6 @@
- [3.2 准备数据集](#32-准备数据集)
- [3.3 准备预测模型](#33-准备预测模型)
- [3.4 自动压缩并产出模型](#34-自动压缩并产出模型)
- [3.5 测试模型精度](#35-测试模型精度)
- [4.预测部署](#4预测部署)
- [5.FAQ](#5FAQ)
......@@ -149,14 +148,6 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -
--config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 测试模型精度
Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the directory where the model is stored, then use the eval.py script to get the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
```
## 4.预测部署
......@@ -164,31 +155,60 @@ python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
```shell
├── model.pdiparams         # weights of the Paddle inference model
├── model.pdmodel           # Paddle inference model file
├── calibration_table.txt   # calibration table produced by quantization
├── ONNX
│   ├── quant_model.onnx      # quantized model exported to ONNX
│   ├── calibration.cache     # calibration table that TensorRT can load directly
```
#### Paddle Inference部署测试

On GPU the quantized model can be accelerated with TensorRT; on CPU it can be accelerated with MKLDNN.

The following fields configure the prediction parameters:

| Parameter | Description |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the files model.pdmodel and model.pdiparams |
| dataset_dir | path of the validation dataset used for eval, default `dataset/coco` |
| image_file | to test a single image only, set image_file to the image path |
| device | device used for prediction, CPU or GPU |
| use_trt | whether to use the TensorRT prediction engine |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, MKL-DNN is ignored and ```GPU``` prediction is used |
| cpu_threads | number of CPU threads used for CPU prediction, default 10 |
| precision | prediction precision, one of `fp32/fp16/int8` |
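For reference, these flags map onto Paddle Inference `Config` calls roughly as sketched below. This is a minimal, illustrative sketch (the helper name `build_config`, its defaults, and the op set passed to `enable_mkldnn_int8` are placeholders), not the full logic of `paddle_inference_eval.py`:

```python
import os
from paddle.inference import Config, PrecisionType, create_predictor


def build_config(model_dir, device="GPU", use_trt=False, use_mkldnn=False,
                 precision="fp32", batch_size=1, cpu_threads=10):
    # model_dir must contain model.pdmodel / model.pdiparams
    config = Config(
        os.path.join(model_dir, "model.pdmodel"),
        os.path.join(model_dir, "model.pdiparams"))
    if device == "GPU":
        config.enable_use_gpu(256, 0)  # initial GPU memory (MB), GPU id
        if use_trt:
            precision_map = {
                "fp32": PrecisionType.Float32,
                "fp16": PrecisionType.Half,
                "int8": PrecisionType.Int8,
            }
            config.enable_tensorrt_engine(
                workspace_size=1 << 30,
                max_batch_size=batch_size,
                min_subgraph_size=5,
                precision_mode=precision_map[precision],
                use_static=True,
                use_calib_mode=False)
    else:
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
        if use_mkldnn:
            config.enable_mkldnn()
            if precision == "int8":
                # illustrative op set; the script uses its own list
                config.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
    return config


predictor = create_predictor(build_config("output", use_trt=True, precision="int8"))
```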
TensorRT Python deployment:

First install a [Paddle package built with TensorRT](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python), then deploy with [paddle_inference_eval.py](./paddle_inference_eval.py):

```shell
python paddle_inference_eval.py \
      --model_path=output \
      --dataset_dir=dataset/coco \
      --use_trt=True \
      --precision=int8
```
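Note that with `--use_trt=True` the script expects a `dynamic_shape.txt` file next to the model: if it is missing, the first run only collects the input shape ranges and the command has to be run again to get real results. A minimal sketch of that logic (the function name `setup_dynamic_shape` is illustrative; `config` is a `paddle.inference.Config`):

```python
import os


def setup_dynamic_shape(config, model_dir):
    """Returns True when the caller must rerun after shape collection."""
    shape_file = os.path.join(model_dir, "dynamic_shape.txt")
    if os.path.exists(shape_file):
        # Second run: reuse the tuned shape ranges collected earlier.
        config.enable_tuned_tensorrt_dynamic_shape(shape_file, True)
        return False
    # First run: record the input shape ranges seen during inference,
    # then rerun the script to get real timing/accuracy numbers.
    config.collect_shape_range_info(shape_file)
    return True
```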
- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=output \
      --dataset_dir=dataset/coco \
      --device=CPU \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --precision=int8
```

- Test a single image:

```shell
python paddle_inference_eval.py --model_path=output --image_file=images/000000570688.jpg --use_trt=True --precision=int8
```

#### Paddle-TensorRT部署

- C++ deployment

Go to the [cpp_infer](./cpp_infer) folder, prepare the environment and build it by following the [C++ TensorRT Benchmark tutorial](./cpp_infer/README.md), then run:
......@@ -199,13 +219,22 @@ bash compile.sh
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
#### 导出至ONNX使用TensorRT部署

Load `quant_model.onnx` together with `calibration.cache` and verify them directly with the TensorRT test script; see [TensorRT deployment](./TensorRT) for the full code.

- Python test:

```shell
cd TensorRT
python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \
                   --calibration_file=output/ONNX/calibration.cache \
                   --image_file=../images/000000570688.jpg \
                   --precision_mode=int8
```

- Speed test:

```shell
trtexec --onnx=output/ONNX/quant_model.onnx --avgRuns=1000 --workspace=1024 --calib=output/ONNX/calibration.cache --int8
```
## 5.FAQ
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import sys
import argparse
import cv2
import numpy as np
from tqdm import tqdm
import pkg_resources as pkg
import paddle
from paddle.inference import Config
from paddle.inference import create_predictor
from dataset import COCOValDataset
from post_process import YOLOPostProcess, coco_metric
def argsparser():
"""
argsparser func
"""
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--model_path", type=str, help="inference model filepath")
parser.add_argument(
"--image_file",
type=str,
default=None,
help="image path, if set image_file, it will not eval coco.")
parser.add_argument(
"--dataset_dir",
type=str,
default="dataset/coco",
help="COCO dataset dir.")
parser.add_argument(
"--val_image_dir",
type=str,
default="val2017",
help="COCO dataset val image dir.")
parser.add_argument(
"--val_anno_path",
type=str,
default="annotations/instances_val2017.json",
help="COCO dataset anno path.")
parser.add_argument(
"--benchmark",
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
"--use_dynamic_shape",
type=bool,
default=True,
help="Whether use dynamic shape or not.")
parser.add_argument(
"--use_trt",
type=bool,
default=False,
help="Whether use TensorRT or not.")
parser.add_argument(
"--precision",
type=str,
default="paddle",
help="mode of running(fp32/fp16/int8)")
parser.add_argument(
"--device",
type=str,
default="GPU",
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU",
)
parser.add_argument(
"--arch", type=str, default="YOLOv5", help="architectures name.")
parser.add_argument("--img_shape", type=int, default=640, help="input_size")
parser.add_argument(
"--batch_size", type=int, default=1, help="Batch size of model input.")
parser.add_argument(
"--use_mkldnn",
type=bool,
default=False,
help="Whether use mkldnn or not.")
parser.add_argument(
"--cpu_threads", type=int, default=1, help="Num of cpu threads.")
return parser
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
"""
image preprocess func
"""
if len(image.shape) == 3:
padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
else:
padded_img = np.ones(input_size) * 114.0
img = np.array(image)
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
padded_img = padded_img[:, :, ::-1]
padded_img /= 255.0
if mean is not None:
padded_img -= mean
if std is not None:
padded_img /= std
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
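# Note: preprocess() letterboxes the input image: it keeps the aspect ratio,
# resizes by r = min(input_h / h, input_w / w), pads the rest of the canvas
# with the value 114, flips BGR to RGB, scales pixels to [0, 1] (optionally
# normalizing with mean/std) and returns a CHW float32 array together with r,
# which is later used to map the predicted boxes back to the original image.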
def get_color_map_list(num_classes):
"""
get_color_map_list func
"""
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= ((lab >> 0) & 1) << (7 - j)
color_map[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j)
color_map[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j)
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
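# Note: get_color_map_list() builds the PASCAL VOC style palette: the bits of
# each class id are spread across the R/G/B channels, so consecutive class ids
# get visually distinct colors when boxes are drawn.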
def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
"""
draw_box func
"""
color_list = get_color_map_list(len(class_names))
for i, _ in enumerate(boxes):
box = boxes[i]
cls_id = int(cls_ids[i])
color = tuple(color_list[cls_id])
score = scores[i]
if score < conf:
continue
x0 = int(box[0])
y0 = int(box[1])
x1 = int(box[2])
y1 = int(box[3])
text = "{}:{:.1f}%".format(class_names[cls_id], score * 100)
font = cv2.FONT_HERSHEY_SIMPLEX
txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
cv2.rectangle(img, (x0, y0 + 1), (
x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])), color, -1)
cv2.putText(
img,
text, (x0, y0 + txt_size[1]),
font,
0.8, (0, 255, 0),
thickness=2)
return img
def get_current_memory_mb():
"""
    It is used to obtain the memory usage of the CPU and GPU while the program runs.
    Note that calling this function is itself time-consuming.
"""
import pynvml
import psutil
import GPUtil
gpu_id = int(os.environ.get("CUDA_VISIBLE_DEVICES", 0))
pid = os.getpid()
p = psutil.Process(pid)
info = p.memory_full_info()
cpu_mem = info.uss / 1024.0 / 1024.0
gpu_mem = 0
gpu_percent = 0
gpus = GPUtil.getGPUs()
if gpu_id is not None and len(gpus) > 0:
gpu_percent = gpus[gpu_id].load
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
gpu_mem = meminfo.used / 1024.0 / 1024.0
return round(cpu_mem, 4), round(gpu_mem, 4)
def load_predictor(
model_dir,
precision="fp32",
use_trt=False,
use_mkldnn=False,
batch_size=1,
device="CPU",
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
cpu_threads=1, ):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
precision (str): mode of running(fp32/fp16/int8)
use_trt (bool): whether use TensorRT or not.
use_mkldnn (bool): whether use MKLDNN or not in CPU.
device (str): Choose the device you want to run, it can be: CPU/GPU, default is CPU
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
rerun_flag = False
if device != "GPU" and use_trt:
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
format(precision, device))
config = Config(
os.path.join(model_dir, "model.pdmodel"),
os.path.join(model_dir, "model.pdiparams"))
if device == "GPU":
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
config.switch_ir_optim()
if use_mkldnn:
config.enable_mkldnn()
if precision == "int8":
config.enable_mkldnn_int8({"conv2d", "transpose2", "pool2d"})
precision_map = {
"int8": Config.Precision.Int8,
"fp32": Config.Precision.Float32,
"fp16": Config.Precision.Half,
}
if precision in precision_map.keys() and use_trt:
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[precision],
use_static=True,
use_calib_mode=False, )
if use_dynamic_shape:
dynamic_shape_file = os.path.join(FLAGS.model_path,
"dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print("trt set dynamic shape done!")
else:
config.collect_shape_range_info(dynamic_shape_file)
print("Start collect dynamic shape...")
rerun_flag = True
# enable shared memory
config.enable_memory_optim()
predictor = create_predictor(config)
return predictor, rerun_flag
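# Note: when use_dynamic_shape is on and no dynamic_shape.txt exists yet, the
# first run only collects the shape ranges and returns rerun_flag=True; main()
# then asks the user to rerun the script so TensorRT can use the tuned shapes.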
def eval(predictor, val_loader, anno_file, rerun_flag=False):
"""
eval main func
"""
bboxes_list, bbox_nums_list, image_id_list = [], [], []
cpu_mems, gpu_mems = 0, 0
sample_nums = len(val_loader)
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
input_names = predictor.get_input_names()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
for batch_id, data in enumerate(val_loader):
data_all = {k: np.array(v) for k, v in data.items()}
inputs = {}
if FLAGS.arch == "YOLOv6":
inputs["x2paddle_image_arrays"] = data_all["image"]
else:
inputs["x2paddle_images"] = data_all["image"]
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
start_time = time.time()
predictor.run()
outs = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
if rerun_flag:
return
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs), data_all["scale_factor"])
bboxes_list.append(res["bbox"])
bbox_nums_list.append(res["bbox_num"])
image_id_list.append(np.array(data_all["im_id"]))
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / sample_nums, gpu_mems / sample_nums))
time_avg = predict_time / sample_nums
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
print("[Benchmark] COCO mAP: {}".format(map_res[0]))
sys.stdout.flush()
def infer(predictor):
"""
infer image main func
"""
warmup, repeats = 1, 1
if FLAGS.benchmark:
warmup, repeats = 50, 100
origin_img = cv2.imread(FLAGS.image_file)
input_image, scale_factor = preprocess(origin_img,
[FLAGS.img_shape, FLAGS.img_shape])
input_image = np.expand_dims(input_image, axis=0)
scale_factor = np.array([[scale_factor, scale_factor]])
inputs = {}
if FLAGS.arch == "YOLOv6":
inputs["x2paddle_image_arrays"] = input_image
else:
inputs["x2paddle_images"] = input_image
input_names = predictor.get_input_names()
for i, _ in enumerate(input_names):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.0
time_min = float("inf")
time_max = float("-inf")
cpu_mems, gpu_mems = 0, 0
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
cpu_mem, gpu_mem = get_current_memory_mb()
cpu_mems += cpu_mem
gpu_mems += gpu_mem
print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
cpu_mems / repeats, gpu_mems / repeats))
time_avg = predict_time / repeats
print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np_boxes, scale_factor)
# Draw rectangles and labels on the original image
dets = res["bbox"]
if dets is not None:
        final_boxes, final_scores, final_class = dets[:, 2:], dets[:, 1], dets[:, 0]
res_img = draw_box(
origin_img,
final_boxes,
final_scores,
final_class,
conf=0.5,
class_names=CLASS_LABEL)
cv2.imwrite("output.jpg", res_img)
print("The prediction results are saved in output.jpg.")
def main():
"""
main func
"""
predictor, rerun_flag = load_predictor(
FLAGS.model_path,
device=FLAGS.device,
use_trt=FLAGS.use_trt,
use_mkldnn=FLAGS.use_mkldnn,
precision=FLAGS.precision,
use_dynamic_shape=FLAGS.use_dynamic_shape,
cpu_threads=FLAGS.cpu_threads, )
if FLAGS.image_file:
infer(predictor)
else:
dataset = COCOValDataset(
dataset_dir=FLAGS.dataset_dir,
image_dir=FLAGS.val_image_dir,
anno_path=FLAGS.val_anno_path)
anno_file = dataset.ann_file
val_loader = paddle.io.DataLoader(
dataset, batch_size=FLAGS.batch_size, drop_last=True)
eval(predictor, val_loader, anno_file, rerun_flag=rerun_flag)
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
if __name__ == "__main__":
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
# DataLoader need run on cpu
paddle.set_device("cpu")
main()
......@@ -8,8 +8,7 @@
- [3.2 准备数据集](#32-准备数据集)
- [3.3 准备预测模型](#33-准备预测模型)
- [3.4 自动压缩并产出模型](#34-自动压缩并产出模型)
- [4.预测部署](#4预测部署)
- [5.FAQ](#5FAQ)
## 1.简介
......@@ -156,104 +155,68 @@ python -m paddle.distributed.launch run.py --config_path='./configs/pp_humanseg/
After compression finishes, the compressed inference model is written to `save_dir` and can be used for deployment directly.

## 4.预测部署
#### 4.1 Paddle Inference 验证性能

This section uses the portrait segmentation model and a small sample dataset as an example. On GPU the quantized model can be accelerated with TensorRT; on CPU it can be accelerated with MKLDNN.

The download links of the inference models before and after compression are listed in the table in [2.Benchmark](#2Benchmark). For example, download the inference model produced by quantization-aware training:

```
wget https://bj.bcebos.com/v1/paddle-slim-models/act/PaddleSeg/qat/pp_humanseg_qat.zip
unzip pp_humanseg_qat.zip
```

Download the portrait segmentation sample data with:

```shell
cd ./data
python download_data.py mini_humanseg
cd -
```

The following fields configure the prediction parameters:
| Parameter | Description |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | name of the model file inside model_path |
| params_filename | name of the params file inside model_path |
| dataset | dataset type, one of `human`, `cityscape` |
| dataset_config | config file of the dataset |
| image_file | path of a single test image; if image_file is set, dataset_config is ignored |
| device | device used for prediction, `CPU` or `GPU` |
| use_trt | whether to use the TensorRT prediction engine; effective when device is ```GPU``` |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; effective when device is ```CPU``` |
| cpu_threads | number of CPU threads used for CPU prediction, default 10 |
| precision | prediction precision, one of `fp32`, `fp16`, `int8` |
- TensorRT prediction:

Environment: to use the TensorRT prediction engine, install a Paddle build compiled with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python) page.

Prepare the inference model, set the dataset path in dataset_config to the correct location, then start the test:

```shell
python paddle_inference_eval.py \
      --model_path=pp_liteseg_qat \
      --dataset='cityscape' \
      --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
      --use_trt=True \
      --precision=int8
```
- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=pp_liteseg_qat \
      --dataset='cityscape' \
      --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
      --device=CPU \
      --use_mkldnn=True \
      --precision=int8 \
      --cpu_threads=10
```
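For the dataset runs above, the reported mIoU is accumulated batch by batch from confusion areas, roughly as sketched below (a simplified sketch using PaddleSeg's metric utilities; `num_classes` and the `[N, 1, H, W]` integer shapes are assumptions, not the exact code of `paddle_inference_eval.py`):

```python
import paddle
from paddleseg.utils import metrics


def accumulate_miou(preds_and_labels, num_classes):
    """preds_and_labels: iterable of (pred, label) int arrays shaped [N, 1, H, W]."""
    intersect_all, pred_all, label_all = 0, 0, 0
    for pred, label in preds_and_labels:
        intersect, pred_area, label_area = metrics.calculate_area(
            paddle.to_tensor(pred), paddle.to_tensor(label), num_classes)
        intersect_all += intersect
        pred_all += pred_area
        label_all += label_area
    _, miou = metrics.mean_iou(intersect_all, pred_all, label_all)
    return miou
```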
#### 4.2 Paddle Inference 测试单张图片

Test a single image with the portrait segmentation model:

```shell
python paddle_inference_eval.py \
      --model_path=pp_humanseg_qat \
      --dataset='human' \
      --image_file=./data/human_demo.jpg \
      --use_trt=True \
      --precision=int8
```
<table><tbody>
......@@ -287,17 +250,11 @@ Int8推理结果
</tbody></table>
Run the following command for more usage information about `paddle_inference_eval.py`:

```
python paddle_inference_eval.py --help
```

### 4.3 更多部署教程
- [Paddle Inference Python部署](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/python_inference.md)
- [Paddle Inference C++部署](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/cpp_inference.md)
- [Paddle Lite部署](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/lite/lite.md)
## 5.FAQ
......@@ -12,11 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
from tqdm import tqdm
import os
import sys
import cv2
import numpy as np
import paddle
import paddleseg.transforms as T
from paddleseg.cvlibs import Config as PaddleSegDataConfig
......@@ -38,31 +39,35 @@ def _transforms(dataset):
return transforms
def load_predictor(args):
    """
    load predictor func
    """
    rerun_flag = False
    model_file = os.path.join(args.model_path, args.model_filename)
    params_file = os.path.join(args.model_path, args.params_filename)
    pred_cfg = PredictConfig(model_file, params_file)
pred_cfg.enable_memory_optim()
pred_cfg.switch_ir_optim(True)
if args.device == "GPU":
pred_cfg.enable_use_gpu(100, 0)
else:
pred_cfg.disable_gpu()
pred_cfg.set_cpu_math_library_num_threads(args.cpu_threads)
if args.use_mkldnn:
pred_cfg.enable_mkldnn()
if args.precision == "int8":
pred_cfg.enable_mkldnn_int8({
"conv2d", "depthwise_conv2d", "pool2d", "elementwise_mul"
})
    if args.use_trt:
        # Reuse tuned dynamic shapes when dynamic_shape.txt already exists;
        # otherwise they are collected on this run (see below).
        dynamic_shape_file = os.path.join(args.model_path, "dynamic_shape.txt")
if os.path.exists(dynamic_shape_file):
pred_cfg.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
True)
print("trt set dynamic shape done!")
precision_map = {
"fp16": PrecisionType.Half,
"fp32": PrecisionType.Float32,
......@@ -73,27 +78,33 @@ def load_predictor(args, data):
max_batch_size=1,
min_subgraph_size=4,
precision_mode=precision_map[args.precision],
                use_static=True,
                use_calib_mode=False, )
        else:
            pred_cfg.collect_shape_range_info(dynamic_shape_file)
            print("Start collect dynamic shape...")
            rerun_flag = True
predictor = create_predictor(pred_cfg)
    return predictor, rerun_flag
def predict_image(args):
"""
predict image func
"""
transforms = _transforms(args.dataset)
transform = T.Compose(transforms)
# Step1: Load image and preprocess
    im = cv2.imread(args.image_file).astype("float32")
data, _ = transform(im)
data = np.array(data)[np.newaxis, :]
    # Step2: Prepare predictor
    predictor, rerun_flag = load_predictor(args)
# Step3: Inference
input_names = predictor.get_input_names()
......@@ -114,14 +125,21 @@ def predict_image(args):
for i in range(repeats):
predictor.run()
results = output_handle.copy_to_cpu()
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
return
total_time = time.time() - start_time
avg_time = float(total_time) / repeats
print(f"Average inference time: \033[91m{round(avg_time*1000, 2)}ms\033[0m")
print(
f"[Benchmark]Average inference time: \033[91m{round(avg_time*1000, 2)}ms\033[0m"
)
# Step4: Post process
if args.dataset == "human":
results = reverse_transform(
            paddle.to_tensor(results), im.shape, transforms, mode="bilinear")
results = np.argmax(results, axis=1)
result = get_pseudo_color_map(results[0])
......@@ -132,8 +150,11 @@ def predict_image(args):
def eval(args):
"""
eval mIoU func
"""
# DataLoader need run on cpu
    paddle.set_device("cpu")
data_cfg = PaddleSegDataConfig(args.dataset_config)
eval_dataset = data_cfg.val_dataset
......@@ -142,48 +163,56 @@ def eval(args):
loader = paddle.io.DataLoader(
eval_dataset,
batch_sampler=batch_sampler,
        num_workers=0,
return_list=True)
    predictor, rerun_flag = load_predictor(args)
    intersect_area_all = 0
    pred_area_all = 0
    label_area_all = 0
    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])
    total_samples = len(eval_dataset)
    sample_nums = len(loader)
    batch_size = int(total_samples / sample_nums)
    predict_time = 0.0
    time_min = float("inf")
    time_max = float("-inf")
    print("Start evaluating (total_samples: {}, total_iters: {}).".format(
        total_samples, sample_nums))
    for batch_id, data in enumerate(loader):
        image = np.array(data[0])
        label = np.array(data[1]).astype("int64")
        ori_shape = np.array(label).shape[-2:]
        input_handle.reshape(image.shape)
        input_handle.copy_from_cpu(image)
        start_time = time.time()
        predictor.run()
        results = output_handle.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
if rerun_flag:
print(
"***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
)
return
logit = reverse_transform(
paddle.to_tensor(results),
ori_shape,
eval_dataset.transforms.transforms,
            mode="bilinear")
pred = paddle.to_tensor(logit)
if len(
pred.shape
) == 4: # for humanseg model whose prediction is distribution but not class id
            pred = paddle.argmax(pred, axis=1, keepdim=True, dtype="int32")
intersect_area, pred_area, label_area = metrics.calculate_area(
pred,
......@@ -193,71 +222,95 @@ def eval(args):
intersect_area_all = intersect_area_all + intersect_area
pred_area_all = pred_area_all + pred_area
label_area_all = label_area_all + label_area
if batch_id % 100 == 0:
print("Eval iter:", batch_id)
sys.stdout.flush()
    _, miou = metrics.mean_iou(intersect_area_all, pred_area_all,
                               label_area_all)
    _, acc = metrics.accuracy(intersect_area_all, pred_area_all)
    kappa = metrics.kappa(intersect_area_all, pred_area_all, label_area_all)
    _, mdice = metrics.dice(intersect_area_all, pred_area_all, label_area_all)
time_avg = predict_time / sample_nums
print(
"[Benchmark]Batch size: {}, Inference time(ms): min={}, max={}, avg={}".
format(batch_size,
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
infor = "[Benchmark] #Images: {} mIoU: {:.4f} Acc: {:.4f} Kappa: {:.4f} Dice: {:.4f}".format(
total_samples, miou, acc, kappa, mdice)
print(infor)
sys.stdout.flush()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path", type=str, help="inference model filepath")
    parser.add_argument(
        "--model_filename",
        type=str,
        default="model.pdmodel",
        help="model file name")
    parser.add_argument(
        "--params_filename",
        type=str,
        default="model.pdiparams",
        help="params file name")
    parser.add_argument(
        "--image_file",
        type=str,
        default=None,
        help="Image path to be processed.")
    parser.add_argument(
        "--save_file",
        type=str,
        default=None,
        help="The path to save the processed image.")
    parser.add_argument(
        "--dataset",
        type=str,
        default="human",
        choices=["human", "cityscape"],
        help="The type of given image which can be 'human' or 'cityscape'.", )
    parser.add_argument(
        "--dataset_config",
        type=str,
        default=None,
        help="path of dataset config.")
    parser.add_argument(
        "--benchmark",
        type=bool,
        default=False,
        help="Whether to run benchmark or not.")
    parser.add_argument(
        "--use_trt",
        type=bool,
        default=False,
        help="Whether to use tensorrt engine or not.")
    parser.add_argument(
        "--device",
        type=str,
        default="GPU",
        choices=["CPU", "GPU"],
        help="Choose the device you want to run, it can be: CPU/GPU, default is GPU",
    )
    parser.add_argument(
        "--precision",
        type=str,
        default="fp32",
        choices=["fp32", "fp16", "int8"],
        help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp32'.",
    )
    parser.add_argument(
        "--use_mkldnn",
        type=bool,
        default=False,
        help="Whether use mkldnn or not.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
args = parser.parse_args()
if args.image_file:
predict_image(args)
......
......@@ -230,7 +230,6 @@ def export_onnx(model_dir,
opset_version=opset_version,
enable_onnx_checker=True,
deploy_backend=deploy_backend,
scale_file=os.path.join(model_dir, 'calibration_table.txt'),
calibration_file=os.path.join(
save_file_path.rstrip(os.path.split(save_file_path)[-1]),
'calibration.cache'))
......