Commit 456b9fed (unverified)
Authored on Sep 30, 2022 by Guanghua Yu; committed via GitHub on Sep 30, 2022
Parent commit: fe33833a

update act paddle inference demo (#1432)

Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Showing 13 changed files with 1953 additions and 614 deletions (+1953 −614)

- example/auto_compression/detection/README.md (+42 −22)
- example/auto_compression/detection/paddle_inference_eval.py (+498 −0)
- example/auto_compression/image_classification/README.md (+39 −31)
- example/auto_compression/image_classification/paddle_inference_eval.py (+250 −0)
- example/auto_compression/nlp/README.md (+26 −10)
- example/auto_compression/nlp/paddle_inference_eval.py (+199 −166)
- example/auto_compression/pytorch_huggingface/README.md (+23 −29)
- example/auto_compression/pytorch_huggingface/paddle_inference_eval.py (+154 −144)
- example/auto_compression/pytorch_yolo_series/README.md (+54 −25)
- example/auto_compression/pytorch_yolo_series/paddle_inference_eval.py (+472 −0)
- example/auto_compression/semantic_segmentation/README.md (+49 −92)
- example/auto_compression/semantic_segmentation/paddle_inference_eval.py (+147 −94)
- paddleslim/common/load_model.py (+0 −1)
example/auto_compression/detection/README.md (view file @ 456b9fed)

Table of contents (@@ -7,8 +7,7 @@): the entries "3.4 Test model accuracy" and "3.5 Auto-compress and export the model" are replaced by a single "3.4 Auto-compress and export the model" entry; "4. Prediction and deployment" and "5. FAQ" follow unchanged.

Deployment section (@@ -110,23 +109,52 @@), updated content:

#### 3.5 Test model accuracy

Use eval.py to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/ppyoloe_l_qat_dis.yaml
```

**Note**: the path of the model under test can be changed via the `model_dir` field of the config file.

## 4. Prediction and deployment

#### 4.1 Verify performance with Paddle Inference

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure the prediction parameters:

| Parameter | Meaning |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the files model.pdmodel and model.pdiparams |
| reader_config | path of the reader config file used for eval |
| image_file | to test a single image only, pass its path via image_file |
| device | device used for prediction, CPU or GPU |
| use_trt | whether to use the TensorRT prediction engine |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, MKL-DNN is ignored and the ```GPU``` is used for prediction |
| cpu_threads | number of CPU threads used for CPU prediction, default 10 |
| precision | prediction precision, one of `fp32/fp16/int8` |

- TensorRT prediction:

Environment: to use the TensorRT prediction engine, install a Paddle build with ```WITH_TRT=ON```; download: [Python prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)

```shell
python paddle_inference_eval.py \
      --model_path=models/ppyoloe_crn_l_300e_coco_quant \
      --reader_config=configs/yoloe_reader.yml \
      --use_trt=True \
      --precision=int8
```

- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=models/ppyoloe_crn_l_300e_coco_quant \
      --reader_config=configs/yoloe_reader.yml \
      --device=CPU \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --precision=int8
```

- For a PPYOLOE model without NMS, the C++ demo can be used to measure speed: enter the [cpp_infer](./cpp_infer_ppyoloe) folder, prepare the environment and build following the [C++ TensorRT Benchmark tutorial](./cpp_infer_ppyoloe/README.md), then run (@@ -136,14 +164,6 @@):

```shell
./build/trt_run --model_file ppyoloe_s_quant/model.pdmodel --params_file ppyoloe_s_quant/model.pdiparams --run_mode=trt_int8
```

The previous pointers to the [PaddleDetection deployment tutorial](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/deploy) and the Paddle-TensorRT Python demo based on [paddle_trt_infer.py](./paddle_trt_infer.py) (`python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8`) are removed by this commit.

## 5. FAQ

- To quantize the model offline instead, see the [detection post-training quantization example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/detection).
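The parameter table in section 4.1 above maps directly onto Paddle Inference's `Config` API. The snippet below is a minimal sketch of how those flags typically translate into a predictor configuration, assuming the model directory layout described above (model.pdmodel/model.pdiparams) and a Paddle build recent enough to ship `enable_mkldnn_int8`; the helper name `build_config` is illustrative, and this sketches the option interplay rather than the demo script itself.

```python
import os
from paddle.inference import Config, create_predictor


def build_config(model_path, device="GPU", use_trt=False, use_mkldnn=False,
                 precision="fp32", cpu_threads=10, batch_size=1):
    """Sketch: turn the README's flags into a Paddle Inference Config."""
    config = Config(
        os.path.join(model_path, "model.pdmodel"),
        os.path.join(model_path, "model.pdiparams"))
    if device == "GPU":
        config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id
        if use_trt:
            precision_map = {"fp32": Config.Precision.Float32,
                             "fp16": Config.Precision.Half,
                             "int8": Config.Precision.Int8}
            config.enable_tensorrt_engine(
                workspace_size=1 << 25,
                max_batch_size=batch_size,
                min_subgraph_size=3,
                precision_mode=precision_map[precision],
                use_static=True,
                use_calib_mode=False)
    else:
        # On the GPU path above, use_mkldnn is ignored, as the table notes.
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
        if use_mkldnn:
            config.enable_mkldnn()
            if precision == "int8":
                config.enable_mkldnn_int8(
                    {"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
    config.enable_memory_optim()
    config.switch_use_feed_fetch_ops(False)
    return config


# For example:
# predictor = create_predictor(build_config(
#     "models/ppyoloe_crn_l_300e_coco_quant", use_trt=True, precision="int8"))
```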
example/auto_compression/detection/paddle_trt_infer.py → example/auto_compression/detection/paddle_inference_eval.py (view file @ 456b9fed)

The old TensorRT demo script is renamed and rewritten as a combined accuracy/benchmark tool:

- Imports and `argsparser()` (@@ -13,16 +13,81 @@): the script now imports `paddle`, `Config`/`create_predictor` from paddle.inference, `load_config`/`create` from ppdet.core.workspace, `COCOMetric` from ppdet.metrics and `PPYOLOEPostProcess` from post_process, and exposes the flags `--model_path`, `--image_file`, `--reader_config`, `--benchmark`, `--use_trt`, `--precision` (fp32/fp16/int8), `--device` (CPU/GPU/XPU), `--use_dynamic_shape`, `--use_mkldnn`, `--cpu_threads`, `--img_shape` and `--include_nms`. `CLASS_LABEL` lists the COCO class names ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', ...).
- `image_preprocess`, `get_color_map_list` and `draw_box` (@@ -67,6 +132,9 @@, @@ -84,14 +152,17 @@, @@ -99,6 +170,9 @@): docstrings are added and the colour-map bit operations and quoting are tidied without changing behaviour.
- `load_predictor` (@@ -114,115 +188,142 @@): the `run_mode` argument is replaced by explicit `precision`, `use_trt` and `use_mkldnn` arguments, and the function now returns `(predictor, rerun_flag)`:

```python
def load_predictor(
        model_dir,
        precision="fp32",
        use_trt=False,
        use_mkldnn=False,
        batch_size=1,
        device="CPU",
        min_subgraph_size=3,
        use_dynamic_shape=False,
        trt_min_shape=1,
        trt_max_shape=1280,
        trt_opt_shape=640,
        cpu_threads=1, ):
    """set AnalysisConfig, generate AnalysisPredictor"""
    rerun_flag = False
    if device != "GPU" and use_trt:
        raise ValueError(
            "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
            format(precision, device))
    config = Config(
        os.path.join(model_dir, "model.pdmodel"),
        os.path.join(model_dir, "model.pdiparams"))
    if device == "GPU":
        # initial GPU memory(M), device ID
        config.enable_use_gpu(200, 0)
        # optimize graph and fuse op
        config.switch_ir_optim(True)
    elif device == "XPU":
        config.enable_lite_engine()
        config.enable_xpu(10 * 1024 * 1024)
    else:
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
        config.switch_ir_optim()
        if use_mkldnn:
            config.enable_mkldnn()
            if precision == "int8":
                config.enable_mkldnn_int8(
                    {"conv2d", "depthwise_conv2d", "transpose2", "pool2d"})
    precision_map = {
        "int8": Config.Precision.Int8,
        "fp32": Config.Precision.Float32,
        "fp16": Config.Precision.Half,
    }
    if precision in precision_map.keys() and use_trt:
        config.enable_tensorrt_engine(
            workspace_size=(1 << 25) * batch_size,
            max_batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            precision_mode=precision_map[precision],
            use_static=True,
            use_calib_mode=False, )
        if use_dynamic_shape:
            dynamic_shape_file = os.path.join(FLAGS.model_path,
                                              "dynamic_shape.txt")
            if os.path.exists(dynamic_shape_file):
                config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
                                                           True)
                print("trt set dynamic shape done!")
            else:
                config.collect_shape_range_info(dynamic_shape_file)
                print("Start collect dynamic shape...")
                rerun_flag = True
    # enable shared memory
    config.enable_memory_optim()
    # disable feed, fetch OP, needed by zero_copy_run
    config.switch_use_feed_fetch_ops(False)
    predictor = create_predictor(config)
    return predictor, rerun_flag
```

- `get_current_memory_mb()` reports the process's CPU and GPU memory usage via psutil, pynvml and GPUtil, installing them on demand.
- `predict_image` (@@ -230,15 +331,17 @@, @@ -246,16 +349,17 @@): benchmarks a single image (warmup/repeats controlled by `--benchmark`), applies `PPYOLOEPostProcess` when `--include_nms` is off, writes result.jpg, and now also prints "[Benchmark]Avg cpu_mem ... gpu_mem ..." alongside min/max/avg latency.
- A new `eval()` function (@@ -263,61 +367,132 @@) runs the PPDetection "EvalReader" over the COCO validation set, accumulates `COCOMetric`, and prints "[Benchmark]" lines with average CPU/GPU memory, min/max/avg inference time and the COCO mAP.
- `main()` builds the predictor from the parsed FLAGS, runs either the single-image benchmark or the COCO eval, and prints "***** Collect dynamic shape done, Please rerun the program to get correct results. *****" when `rerun_flag` is set. The entry point calls `paddle.enable_static()`, parses FLAGS, pins the DataLoader to CPU with `paddle.set_device("cpu")` and calls `main()`.
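The `rerun_flag` mechanism above is the part most likely to trip users up: the first TensorRT run only records tensor shape ranges into dynamic_shape.txt, and only a second run actually uses them. The sketch below isolates that two-pass flow, assuming a generic model directory containing model.pdmodel/model.pdiparams; the helper name is illustrative, and it mirrors the calls the script makes without being the script itself.

```python
import os
from paddle.inference import Config, create_predictor


def trt_config_with_tuned_shapes(model_dir, precision=Config.Precision.Int8):
    """Sketch of the collect-then-reuse dynamic-shape workflow used by the demo."""
    config = Config(os.path.join(model_dir, "model.pdmodel"),
                    os.path.join(model_dir, "model.pdiparams"))
    config.enable_use_gpu(200, 0)
    config.enable_tensorrt_engine(
        workspace_size=1 << 25, max_batch_size=1, min_subgraph_size=3,
        precision_mode=precision, use_static=True, use_calib_mode=False)

    shape_file = os.path.join(model_dir, "dynamic_shape.txt")
    rerun_needed = not os.path.exists(shape_file)
    if rerun_needed:
        # Pass 1: run the model once on representative data; Paddle records the
        # min/max/opt shape of every tensor into shape_file.
        config.collect_shape_range_info(shape_file)
    else:
        # Pass 2: hand the recorded ranges to TensorRT so dynamic shapes work.
        config.enable_tuned_tensorrt_dynamic_shape(shape_file, True)
    return create_predictor(config), rerun_needed


# predictor, rerun_needed = trt_config_with_tuned_shapes(
#     "models/ppyoloe_crn_l_300e_coco_quant")
# If rerun_needed is True: run one forward pass, then restart the program.
```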
example/auto_compression/image_classification/README.md (view file @ 456b9fed)

Hunk @@ -113,48 +113,56 @@, updated content:

Note that the ```learning rate``` scales linearly with the ```batch size```: here the single-card ```batch size``` is 32 with a ```learning rate``` of 0.015, so if the ```batch size``` is reduced four-fold to 8 the ```learning rate``` must also be divided by 4; with multiple cards and a per-card ```batch size``` of 32, the ```learning rate``` must be multiplied by the number of cards. Changing the ```batch size``` or the number of training cards therefore always requires adjusting the ```learning rate```.

**Verify accuracy**

The training log reports the validation accuracy. To validate again, edit ```./configs/MobileNetV1/qat_dis.yaml``` so that ```model_dir, model_filename, params_filename``` point at the model to be validated, then run:

```shell
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path='./configs/MobileNetV1/qat_dis.yaml'
```

## 4. Prediction and deployment

#### 4.1 Verify performance with Paddle Inference

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure the prediction parameters:

| Parameter | Meaning |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | name of the model file inside the model directory |
| params_filename | name of the params file inside the model directory |
| data_path | dataset path |
| batch_size | batch size used for prediction |
| image_size | input image size |
| use_gpu | whether to predict on GPU |
| use_trt | whether to use the TensorRT prediction engine |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```use_mkldnn``` is ignored and the ```GPU``` is used for prediction |
| cpu_num_threads | number of CPU threads used for CPU prediction, default 10 |
| use_fp16 | whether to enable ```FP16``` when using TensorRT |
| use_int8 | whether to enable ```INT8``` |

Notes:
- Mind the model's input size; InceptionV3, for example, expects 299, so ```image_size``` must be adjusted for some models.
- To speed up evaluation, enable ```TensorRT``` when evaluating on ```GPU``` and ```MKL-DNN``` when evaluating on ```CPU```.

With the inference model prepared, run prediction as follows:

- TensorRT prediction:

Environment: to use the TensorRT prediction engine, install a Paddle build with ```WITH_TRT=ON```; download: [Python prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)

```shell
python paddle_inference_eval.py \
      --model_path=models/ResNet50_vd_QAT \
      --use_trt=True \
      --use_int8=True \
      --use_gpu=True \
      --data_path=./dataset/ILSVRC2012/
```

- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=models/ResNet50_vd_QAT \
      --data_path=./dataset/ILSVRC2012/ \
      --cpu_num_threads=10 \
      --use_mkldnn=True \
      --use_int8=True
```

The old "4.1 Python prediction" section (per-flag descriptions of ```inference_model_dir```, ```use_tensorrt```, ```enable_mkldnn```, ```use_fp16```, ```use_int8``` and the old `python infer.py --model_dir='MobileNetV1_infer' ...` command) is replaced by the table and commands above.

#### 4.2 PaddleLite on-device deployment
...
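The accuracy that paddle_inference_eval.py reports for these commands is a plain top-1/top-5 measurement over the ImageNet validation set. The sketch below shows the bare loop behind such a number, assuming a Paddle Inference predictor with a single image input and a single logits output, and a `val_loader` yielding `(image, label)` batches like the demo's ImageNet reader; the function name and loader are illustrative, not the demo code.

```python
import numpy as np


def evaluate(predictor, val_loader):
    """Sketch: top-1 / top-5 accuracy with a Paddle Inference predictor."""
    input_tensor = predictor.get_input_handle(predictor.get_input_names()[0])
    output_tensor = predictor.get_output_handle(predictor.get_output_names()[0])

    top1_hits, top5_hits, total = 0, 0, 0
    for image, label in val_loader:              # label: shape [N, 1]
        input_tensor.copy_from_cpu(np.asarray(image, dtype=np.float32))
        predictor.run()
        logits = output_tensor.copy_to_cpu()      # shape [N, num_classes]
        top5 = logits.argsort(axis=1)[:, -5:][:, ::-1]
        label = np.asarray(label).reshape(-1)
        top1_hits += int((top5[:, 0] == label).sum())
        top5_hits += int(sum(l in row for l, row in zip(label, top5)))
        total += label.shape[0]
    return top1_hits / total, top5_hits / total
```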
example/auto_compression/image_classification/infer.py → example/auto_compression/image_classification/paddle_inference_eval.py (view file @ 456b9fed)

The classification demo script is renamed and reworked along the same lines as the detection one:

- Imports and `argsparser()` (@@ -13,76 +13,72 @@): `--model_dir` becomes `--model_path`, the old `--eval`, `--enable_mkldnn`, `--use_tensorrt` and `--enable_profile` flags are replaced by `--use_trt`, `--use_mkldnn` and `--use_dynamic_shape`, the default `--data_path` becomes "./dataset/ILSVRC2012/", and `--gpu_mem`/`--ir_optim` are kept. `eval_reader()` builds an `ImageNetDataset` in "val" mode.
- `Predictor` class (@@ -96,14 +92,17 @@, @@ -112,96 +111,94 @@): the constructor no longer takes `args` (it reads the parsed global `args`), gains a `rerun_flag`, and builds the predictor through a private `_create_paddle_predictor()`. That method selects `Precision.Int8`/`Half`/`Float32` from `use_int8`/`use_fp16`, enables the GPU with `args.gpu_mem` or falls back to CPU with `set_cpu_math_library_num_threads(args.cpu_num_threads)` plus optional `enable_mkldnn()`/`enable_mkldnn_int8(...)`, turns on TensorRT with `use_static=True`, and, when `use_dynamic_shape` is set, either loads a tuned dynamic_shape.txt via `enable_tuned_tensorrt_dynamic_shape` or starts collecting it with `collect_shape_range_info` and sets `rerun_flag`.
- `eval()` (@@ -211,22 +208,43 @@): the old tqdm-based `predict()`/`eval()` pair is replaced by a single `eval()` that times every batch, computes top-1/top-5 accuracy, and prints "[Benchmark]" lines with the engine in use (Paddle / using TensorRT / using MKLDNN), the precision (FP32/FP16/INT8), batch size, min/max/avg latency, and the final accuracy.
- Entry point: parses `args`, builds `Predictor()`, calls `predictor.eval()`, and prints "***** Collect dynamic shape done, Please rerun the program to get correct results. *****" when `rerun_flag` is set.
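The "[Benchmark]" lines these scripts print all come from the same warmup-then-measure pattern. The following sketch shows that pattern on its own, assuming a predictor whose inputs are already bound; the helper and `run_once` names are illustrative, and the warmup/repeat counts simply echo the values the detection demo uses when `--benchmark=True`.

```python
import time


def benchmark(run_once, warmup=50, repeats=100):
    """Sketch: min/max/avg latency report in the style of the demos' output."""
    for _ in range(warmup):            # let TensorRT/MKLDNN finish graph tuning
        run_once()
    t_min, t_max, t_total = float("inf"), float("-inf"), 0.0
    for _ in range(repeats):
        start = time.time()
        run_once()
        elapsed = time.time() - start
        t_min, t_max = min(t_min, elapsed), max(t_max, elapsed)
        t_total += elapsed
    avg = t_total / repeats
    print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
        round(t_min * 1000, 2), round(t_max * 1000, 1), round(avg * 1000, 1)))


# e.g. benchmark(lambda: predictor.run())
```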
example/auto_compression/nlp/README.md (view file @ 456b9fed)

Hunk @@ -194,25 +194,41 @@, updated content:

## 5. Prediction and deployment

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

- TensorRT prediction:

This example uses the ERNIE 3.0-Medium model on the afqmc dataset to show how to test the accuracy and speed of the compressed model with Paddle-TensorRT.

Environment: to use the TensorRT prediction engine, install a Paddle build with ```WITH_TRT=ON```; download: [Python prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)

First download the quantized model:

```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/save_ppminilm_afqmc_new_calib.tar
tar -xf save_ppminilm_afqmc_new_calib.tar
```

```shell
python paddle_inference_eval.py \
      --model_path=save_ernie3_afqmc_new_cablib \
      --model_filename=infer.pdmodel \
      --params_filename=infer.pdiparams \
      --task_name='afqmc' \
      --use_trt \
      --precision=int8
```

- [PP-MiniLM Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/model_compression/pp-minilm)
- [ERNIE-3.0 Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0)

- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=save_ernie3_afqmc_new_cablib \
      --model_filename=infer.pdmodel \
      --params_filename=infer.pdiparams \
      --task_name='afqmc' \
      --device=cpu \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --precision=int8
```

The old accuracy and speed test commands based on `python infer.py --task_name='afqmc' --model_path='./save_ernie3.0_afqmc/' --device='gpu' --use_trt --int8` (with `--perf` for the speed test) are replaced by the commands above.

## 6. FAQ
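The NLP demo feeds the predictor two int64 tensors, input_ids and token_type_ids, produced by a PaddleNLP tokenizer. The sketch below shows that preprocessing for a single afqmc-style sentence pair, assuming the tokenizer files sit next to the inference model as in the downloaded archive and that the tokenizer accepts the `max_seq_len` keyword; the helper name is illustrative, and padding/batching as done by the real script is omitted.

```python
import numpy as np
from paddlenlp.transformers import AutoTokenizer


def encode_pair(model_path, sentence1, sentence2, max_seq_length=128):
    """Sketch: build input_ids / token_type_ids for the classification predictor."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    encoded = tokenizer(sentence1, text_pair=sentence2, max_seq_len=max_seq_length)
    input_ids = np.array([encoded["input_ids"]], dtype="int64")
    token_type_ids = np.array([encoded["token_type_ids"]], dtype="int64")
    return input_ids, token_type_ids


# input_ids, token_type_ids = encode_pair(
#     "save_ernie3_afqmc_new_cablib", "sentence a", "sentence b")
# Copy them to the predictor's two input handles in that order before run().
```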
example/auto_compression/nlp/infer.py → example/auto_compression/nlp/paddle_inference_eval.py (view file @ 456b9fed)

- `convert_example` becomes `_convert_example` (@@ -45,96 +45,42 @@, @@ -154,25 +100,101 @@): the glue/clue preprocessing is unchanged apart from quoting style; it still builds `input_ids`, `token_type_ids` and the label, including the special handling of the CSL ("keyword") and WSC ("target") CLUE tasks.
- `parse_args()`: `--model_filename` and `--params_filename` are added, `--device` drops the "xpu" choice, and the old `--perf`, `--int8` and `--fp16` switches are replaced by `--precision` (fp32/fp16/int8), `--use_mkldnn` and `--cpu_threads`.
- `Predictor.create_predictor` (@@ -180,60 +202,50 @@): the config is now built from `os.path.join(model_path, model_filename)` and the matching params file; on CPU it calls `set_cpu_math_library_num_threads`, `switch_ir_optim()` and, when requested, `enable_mkldnn()` plus `enable_mkldnn_int8()`; on GPU with `--use_trt` it picks the TensorRT precision from a `precision_map`, uses `use_static=True`, and either reuses a tuned dynamic_shape.txt via `enable_tuned_tensorrt_dynamic_shape` or starts collecting shape ranges with `collect_shape_range_info`, setting `rerun_flag = True` instead of calling `sys.exit()` as before.
- `predict()` (@@ -272,64 +287,82 @@): the separate `--perf` path is folded into one loop that measures per-batch latency and qps, computes the task metric with `metric.compute`/`update`/`accumulate`, and prints "[benchmark]" lines with the task name, batch size, per-batch inference time, qps and accuracy.
- `main()` seeds Paddle, switches to CPU when `--use_mkldnn` is set, loads the CLUE dev split, builds the batchify function, runs `predictor.predict(...)`, and prints the "Collect dynamic shape done, Please rerun" reminder when `rerun_flag` is set; the module entry point calls `paddle.set_device("cpu")` and then `main()`.
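The accuracy in the "[benchmark]" output comes from a Paddle metric object driven by the raw logits the predictor returns. Below is a minimal sketch of that compute/update/accumulate cycle using `paddle.metric.Accuracy`; assuming (as is typical for classification tasks such as afqmc) that METRIC_CLASSES maps the task to Accuracy, and with `batches` standing in for whatever yields logits/label pairs as NumPy arrays.

```python
import paddle
from paddle.metric import Accuracy


def accuracy_over_batches(batches):
    """Sketch: accumulate accuracy the way the demo's predict() loop does.

    `batches` yields (logits, labels) pairs; logits has shape
    [batch_size, num_classes] and labels has shape [batch_size].
    """
    metric = Accuracy()
    metric.reset()
    for logits, labels in batches:
        correct = metric.compute(
            paddle.to_tensor(logits), paddle.to_tensor(labels.flatten()))
        metric.update(correct)
    return metric.accumulate()
```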
example/auto_compression/pytorch_huggingface/README.md (view file @ 456b9fed)

Hunk @@ -14,12 +14,9 @@, introduction:

## 1. Introduction

The Paddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, PaddleSlim's auto-compression (ACT) can be applied to inference models from other frameworks. This example takes NLP models from the [Pytorch](https://github.com/pytorch/pytorch) framework and shows how to auto-compress models from another framework: the [huggingface](https://github.com/huggingface/transformers) transformers library is used to convert a PyTorch model into a Paddle model, which ACT then compresses with pruning-plus-distillation and quantization-aware training.

## 2. Benchmark

[BERT](https://arxiv.org/abs/1810.04805) (```Bidirectional Encoder Representations from Transformers```) uses the Transformer encoder as its basic building block and is pre-trained on large unlabeled corpora with the masked language model (```Masked Language Model```) and next-sentence prediction (```Next Sentence Prediction```) tasks, producing a general-purpose bidirectional semantic representation. Fine-tuning it with a simple task-specific output layer usually outperforms training directly on the downstream task; BERT achieved SOTA results on the [GLUE](https://gluebenchmark.com/tasks) benchmark.

Hunk @@ -192,41 +189,38 @@, updated deployment section:

## 4. Prediction and deployment

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

- TensorRT prediction:

Environment: to use the TensorRT prediction engine, install a Paddle build with ```WITH_TRT=ON```; download: [Python prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python)

First download the quantized model:

```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/x2paddle_cola_new_calib.tar
tar -xf x2paddle_cola_new_calib.tar
```

```shell
python paddle_inference_eval.py \
      --model_path=x2paddle_cola_new_calib \
      --use_trt \
      --precision=int8 \
      --batch_size=1
```

- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
      --model_path=x2paddle_cola_new_calib \
      --device=cpu \
      --use_mkldnn=True \
      --cpu_threads=10 \
      --batch_size=1 \
      --precision=int8
```

The old infer.py-based instructions (per-flag descriptions of ```device```, ```use_trt```, ```int8```, ```fp16``` and the bert-base-cased FP32/INT8 test commands) are replaced by the commands above.
...
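Section 1 relies on X2Paddle to turn the Hugging Face PyTorch model into a Paddle inference model before compression. The sketch below outlines that conversion step, assuming X2Paddle's `pytorch2paddle` entry point, a traceable `bert-base-cased` sequence-classification model (tuple outputs via `return_dict=False`), and an assumed input order of (input_ids, attention_mask, token_type_ids); the save directory and shapes are illustrative only.

```python
import torch
from transformers import BertForSequenceClassification
from x2paddle.convert import pytorch2paddle

# Load the fine-tuned PyTorch model in eval mode so tracing captures inference behaviour.
torch_model = BertForSequenceClassification.from_pretrained(
    "bert-base-cased", return_dict=False)
torch_model.eval()

# Dummy inputs with the sequence length the demo uses (128).
input_ids = torch.zeros(1, 128, dtype=torch.int64)
attention_mask = torch.ones(1, 128, dtype=torch.int64)
token_type_ids = torch.zeros(1, 128, dtype=torch.int64)

# Trace and convert; the Paddle model files are written under save_dir.
pytorch2paddle(
    torch_model,
    save_dir="./x2paddle_cola",
    jit_type="trace",
    input_examples=[input_ids, attention_mask, token_type_ids])
```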
example/auto_compression/pytorch_huggingface/infer.py → example/auto_compression/pytorch_huggingface/paddle_inference_eval.py (view file @ 456b9fed)

- Imports (@@ -22,9 +22,9 @@): the `paddle.metric` and `paddlenlp` imports are reordered; the GLUE helpers (`AccuracyAndF1`, `Mcc`, `PearsonAndSpearman`, `BertForSequenceClassification`, `BertTokenizer`) are unchanged.
- `parse_args()` (@@ -53,35 +53,46 @@, @@ -101,42 +112,45 @@): `--model_path` now defaults to "./x2paddle_cola" and is listed first, `--model_filename`/`--params_filename` default to "model.pdmodel"/"model.pdiparams", `--device` drops the "xpu" choice, and the old `--perf`, `--int8` and `--fp16` switches are replaced by `--precision` (fp32/fp16/int8), `--use_mkldnn` and `--cpu_threads`.
- `convert_example` becomes `_convert_example` (@@ -144,19 +158,23 @@) with the same GLUE preprocessing (sentence keys taken from `task_to_keys`, optional attention mask) but consistent double-quoted keys.
- The `Predictor` class gains an "Inference Predictor class" docstring.
"""
def
__init__
(
self
,
predictor
,
input_handles
,
output_handles
):
self
.
predictor
=
predictor
self
.
input_handles
=
input_handles
...
...
@@ -164,60 +182,51 @@ class Predictor(object):
@
classmethod
def
create_predictor
(
cls
,
args
):
config
=
paddle
.
inference
.
Config
(
args
.
model_path
+
".pdmodel"
,
args
.
model_path
+
".pdiparams"
)
"""
create_predictor func
"""
cls
.
rerun_flag
=
False
config
=
paddle
.
inference
.
Config
(
os
.
path
.
join
(
args
.
model_path
,
args
.
model_filename
),
os
.
path
.
join
(
args
.
model_path
,
args
.
params_filename
))
if
args
.
device
==
"gpu"
:
# set GPU configs accordingly
config
.
enable_use_gpu
(
100
,
0
)
cls
.
device
=
paddle
.
set_device
(
"gpu"
)
elif
args
.
device
==
"cpu"
:
# set CPU configs accordingly,
# such as enable_mkldnn, set_cpu_math_library_num_threads
config
.
disable_gpu
()
cls
.
device
=
paddle
.
set_device
(
"cpu"
)
elif
args
.
device
==
"xpu"
:
# set XPU configs accordingly
config
.
enable_xpu
(
100
)
if
args
.
use_trt
:
if
args
.
int8
:
config
.
enable_tensorrt_engine
(
workspace_size
=
1
<<
30
,
precision_mode
=
inference
.
PrecisionType
.
Int8
,
max_batch_size
=
args
.
batch_size
,
min_subgraph_size
=
5
,
use_static
=
False
,
use_calib_mode
=
False
)
elif
args
.
fp16
:
config
.
enable_tensorrt_engine
(
workspace_size
=
1
<<
30
,
precision_mode
=
inference
.
PrecisionType
.
Half
,
max_batch_size
=
args
.
batch_size
,
min_subgraph_size
=
5
,
use_static
=
False
,
use_calib_mode
=
False
)
else
:
config
.
disable_gpu
()
config
.
set_cpu_math_library_num_threads
(
args
.
cpu_threads
)
config
.
switch_ir_optim
()
if
args
.
use_mkldnn
:
config
.
enable_mkldnn
()
if
args
.
precision
==
"int8"
:
config
.
enable_mkldnn_int8
(
{
"fc"
,
"reshape2"
,
"transpose2"
,
"slice"
})
precision_map
=
{
"int8"
:
inference
.
PrecisionType
.
Int8
,
"fp32"
:
inference
.
PrecisionType
.
Float32
,
"fp16"
:
inference
.
PrecisionType
.
Half
,
}
if
args
.
precision
in
precision_map
.
keys
()
and
args
.
use_trt
:
config
.
enable_tensorrt_engine
(
workspace_size
=
1
<<
30
,
precision_mode
=
inference
.
PrecisionType
.
Float32
,
max_batch_size
=
args
.
batch_size
,
min_subgraph_size
=
5
,
use_static
=
False
,
use_calib_mode
=
False
)
print
(
"Enable TensorRT is: {}"
.
format
(
config
.
tensorrt_engine_enabled
()))
precision_mode
=
precision_map
[
args
.
precision
],
use_static
=
True
,
use_calib_mode
=
False
,
)
model_dir
=
os
.
path
.
dirname
(
args
.
model_path
)
dynamic_shape_file
=
os
.
path
.
join
(
model_dir
,
'dynamic_shape.txt'
)
dynamic_shape_file
=
os
.
path
.
join
(
args
.
model_path
,
"dynamic_shape.txt"
)
if
os
.
path
.
exists
(
dynamic_shape_file
):
config
.
enable_tuned_tensorrt_dynamic_shape
(
dynamic_shape_file
,
True
)
print
(
'trt set dynamic shape done!'
)
print
(
"trt set dynamic shape done!"
)
else
:
config
.
collect_shape_range_info
(
dynamic_shape_file
)
print
(
'Start collect dynamic shape... Please eval again to get real result in TensorRT'
)
sys
.
exit
()
print
(
"Start collect dynamic shape..."
)
cls
.
rerun_flag
=
True
predictor
=
paddle
.
inference
.
create_predictor
(
config
)
...
...
@@ -233,6 +242,9 @@ class Predictor(object):
return
cls
(
predictor
,
input_handles
,
output_handles
)
def
predict
(
self
,
dataset
,
collate_fn
,
args
):
"""
predict func
"""
batch_sampler
=
paddle
.
io
.
BatchSampler
(
dataset
,
batch_size
=
args
.
batch_size
,
shuffle
=
False
)
data_loader
=
paddle
.
io
.
DataLoader
(
...
...
@@ -241,94 +253,92 @@ class Predictor(object):
collate_fn
=
collate_fn
,
num_workers
=
0
,
return_list
=
True
)
end_time
=
0
if
args
.
perf
:
for
i
,
data
in
enumerate
(
data_loader
):
for
input_field
,
input_handle
in
zip
(
data
,
self
.
input_handles
):
input_handle
.
copy_from_cpu
(
input_field
.
numpy
(
)
if
isinstance
(
input_field
,
paddle
.
Tensor
)
else
input_field
)
input_handle
.
copy_from_cpu
(
input_field
.
numpy
()
if
isinstance
(
input_field
,
paddle
.
Tensor
)
else
input_field
)
self
.
predictor
.
run
()
output
=
[
output_handle
.
copy_to_cpu
()
for
output_handle
in
self
.
output_handles
]
if
i
>
args
.
perf_warmup_steps
:
break
if
self
.
rerun_flag
:
return
time1
=
time
.
time
()
for
i
,
data
in
enumerate
(
data_loader
):
for
input_field
,
input_handle
in
zip
(
data
,
self
.
input_handles
):
input_handle
.
copy_from_cpu
(
input_field
.
numpy
(
)
if
isinstance
(
input_field
,
paddle
.
Tensor
)
else
input_field
)
self
.
predictor
.
run
()
output
=
[
output_handle
.
copy_to_cpu
()
for
output_handle
in
self
.
output_handles
]
sequences_num
=
i
*
args
.
batch_size
print
(
"task name: %s, time: %s qps/s, "
%
(
args
.
task_name
,
sequences_num
/
(
time
.
time
()
-
time1
)))
else
:
metric
=
METRIC_CLASSES
[
args
.
task_name
]()
metric
.
reset
()
predict_time
=
0.0
for
i
,
data
in
enumerate
(
data_loader
):
for
input_field
,
input_handle
in
zip
(
data
,
self
.
input_handles
):
input_handle
.
copy_from_cpu
(
input_field
.
numpy
(
)
if
isinstance
(
input_field
,
paddle
.
Tensor
)
else
input_field
)
input_handle
.
copy_from_cpu
(
input_field
.
numpy
()
if
isinstance
(
input_field
,
paddle
.
Tensor
)
else
input_field
)
start_time
=
time
.
time
(
)
self
.
predictor
.
run
()
output
=
[
output_handle
.
copy_to_cpu
()
for
output_handle
in
self
.
output_handles
]
end_time
=
time
.
time
()
predict_time
+=
end_time
-
start_time
label
=
data
[
-
1
]
correct
=
metric
.
compute
(
paddle
.
to_tensor
(
output
[
0
]),
paddle
.
to_tensor
(
np
.
array
(
label
).
flatten
()))
print
(
correct
)
metric
.
update
(
correct
)
sequences_num
=
i
*
args
.
batch_size
print
(
"[benchmark]task name: {}, batch size: {} Inference time per batch: {}ms, qps: {}."
.
format
(
args
.
task_name
,
args
.
batch_size
,
round
(
predict_time
*
1000
/
i
,
2
),
round
(
sequences_num
/
predict_time
,
2
),
))
res
=
metric
.
accumulate
()
print
(
"task name: %s, acc: %s,
\n
"
%
(
args
.
task_name
,
res
),
end
=
''
)
print
(
"[benchmark]task name: %s, acc: %s.
\n
"
%
(
args
.
task_name
,
res
),
end
=
""
)
sys
.
stdout
.
flush
()
def
main
():
"""
main func
"""
paddle
.
seed
(
42
)
args
=
parse_args
()
if
args
.
use_mkldnn
:
paddle
.
set_device
(
"cpu"
)
predictor
=
Predictor
.
create_predictor
(
args
)
args
.
task_name
=
args
.
task_name
.
lower
()
args
.
model_type
=
args
.
model_type
.
lower
()
dev_ds
=
load_dataset
(
'glue'
,
args
.
task_name
,
splits
=
'dev'
)
dev_ds
=
load_dataset
(
"glue"
,
args
.
task_name
,
splits
=
"dev"
)
tokenizer
=
BertTokenizer
.
from_pretrained
(
args
.
model_name_or_path
)
trans_func
=
partial
(
convert_example
,
_
convert_example
,
tokenizer
=
tokenizer
,
label_list
=
dev_ds
.
label_list
,
max_seq_length
=
args
.
max_seq_length
,
task_name
=
args
.
task_name
,
return_attention_mask
=
True
)
return_attention_mask
=
True
,
)
dev_ds
=
dev_ds
.
map
(
trans_func
)
batchify_fn
=
lambda
samples
,
fn
=
Tuple
(
Pad
(
axis
=
0
,
pad_val
=
tokenizer
.
pad_token_id
),
# input
Pad
(
axis
=
0
,
pad_val
=
0
),
Pad
(
axis
=
0
,
pad_val
=
tokenizer
.
pad_token_id
),
# segment
Stack
(
dtype
=
"int64"
if
dev_ds
.
label_list
else
"float32"
)
# label
Stack
(
dtype
=
"int64"
if
dev_ds
.
label_list
else
"float32"
)
,
# label
):
fn
(
samples
)
predictor
.
predict
(
dev_ds
,
batchify_fn
,
args
)
if
__name__
==
"__main__"
:
paddle
.
set_device
(
"cpu"
)
main
()
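For CoLA, the object returned by `METRIC_CLASSES[args.task_name]()` is Matthews correlation, and `Predictor.predict` accumulates it batch by batch with the `compute`/`update`/`accumulate` pattern shown above. A small self-contained sketch of that same pattern with made-up logits and labels (the values and shapes are illustrative only, not taken from the real dev set):

```python
import numpy as np
import paddle
from paddlenlp.metrics import Mcc

# Two dummy batches of two-class logits and their labels (illustrative values).
outputs = [np.array([[0.2, 0.8], [1.5, -0.3]]), np.array([[-0.1, 0.4]])]
labels = [np.array([1, 0]), np.array([1])]

metric = Mcc()
metric.reset()
for logits, label in zip(outputs, labels):
    # Same call pattern as Predictor.predict: compute() per batch, then update().
    correct = metric.compute(
        paddle.to_tensor(logits), paddle.to_tensor(label.flatten()))
    metric.update(correct)
print("MCC:", metric.accumulate())
```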
`example/auto_compression/pytorch_yolo_series/README.md`

- [3.2 Prepare the dataset](#32-准备数据集)
- [3.3 Prepare the inference model](#33-准备预测模型)
- [3.4 Run auto-compression and export the model](#34-自动压缩并产出模型)
- [4. Inference deployment](#4预测部署)
- [5. FAQ](5FAQ)

...

## 4. Inference deployment

...

```shell
├── model.pdiparams         # Paddle inference model weights
├── model.pdmodel           # Paddle inference model file
├── calibration_table.txt   # calibration table produced by quantization
├── ONNX
│   ├── quant_model.onnx      # ONNX model exported after quantization
│   ├── calibration.cache     # calibration table that TensorRT can load directly
```

#### Paddle Inference deployment test

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure prediction:

| Parameter | Meaning |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the files model.pdmodel and model.pdiparams |
| dataset_dir | path of the COCO validation set used for eval, default `dataset/coco` |
| image_file | to test a single image only, point image_file at the image path |
| device | device used for prediction, CPU or GPU |
| use_trt | whether to use the TensorRT inference engine |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; note that when ```use_mkldnn``` and ```use_gpu``` are both ```True```, ```enable_mkldnn``` is ignored and ```GPU``` prediction is used |
| cpu_threads | number of CPU threads used for CPU prediction, default 10 |
| precision | inference precision, one of `fp32/fp16/int8` |

TensorRT Python deployment:

First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT.

Then deploy with [paddle_inference_eval.py](./paddle_inference_eval.py):

- Python test (COCO eval):
```shell
python paddle_inference_eval.py \
    --model_path=output \
    --dataset_dir=dataset/coco \
    --use_trt=True \
    --precision=int8
```
- MKLDNN prediction:
```shell
python paddle_inference_eval.py \
    --model_path=output \
    --dataset_dir=dataset/coco \
    --device=CPU \
    --use_mkldnn=True \
    --cpu_threads=10 \
    --precision=int8
```
- Test a single image:
```shell
python paddle_inference_eval.py --model_path=output --image_file=images/000000570688.jpg --use_trt=True --precision=int8
```
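When `--use_trt=True`, the script first writes a `dynamic_shape.txt` file next to the model and asks you to rerun the same command; only the second run builds the TensorRT engine with the tuned shape ranges. A rough sketch of that logic, mirroring `load_predictor` in `paddle_inference_eval.py` (the helper name `enable_trt_int8` is illustrative, not part of the script):

```python
import os
from paddle.inference import Config

def enable_trt_int8(config: Config, model_dir: str) -> bool:
    """Configure TensorRT int8; return True when a rerun is needed."""
    config.enable_tensorrt_engine(
        workspace_size=1 << 25,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=Config.Precision.Int8,
        use_static=True,
        use_calib_mode=False)
    shape_file = os.path.join(model_dir, "dynamic_shape.txt")
    if os.path.exists(shape_file):
        # Second run: reuse the tuned dynamic shape ranges collected earlier.
        config.enable_tuned_tensorrt_dynamic_shape(shape_file, True)
        return False
    # First run: only collect shape ranges; results of this pass are not valid.
    config.collect_shape_range_info(shape_file)
    return True
```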
#### Paddle-TensorRT deployment

- C++ deployment

Go into the [cpp_infer](./cpp_infer) folder, prepare the environment and build it following the [C++ TensorRT benchmark tutorial](./cpp_infer/README.md), then run the test:

...

```shell
bash compile.sh
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```

#### Export to ONNX and deploy with TensorRT

Load `quant_model.onnx` and `calibration.cache` and verify them directly with the TensorRT test script; see [TensorRT deployment](./TensorRT) for the full code.

- Python test:
```shell
cd TensorRT
python trt_eval.py --onnx_model_file=output/ONNX/quant_model.onnx \
                   --calibration_file=output/ONNX/calibration.cache \
                   --image_file=../images/000000570688.jpg \
                   --precision_mode=int8
```

- Speed test:
```shell
trtexec --onnx=output/ONNX/quant_model.onnx --avgRuns=1000 --workspace=1024 --calib=output/ONNX/calibration.cache --int8
```

## 5. FAQ

...
`example/auto_compression/pytorch_yolo_series/paddle_inference_eval.py` (new file)

```python
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import time
import os
import sys
import argparse
import cv2
import numpy as np
from tqdm import tqdm
import pkg_resources as pkg

import paddle
from paddle.inference import Config
from paddle.inference import create_predictor

from dataset import COCOValDataset
from post_process import YOLOPostProcess, coco_metric


def argsparser():
    """
    argsparser func
    """
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--model_path", type=str, help="inference model filepath")
    parser.add_argument(
        "--image_file",
        type=str,
        default=None,
        help="image path, if set image_file, it will not eval coco.")
    parser.add_argument(
        "--dataset_dir",
        type=str,
        default="dataset/coco",
        help="COCO dataset dir.")
    parser.add_argument(
        "--val_image_dir",
        type=str,
        default="val2017",
        help="COCO dataset val image dir.")
    parser.add_argument(
        "--val_anno_path",
        type=str,
        default="annotations/instances_val2017.json",
        help="COCO dataset anno path.")
    parser.add_argument(
        "--benchmark",
        type=bool,
        default=False,
        help="Whether run benchmark or not.")
    parser.add_argument(
        "--use_dynamic_shape",
        type=bool,
        default=True,
        help="Whether use dynamic shape or not.")
    parser.add_argument(
        "--use_trt",
        type=bool,
        default=False,
        help="Whether use TensorRT or not.")
    parser.add_argument(
        "--precision",
        type=str,
        default="paddle",
        help="mode of running(fp32/fp16/int8)")
    parser.add_argument(
        "--device",
        type=str,
        default="GPU",
        help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU", )
    parser.add_argument(
        "--arch", type=str, default="YOLOv5", help="architectures name.")
    parser.add_argument(
        "--img_shape", type=int, default=640, help="input_size")
    parser.add_argument(
        "--batch_size", type=int, default=1, help="Batch size of model input.")
    parser.add_argument(
        "--use_mkldnn",
        type=bool,
        default=False,
        help="Whether use mkldnn or not.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
    return parser


CLASS_LABEL = [
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
    'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
    'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
    'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
    'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
    'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
    'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
    'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
    'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
    'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]


def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
    """
    image preprocess func
    """
    if len(image.shape) == 3:
        padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
    else:
        padded_img = np.ones(input_size) * 114.0
    img = np.array(image)
    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized_img = cv2.resize(
        img,
        (int(img.shape[1] * r), int(img.shape[0] * r)),
        interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
    padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img

    padded_img = padded_img[:, :, ::-1]
    padded_img /= 255.0
    if mean is not None:
        padded_img -= mean
    if std is not None:
        padded_img /= std
    padded_img = padded_img.transpose(swap)
    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
    return padded_img, r


def get_color_map_list(num_classes):
    """
    get_color_map_list func
    """
    color_map = num_classes * [0, 0, 0]
    for i in range(0, num_classes):
        j = 0
        lab = i
        while lab:
            color_map[i * 3] |= ((lab >> 0) & 1) << (7 - j)
            color_map[i * 3 + 1] |= ((lab >> 1) & 1) << (7 - j)
            color_map[i * 3 + 2] |= ((lab >> 2) & 1) << (7 - j)
            j += 1
            lab >>= 3
    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
    return color_map


def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
    """
    draw_box func
    """
    color_list = get_color_map_list(len(class_names))
    for i, _ in enumerate(boxes):
        box = boxes[i]
        cls_id = int(cls_ids[i])
        color = tuple(color_list[cls_id])
        score = scores[i]
        if score < conf:
            continue
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        text = "{}:{:.1f}%".format(class_names[cls_id], score * 100)
        font = cv2.FONT_HERSHEY_SIMPLEX
        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
        cv2.rectangle(img, (x0, y0 + 1),
                      (x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
                      color, -1)
        cv2.putText(
            img,
            text, (x0, y0 + txt_size[1]),
            font,
            0.8, (0, 255, 0),
            thickness=2)
    return img


def get_current_memory_mb():
    """
    It is used to Obtain the memory usage of the CPU and GPU during the running of the program.
    And this function Current program is time-consuming.
    """
    import pynvml
    import psutil
    import GPUtil

    gpu_id = int(os.environ.get("CUDA_VISIBLE_DEVICES", 0))

    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    cpu_mem = info.uss / 1024.0 / 1024.0
    gpu_mem = 0
    gpu_percent = 0
    gpus = GPUtil.getGPUs()
    if gpu_id is not None and len(gpus) > 0:
        gpu_percent = gpus[gpu_id].load
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
        gpu_mem = meminfo.used / 1024.0 / 1024.0
    return round(cpu_mem, 4), round(gpu_mem, 4)


def load_predictor(
        model_dir,
        precision="fp32",
        use_trt=False,
        use_mkldnn=False,
        batch_size=1,
        device="CPU",
        min_subgraph_size=3,
        use_dynamic_shape=False,
        trt_min_shape=1,
        trt_max_shape=1280,
        trt_opt_shape=640,
        cpu_threads=1, ):
    """set AnalysisConfig, generate AnalysisPredictor
    Args:
        model_dir (str): root path of __model__ and __params__
        precision (str): mode of running(fp32/fp16/int8)
        use_trt (bool): whether use TensorRT or not.
        use_mkldnn (bool): whether use MKLDNN or not in CPU.
        device (str): Choose the device you want to run, it can be: CPU/GPU, default is CPU
        use_dynamic_shape (bool): use dynamic shape or not
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
    Returns:
        predictor (PaddlePredictor): AnalysisPredictor
    Raises:
        ValueError: predict by TensorRT need device == 'GPU'.
    """
    rerun_flag = False
    if device != "GPU" and use_trt:
        raise ValueError(
            "Predict by TensorRT mode: {}, expect device=='GPU', but device == {}".
            format(precision, device))
    config = Config(
        os.path.join(model_dir, "model.pdmodel"),
        os.path.join(model_dir, "model.pdiparams"))
    if device == "GPU":
        # initial GPU memory(M), device ID
        config.enable_use_gpu(200, 0)
        # optimize graph and fuse op
        config.switch_ir_optim(True)
    else:
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
        config.switch_ir_optim()
        if use_mkldnn:
            config.enable_mkldnn()
            if precision == "int8":
                config.enable_mkldnn_int8({"conv2d", "transpose2", "pool2d"})

    precision_map = {
        "int8": Config.Precision.Int8,
        "fp32": Config.Precision.Float32,
        "fp16": Config.Precision.Half,
    }
    if precision in precision_map.keys() and use_trt:
        config.enable_tensorrt_engine(
            workspace_size=(1 << 25) * batch_size,
            max_batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            precision_mode=precision_map[precision],
            use_static=True,
            use_calib_mode=False, )

        if use_dynamic_shape:
            dynamic_shape_file = os.path.join(FLAGS.model_path,
                                              "dynamic_shape.txt")
            if os.path.exists(dynamic_shape_file):
                config.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
                                                           True)
                print("trt set dynamic shape done!")
            else:
                config.collect_shape_range_info(dynamic_shape_file)
                print("Start collect dynamic shape...")
                rerun_flag = True

    # enable shared memory
    config.enable_memory_optim()
    predictor = create_predictor(config)
    return predictor, rerun_flag


def eval(predictor, val_loader, anno_file, rerun_flag=False):
    """
    eval main func
    """
    bboxes_list, bbox_nums_list, image_id_list = [], [], []
    cpu_mems, gpu_mems = 0, 0
    sample_nums = len(val_loader)
    predict_time = 0.0
    time_min = float("inf")
    time_max = float("-inf")
    input_names = predictor.get_input_names()
    output_names = predictor.get_output_names()
    boxes_tensor = predictor.get_output_handle(output_names[0])
    for batch_id, data in enumerate(val_loader):
        data_all = {k: np.array(v) for k, v in data.items()}
        inputs = {}
        if FLAGS.arch == "YOLOv6":
            inputs["x2paddle_image_arrays"] = data_all["image"]
        else:
            inputs["x2paddle_images"] = data_all["image"]
        for i, _ in enumerate(input_names):
            input_tensor = predictor.get_input_handle(input_names[i])
            input_tensor.copy_from_cpu(inputs[input_names[i]])
        start_time = time.time()
        predictor.run()
        outs = boxes_tensor.copy_to_cpu()
        end_time = time.time()
        timed = end_time - start_time
        time_min = min(time_min, timed)
        time_max = max(time_max, timed)
        predict_time += timed
        if rerun_flag:
            return
        postprocess = YOLOPostProcess(
            score_threshold=0.001, nms_threshold=0.65, multi_label=True)
        res = postprocess(np.array(outs), data_all["scale_factor"])
        bboxes_list.append(res["bbox"])
        bbox_nums_list.append(res["bbox_num"])
        image_id_list.append(np.array(data_all["im_id"]))
        cpu_mem, gpu_mem = get_current_memory_mb()
        cpu_mems += cpu_mem
        gpu_mems += gpu_mem
        if batch_id % 100 == 0:
            print("Eval iter:", batch_id)
            sys.stdout.flush()
    print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
        cpu_mems / sample_nums, gpu_mems / sample_nums))
    time_avg = predict_time / sample_nums
    print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
        round(time_min * 1000, 2),
        round(time_max * 1000, 1), round(time_avg * 1000, 1)))
    map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list,
                          image_id_list)
    print("[Benchmark] COCO mAP: {}".format(map_res[0]))
    sys.stdout.flush()


def infer(predictor):
    """
    infer image main func
    """
    warmup, repeats = 1, 1
    if FLAGS.benchmark:
        warmup, repeats = 50, 100

    origin_img = cv2.imread(FLAGS.image_file)
    input_image, scale_factor = preprocess(origin_img,
                                           [FLAGS.img_shape, FLAGS.img_shape])
    input_image = np.expand_dims(input_image, axis=0)
    scale_factor = np.array([[scale_factor, scale_factor]])

    inputs = {}
    if FLAGS.arch == "YOLOv6":
        inputs["x2paddle_image_arrays"] = input_image
    else:
        inputs["x2paddle_images"] = input_image
    input_names = predictor.get_input_names()
    for i, _ in enumerate(input_names):
        input_tensor = predictor.get_input_handle(input_names[i])
        input_tensor.copy_from_cpu(inputs[input_names[i]])

    for i in range(warmup):
        predictor.run()

    np_boxes = None
    predict_time = 0.0
    time_min = float("inf")
    time_max = float("-inf")
    cpu_mems, gpu_mems = 0, 0
    for i in range(repeats):
        start_time = time.time()
        predictor.run()
        output_names = predictor.get_output_names()
        boxes_tensor = predictor.get_output_handle(output_names[0])
        np_boxes = boxes_tensor.copy_to_cpu()
        end_time = time.time()
        timed = end_time - start_time
        time_min = min(time_min, timed)
        time_max = max(time_max, timed)
        predict_time += timed
        cpu_mem, gpu_mem = get_current_memory_mb()
        cpu_mems += cpu_mem
        gpu_mems += gpu_mem
    print("[Benchmark]Avg cpu_mem:{} MB, avg gpu_mem: {} MB".format(
        cpu_mems / repeats, gpu_mems / repeats))
    time_avg = predict_time / repeats
    print("[Benchmark]Inference time(ms): min={}, max={}, avg={}".format(
        round(time_min * 1000, 2),
        round(time_max * 1000, 1), round(time_avg * 1000, 1)))

    postprocess = YOLOPostProcess(
        score_threshold=0.001, nms_threshold=0.65, multi_label=True)
    res = postprocess(np_boxes, scale_factor)
    # Draw rectangles and labels on the original image
    dets = res["bbox"]
    if dets is not None:
        final_boxes, final_scores, final_class = dets[:, 2:], dets[:, 1], dets[:, 0]
        res_img = draw_box(
            origin_img,
            final_boxes,
            final_scores,
            final_class,
            conf=0.5,
            class_names=CLASS_LABEL)
        cv2.imwrite("output.jpg", res_img)
        print("The prediction results are saved in output.jpg.")


def main():
    """
    main func
    """
    predictor, rerun_flag = load_predictor(
        FLAGS.model_path,
        device=FLAGS.device,
        use_trt=FLAGS.use_trt,
        use_mkldnn=FLAGS.use_mkldnn,
        precision=FLAGS.precision,
        use_dynamic_shape=FLAGS.use_dynamic_shape,
        cpu_threads=FLAGS.cpu_threads, )

    if FLAGS.image_file:
        infer(predictor)
    else:
        dataset = COCOValDataset(
            dataset_dir=FLAGS.dataset_dir,
            image_dir=FLAGS.val_image_dir,
            anno_path=FLAGS.val_anno_path)
        anno_file = dataset.ann_file
        val_loader = paddle.io.DataLoader(
            dataset, batch_size=FLAGS.batch_size, drop_last=True)
        eval(predictor, val_loader, anno_file, rerun_flag=rerun_flag)

    if rerun_flag:
        print(
            "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
        )


if __name__ == "__main__":
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()

    # DataLoader need run on cpu
    paddle.set_device("cpu")

    main()
```
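The boxes returned by `YOLOPostProcess` are laid out as `[class_id, score, x0, y0, x1, y1]` per row, which is why `infer()` splits `dets[:, 0]`, `dets[:, 1]` and `dets[:, 2:]` before drawing. A small sketch of filtering that array by score (the sample values are made up):

```python
import numpy as np

# Each row: [class_id, score, x0, y0, x1, y1] -- the layout consumed by draw_box().
dets = np.array([
    [0, 0.92, 10, 20, 110, 220],   # person
    [2, 0.31, 50, 60, 90, 100],    # car, below the 0.5 drawing threshold
])
keep = dets[:, 1] >= 0.5
classes, scores, boxes = dets[keep, 0], dets[keep, 1], dets[keep, 2:]
print(classes, scores, boxes, sep="\n")
```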
`example/auto_compression/semantic_segmentation/README.md`

- [3.2 Prepare the dataset](#32-准备数据集)
- [3.3 Prepare the inference model](#33-准备预测模型)
- [3.4 Run auto-compression and export the model](#34-自动压缩并产出模型)
- [4. Inference deployment](#4预测部署)
- [5. FAQ](5FAQ)

## 1. Introduction

...

After compression finishes, the compressed inference model is written to `save_dir` and can be deployed directly.

## 4. Inference deployment

#### 4.1 Verify performance with Paddle Inference

The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.

The following fields configure prediction:

| Parameter | Meaning |
|:------:|:------:|
| model_path | directory of the inference model; it must contain the .pdmodel and .pdiparams files |
| model_filename | name of the model file inside the inference model directory |
| params_filename | name of the parameters file inside the inference model directory |
| dataset | dataset type, one of `human`, `cityscape` |
| dataset_config | config file of the dataset |
| image_file | path of a single image to test; when image_file is set, dataset_config is ignored |
| device | inference device, `CPU` or `GPU` |
| use_trt | whether to use the TensorRT inference engine; effective when device is ```GPU``` |
| use_mkldnn | whether to enable the ```MKL-DNN``` acceleration library; ```use_mkldnn``` is effective when device is ```CPU``` |
| cpu_threads | number of CPU threads used for CPU prediction, default 10 |
| precision | inference precision, one of `fp32`, `fp16`, `int8` |

The links in the [2.Benchmark](#2Benchmark) table provide downloads for the inference models before and after compression.

- TensorRT prediction:

Environment: to use the TensorRT inference engine, install a Paddle build compiled with ```WITH_TRT=ON```; download it from the [Python inference library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python).

Prepare the inference model, fix the dataset path in dataset_config, then start the test:

```shell
python paddle_inference_eval.py \
    --model_path=pp_liteseg_qat \
    --dataset='cityscape' \
    --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
    --use_trt=True \
    --precision=int8
```

- MKLDNN prediction:

```shell
python paddle_inference_eval.py \
    --model_path=pp_liteseg_qat \
    --dataset='cityscape' \
    --dataset_config=configs/dataset/cityscapes_1024x512_scale1.0.yml \
    --device=CPU \
    --use_mkldnn=True \
    --precision=int8 \
    --cpu_threads=10
```

#### 4.2 Test a single image with Paddle Inference

Download the inference model compressed with quantization-aware training:

```shell
wget https://bj.bcebos.com/v1/paddle-slim-models/act/PaddleSeg/qat/pp_humanseg_qat.zip
unzip pp_humanseg_qat.zip
```

Run portrait segmentation on a single image:

```shell
python paddle_inference_eval.py \
    --model_path=pp_humanseg_qat \
    --dataset='human' \
    --image_file=./data/human_demo.jpg \
    --use_trt=True \
    --precision=int8
```
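For the portrait-segmentation model the network outputs a per-class probability map rather than class ids, so the script applies `argmax` over the channel axis before colorizing the mask. A minimal sketch of that post-processing step on a dummy output (the shapes are illustrative only):

```python
import numpy as np

# Dummy network output: [batch, num_classes, height, width] probability map.
results = np.random.rand(1, 2, 224, 398).astype("float32")

# Same step as predict_image(): pick the most likely class per pixel.
mask = np.argmax(results, axis=1)[0]          # -> [height, width] of class ids
print(mask.shape, np.unique(mask))
```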
<table><tbody>

(… FP32 and Int8 inference result images …)

</tbody></table>

### 4.3 More deployment tutorials

- [Paddle Inference Python deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/python_inference.md)
- [Paddle Inference C++ deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/inference/cpp_inference.md)
- [Paddle Lite deployment](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/deployment/lite/lite.md)

## 5. FAQ
`example/auto_compression/semantic_segmentation/infer.py` → `example/auto_compression/semantic_segmentation/paddle_inference_eval.py` (renamed; updated hunks shown below)

```python
import argparse
import time
import os
import sys
import cv2
import numpy as np
from tqdm import tqdm

import paddle
import paddleseg.transforms as T
from paddleseg.cvlibs import Config as PaddleSegDataConfig

# ...


def load_predictor(args):
    """
    load predictor func
    """
    rerun_flag = False
    model_file = os.path.join(args.model_path, args.model_filename)
    params_file = os.path.join(args.model_path, args.params_filename)
    pred_cfg = PredictConfig(model_file, params_file)
    pred_cfg.enable_memory_optim()
    pred_cfg.switch_ir_optim(True)
    if args.device == "GPU":
        pred_cfg.enable_use_gpu(100, 0)
    else:
        pred_cfg.disable_gpu()
        pred_cfg.set_cpu_math_library_num_threads(args.cpu_threads)
        if args.use_mkldnn:
            pred_cfg.enable_mkldnn()
            if args.precision == "int8":
                pred_cfg.enable_mkldnn_int8({
                    "conv2d", "depthwise_conv2d", "pool2d", "elementwise_mul"
                })

    if args.use_trt:
        # To collect the dynamic shapes of inputs for TensorRT engine
        dynamic_shape_file = os.path.join(args.model_path,
                                          "dynamic_shape.txt")
        if os.path.exists(dynamic_shape_file):
            pred_cfg.enable_tuned_tensorrt_dynamic_shape(dynamic_shape_file,
                                                         True)
            print("trt set dynamic shape done!")
            precision_map = {
                "fp16": PrecisionType.Half,
                "fp32": PrecisionType.Float32,
                # ...
            }
            pred_cfg.enable_tensorrt_engine(
                # ...
                max_batch_size=1,
                min_subgraph_size=4,
                precision_mode=precision_map[args.precision],
                use_static=True,
                use_calib_mode=False, )
        else:
            pred_cfg.collect_shape_range_info(dynamic_shape_file)
            print("Start collect dynamic shape...")
            rerun_flag = True

    predictor = create_predictor(pred_cfg)
    return predictor, rerun_flag


def predict_image(args):
    """
    predict image func
    """
    transforms = _transforms(args.dataset)
    transform = T.Compose(transforms)

    # Step1: Load image and preprocess
    im = cv2.imread(args.image_file).astype("float32")
    data, _ = transform(im)
    data = np.array(data)[np.newaxis, :]

    # Step2: Prepare prdictor
    predictor, rerun_flag = load_predictor(args)

    # Step3: Inference
    input_names = predictor.get_input_names()
    # ...
    for i in range(repeats):
        predictor.run()
        results = output_handle.copy_to_cpu()
    if rerun_flag:
        print(
            "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
        )
        return
    total_time = time.time() - start_time
    avg_time = float(total_time) / repeats
    print(
        f"[Benchmark]Average inference time: \033[91m{round(avg_time*1000, 2)}ms\033[0m"
    )

    # Step4: Post process
    if args.dataset == "human":
        results = reverse_transform(
            paddle.to_tensor(results), im.shape, transforms, mode="bilinear")
    results = np.argmax(results, axis=1)
    result = get_pseudo_color_map(results[0])
    # ...


def eval(args):
    """
    eval mIoU func
    """
    # DataLoader need run on cpu
    paddle.set_device("cpu")
    data_cfg = PaddleSegDataConfig(args.dataset_config)
    eval_dataset = data_cfg.val_dataset
    # ...
    loader = paddle.io.DataLoader(
        eval_dataset,
        batch_sampler=batch_sampler,
        num_workers=0,
        return_list=True)
    predictor, rerun_flag = load_predictor(args)

    intersect_area_all = 0
    pred_area_all = 0
    label_area_all = 0

    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])

    total_samples = len(eval_dataset)
    sample_nums = len(loader)
    batch_size = int(total_samples / sample_nums)
    predict_time = 0.0
    time_min = float("inf")
    time_max = float("-inf")
    print("Start evaluating (total_samples: {}, total_iters: {}).".format(
        total_samples, sample_nums))
    for batch_id, data in enumerate(loader):
        image = np.array(data[0])
        label = np.array(data[1]).astype("int64")
        ori_shape = np.array(label).shape[-2:]
        input_handle.reshape(image.shape)
        input_handle.copy_from_cpu(image)
        start_time = time.time()
        predictor.run()
        results = output_handle.copy_to_cpu()
        end_time = time.time()
        timed = end_time - start_time
        time_min = min(time_min, timed)
        time_max = max(time_max, timed)
        predict_time += timed
        if rerun_flag:
            print(
                "***** Collect dynamic shape done, Please rerun the program to get correct results. *****"
            )
            return
        logit = reverse_transform(
            paddle.to_tensor(results),
            ori_shape,
            eval_dataset.transforms.transforms,
            mode="bilinear")
        pred = paddle.to_tensor(logit)
        if len(pred.shape) == 4:
            # for humanseg model whose prediction is distribution but not class id
            pred = paddle.argmax(pred, axis=1, keepdim=True, dtype="int32")

        intersect_area, pred_area, label_area = metrics.calculate_area(
            pred,
            # ...
        )
        intersect_area_all = intersect_area_all + intersect_area
        pred_area_all = pred_area_all + pred_area
        label_area_all = label_area_all + label_area
        if batch_id % 100 == 0:
            print("Eval iter:", batch_id)
            sys.stdout.flush()
    _, miou = metrics.mean_iou(intersect_area_all, pred_area_all,
                               label_area_all)
    _, acc = metrics.accuracy(intersect_area_all, pred_area_all)
    kappa = metrics.kappa(intersect_area_all, pred_area_all, label_area_all)
    _, mdice = metrics.dice(intersect_area_all, pred_area_all, label_area_all)
    time_avg = predict_time / sample_nums
    print(
        "[Benchmark]Batch size: {}, Inference time(ms): min={}, max={}, avg={}".
        format(batch_size,
               round(time_min * 1000, 2),
               round(time_max * 1000, 1), round(time_avg * 1000, 1)))
    infor = "[Benchmark] #Images: {} mIoU: {:.4f} Acc: {:.4f} Kappa: {:.4f} Dice: {:.4f}".format(
        total_samples, miou, acc, kappa, mdice)
    print(infor)
    sys.stdout.flush()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path", type=str, help="inference model filepath")
    parser.add_argument(
        "--model_filename",
        type=str,
        default="model.pdmodel",
        help="model file name")
    parser.add_argument(
        "--params_filename",
        type=str,
        default="model.pdiparams",
        help="params file name")
    parser.add_argument(
        "--image_file",
        type=str,
        default=None,
        help="Image path to be processed.")
    parser.add_argument(
        "--save_file",
        type=str,
        default=None,
        help="The path to save the processed image.")
    parser.add_argument(
        "--dataset",
        type=str,
        default="human",
        choices=["human", "cityscape"],
        help="The type of given image which can be 'human' or 'cityscape'.", )
    parser.add_argument(
        "--dataset_config",
        type=str,
        default=None,
        help="path of dataset config.")
    parser.add_argument(
        "--benchmark",
        type=bool,
        default=False,
        help="Whether to run benchmark or not.")
    parser.add_argument(
        "--use_trt",
        type=bool,
        default=False,
        help="Whether to use tensorrt engine or not.")
    parser.add_argument(
        "--device",
        type=str,
        default="GPU",
        choices=["CPU", "GPU"],
        help="Choose the device you want to run, it can be: CPU/GPU, default is GPU", )
    parser.add_argument(
        "--precision",
        type=str,
        default="fp32",
        choices=["fp32", "fp16", "int8"],
        help="The precision of inference. It can be 'fp32', 'fp16' or 'int8'. Default is 'fp16'.", )
    parser.add_argument(
        "--use_mkldnn",
        type=bool,
        default=False,
        help="Whether use mkldnn or not.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of cpu threads.")
    args = parser.parse_args()
    if args.image_file:
        predict_image(args)
    # ...
```
`paddleslim/common/load_model.py`

```python
# export_onnx(): arguments passed to the ONNX export call
        opset_version=opset_version,
        enable_onnx_checker=True,
        deploy_backend=deploy_backend,
        scale_file=os.path.join(model_dir, 'calibration_table.txt'),
        calibration_file=os.path.join(
            save_file_path.rstrip(os.path.split(save_file_path)[-1]),
            'calibration.cache'))
```