Commit 05c2122e (unverified)

update yolov7 act demo (#1343)

Authored Aug 16, 2022 by Guanghua Yu; committed via GitHub on Aug 16, 2022
Parent: ebbc5431

Showing 6 changed files with 451 additions and 34 deletions (+451 −34)
example/auto_compression/pytorch_yolov7/README.md (+37 −30)
example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml (+1 −1)
example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml (+31 −0)
example/auto_compression/pytorch_yolov7/eval.py (+3 −3)
example/auto_compression/pytorch_yolov7/onnx_trt_infer.py (+378 −0)
example/auto_compression/pytorch_yolov7/post_quant.py (+1 −0)
example/auto_compression/pytorch_yolov7/README.md

@@ -14,17 +14,19 @@

 ## 1. Introduction

 The Paddle model-conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) can convert ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models in one step. With X2Paddle, inference models from all of these frameworks can easily use PaddleSlim's automatic compression toolkit (ACT).

-This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object-detection model, uses [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) to convert the PyTorch model into a Paddle model, and then compresses it with ACT; the compressed model can be deployed with Paddle Inference, or exported to ONNX and deployed with TensorRT.
+This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object-detection model, converts the PyTorch model into a Paddle model, and then compresses it with ACT. The auto-compression strategy used in this example is quantization-aware training.
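As background for this step, the ONNX-to-Paddle conversion the demo relies on can also be run standalone with X2Paddle. A minimal sketch, assuming X2Paddle's `onnx2paddle` Python entry point (this call is not part of the demo's scripts; the ACT tooling performs the conversion internally):

```python
# Standalone conversion sketch (optional; ACT loads the ONNX file directly).
# Assumes x2paddle is installed and exposes the onnx2paddle converter.
from x2paddle.convert import onnx2paddle

# Convert the exported yolov7.onnx into a Paddle inference model directory.
onnx2paddle("yolov7.onnx", save_dir="yolov7_paddle_model")
```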
 ## 2. Benchmark

(This commit adds a model-size column to the Benchmark table, renames the "KL offline quantization" row to "offline quantization" (mAP 50.2), replaces the "quantization distillation training" row (mAP 50.8) with "ACT quantization training" (mAP 50.9), and adds the YOLOv7-Tiny rows. The updated table:)

| Model | Strategy | Input size | mAP<sup>val</sup><br>0.5:0.95 | Model size | FP32 latency | FP16 latency | INT8 latency | Config | Inference model |
| :-------- | :-------- | :--------: | :--------: | :--------: | :----------: | :----------: | :----------: | :-----------------------------: | :-----------------------------: |
| YOLOv7 | Base model | 640*640 | 51.1 | 141MB | 26.84ms | 7.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
| YOLOv7 | Offline quantization | 640*640 | 50.2 | 36MB | - | - | 4.55ms | - | - |
| YOLOv7 | ACT quantization training | 640*640 | **50.9** | 36MB | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
| | | | | | | | | | |
| YOLOv7-Tiny | Base model | 640*640 | 37.3 | 24MB | 5.06ms | 2.32ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx) |
| YOLOv7-Tiny | Offline quantization | 640*640 | - | 6.1MB | - | - | 1.68ms | - | - |
| YOLOv7-Tiny | ACT quantization training | 640*640 | **37.0** | 6.1MB | - | - | **1.68ms** | [config](./configs/yolov7_tiny_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.tar) [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.onnx) |

Note:
- The mAP figures are all measured on the COCO val2017 dataset.
@@ -33,10 +35,8 @@

 ## 3. Auto-compression pipeline

 #### 3.1 Prepare the environment

-- PaddlePaddle >= 2.3 (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
+- PaddlePaddle develop nightly build (download and install from the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html))
-- PaddleSlim > 2.3
+- PaddleSlim develop version
 - PaddleDet >= 2.4
 - opencv-python

 (1) Install paddlepaddle:

 ```shell
 ...

@@ -48,22 +48,32 @@ pip install paddlepaddle-gpu

 (2) Install paddleslim:

-```shell
-pip install paddleslim
-```
+```shell
+git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
+python setup.py install
+```

 (3) Install paddledet:

 ```shell
 pip install paddledet
 ```

+Note: PaddleDet is installed only so the demo can directly reuse PaddleDetection's Dataloader components.

 #### 3.2 Prepare the dataset

-This demo runs auto compression on COCO data by default and depends on PaddleDetection's data-reading module. For custom COCO-format data, or data in other formats, prepare it following the [PaddleDetection data-preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
+This demo runs auto compression on COCO data by default. Download [Train](http://images.cocodataset.org/zips/train2017.zip), [Val](http://images.cocodataset.org/zips/val2017.zip), and the [annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) from the [MS COCO website](https://cocodataset.org). The expected directory layout is:
+
+```
+dataset/coco/
+├── annotations
+│   ├── instances_train2017.json
+│   ├── instances_val2017.json
+│   |   ...
+├── train2017
+│   ├── 000000000009.jpg
+│   ├── 000000580008.jpg
+│   |   ...
+├── val2017
+│   ├── 000000000139.jpg
+│   ├── 000000000285.jpg
+```
+
+If the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in `./configs/yolov7_reader.yml` to your own dataset path. For a custom dataset, arrange it in the COCO layout shown above.
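For instance, pointing the evaluation reader at a custom location might look like the sketch below. This is illustrative only; the key names follow PaddleDetection's COCO reader convention, and `./configs/yolov7_reader.yml` in this directory is the authoritative layout:

```yaml
# Hypothetical excerpt of ./configs/yolov7_reader.yml
EvalDataset:
  !COCODataSet
  image_dir: val2017
  anno_path: annotations/instances_val2017.json
  dataset_dir: dataset/coco/   # change this to your own dataset path
```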
 #### 3.3 Prepare the inference model

@@ -73,13 +83,10 @@ pip install paddledet

 The ONNX model can be produced with the export script from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7), as follows:

 ```shell
 git clone https://github.com/WongKinYiu/yolov7.git
-python export.py --weights yolov7-tiny.pt --grid
+# Switch to the u5 branch so the exported ONNX model's post-processing matches YOLOv5
+git checkout u5
+# After downloading the yolov7.pt weights, run:
+python export.py --weights yolov7.pt --include onnx
 ```

-You can also directly download our prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx).
+**Note**: ACT currently supports models without NMS, so exporting with the command above is sufficient. You can also directly download our prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx).
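Before handing the exported model to ACT, it can be worth a quick structural check. A minimal sketch using the `onnx` package (an extra dependency, not part of this demo):

```python
import onnx  # assumes the onnx package is installed

model = onnx.load("yolov7.onnx")
onnx.checker.check_model(model)  # raises if the graph is malformed
# Print graph inputs to confirm the expected 1x3x640x640 image tensor.
print([inp.name for inp in model.graph.input])
```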
 #### 3.4 Run auto compression and export the model

@@ -88,13 +95,13 @@ python export.py --weights yolov7.pt --include onnx

 - Single-GPU training:

 ```
 export CUDA_VISIBLE_DEVICES=0
-python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+python run.py --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```

 - Multi-GPU training:

 ```
 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
-          --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
+          --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
 ```
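run.py is a thin wrapper around PaddleSlim's ACT entry point. A rough sketch of its core call, with argument names assumed from the ACT examples rather than copied verbatim from run.py:

```python
# Sketch of run.py's core logic; placeholder names are illustrative.
from paddleslim.auto_compression import AutoCompression

def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
    # Placeholder: run COCO evaluation on the candidate program, return mAP.
    return 0.0

train_loader = ...     # a paddle.io.DataLoader over COCO train2017 (see dataset.py)
strategy_config = ...  # the Quantization/Distillation/TrainConfig dict from the YAML

ac = AutoCompression(
    model_dir="./yolov7-tiny.onnx",  # ONNX input; converted to Paddle internally
    train_dataloader=train_loader,
    save_dir="./output/",
    config=strategy_config,
    eval_callback=eval_function)
ac.compress()
```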
 #### 3.5 Evaluate model accuracy

@@ -102,7 +109,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -

 Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the model's storage path, then use the eval.py script to obtain the model's mAP:

 ```
 export CUDA_VISIBLE_DEVICES=0
-python eval.py --config_path=./configs/yolov7_qat_dis.yaml
+python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
 ```

 ...
example/auto_compression/pytorch_yolov7/configs/yolov7_qat_dis.yaml (+1 −1)
example/auto_compression/pytorch_yolov7/configs/yolov7_tiny_qat_dis.yaml (new file, mode 100644, +31)
```yaml
Global:
  model_dir: ./yolov7-tiny.onnx
  dataset_dir: dataset/coco/
  train_image_dir: train2017
  val_image_dir: val2017
  train_anno_path: annotations/instances_train2017.json
  val_anno_path: annotations/instances_val2017.json
  Evaluation: True

Distillation:
  alpha: 1.0
  loss: soft_label

Quantization:
  onnx_format: true
  activation_quantize_type: 'moving_average_abs_max'
  quantize_op_types:
  - conv2d
  - depthwise_conv2d

TrainConfig:
  train_iter: 5000
  eval_iter: 1000
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00003
    T_max: 8000
  optimizer_builder:
    optimizer:
      type: SGD
    weight_decay: 0.00004
```
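With this schedule, the learning rate follows a half-cosine from the base value toward zero over `T_max` steps; since `train_iter` (5000) is smaller than `T_max` (8000), training stops before the rate fully decays. A quick sketch of the curve, using the standard cosine-annealing formula with a zero floor (the default behavior of Paddle's `CosineAnnealingDecay`):

```python
import math

base_lr, T_max, train_iter = 3e-5, 8000, 5000

def cosine_lr(step):
    # With eta_min = 0: lr(t) = base_lr * (1 + cos(pi * t / T_max)) / 2
    return base_lr * (1 + math.cos(math.pi * step / T_max)) / 2

print(cosine_lr(0))           # 3e-05 at the start
print(cosine_lr(train_iter))  # ~9.3e-06 when training stops at iter 5000
```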
example/auto_compression/pytorch_yolov7/eval.py (+3 −3)
@@ -19,7 +19,7 @@ import argparse
 from tqdm import tqdm
 import paddle
 from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
-from paddleslim.common import load_onnx_model
+from paddleslim.auto_compression.utils import load_inference_model
 from post_process import YOLOv7PostProcess, coco_metric
 from dataset import COCOValDataset

@@ -46,8 +46,8 @@ def eval():
     place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
     exe = paddle.static.Executor(place)
-    val_program, feed_target_names, fetch_targets = load_onnx_model(
-        global_config["model_dir"])
+    val_program, feed_target_names, fetch_targets = load_inference_model(
+        global_config["model_dir"], exe)
     bboxes_list, bbox_nums_list, image_id_list = [], [], []
     with tqdm(
 ...
example/auto_compression/pytorch_yolov7/onnx_trt_infer.py (new file, mode 100644, +378)
```python
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import cv2
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import os
import time
import random
import argparse

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
EXPLICIT_PRECISION = 1 << (int)(
    trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)

# load coco labels
CLASS_LABEL = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
    "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
    "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
    "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
    "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"
]


def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
    if len(image.shape) == 3:
        padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
    else:
        padded_img = np.ones(input_size) * 114.0
    img = np.array(image)
    r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
    resized_img = cv2.resize(
        img,
        (int(img.shape[1] * r), int(img.shape[0] * r)),
        interpolation=cv2.INTER_LINEAR,
    ).astype(np.float32)
    padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
    padded_img = padded_img[:, :, ::-1]
    padded_img /= 255.0
    if mean is not None:
        padded_img -= mean
    if std is not None:
        padded_img /= std
    padded_img = padded_img.transpose(swap)
    padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
    return padded_img, r


def postprocess(predictions, ratio):
    boxes = predictions[:, :4]
    scores = predictions[:, 4:5] * predictions[:, 5:]
    boxes_xyxy = np.ones_like(boxes)
    boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
    boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
    boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
    boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
    boxes_xyxy /= ratio
    dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
    return dets


def nms(boxes, scores, nms_thr):
    """Single class NMS implemented in Numpy."""
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= nms_thr)[0]
        order = order[inds + 1]
    return keep


def multiclass_nms(boxes, scores, nms_thr, score_thr):
    """Multiclass NMS implemented in Numpy"""
    final_dets = []
    num_classes = scores.shape[1]
    for cls_ind in range(num_classes):
        cls_scores = scores[:, cls_ind]
        valid_score_mask = cls_scores > score_thr
        if valid_score_mask.sum() == 0:
            continue
        else:
            valid_scores = cls_scores[valid_score_mask]
            valid_boxes = boxes[valid_score_mask]
            keep = nms(valid_boxes, valid_scores, nms_thr)
            if len(keep) > 0:
                cls_inds = np.ones((len(keep), 1)) * cls_ind
                dets = np.concatenate(
                    [valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
                final_dets.append(dets)
    if len(final_dets) == 0:
        return None
    return np.concatenate(final_dets, 0)


def get_color_map_list(num_classes):
    color_map = num_classes * [0, 0, 0]
    for i in range(0, num_classes):
        j = 0
        lab = i
        while lab:
            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
            j += 1
            lab >>= 3
    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
    return color_map


def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
    color_list = get_color_map_list(len(class_names))
    for i in range(len(boxes)):
        box = boxes[i]
        cls_id = int(cls_ids[i])
        color = tuple(color_list[cls_id])
        score = scores[i]
        if score < conf:
            continue
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
        font = cv2.FONT_HERSHEY_SIMPLEX
        txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
        cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
        cv2.rectangle(img, (x0, y0 + 1),
                      (x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
                      color, -1)
        cv2.putText(
            img,
            text, (x0, y0 + txt_size[1]),
            font,
            0.8, (0, 255, 0),
            thickness=2)
    return img


def get_engine(precision, model_file_path):
    # TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
    TRT_LOGGER = trt.Logger()
    builder = trt.Builder(TRT_LOGGER)
    config = builder.create_builder_config()
    if precision == 'int8':
        network = builder.create_network(EXPLICIT_BATCH | EXPLICIT_PRECISION)
    else:
        network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    runtime = trt.Runtime(TRT_LOGGER)

    if model_file_path.endswith('.trt'):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(model_file_path))
        with open(model_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            engine = runtime.deserialize_cuda_engine(f.read())
            for i in range(network.num_layers):
                layer = network.get_layer(i)
                print(i, layer.name)
            return engine
    else:
        config.max_workspace_size = 1 << 30
        if precision == "fp16":
            if not builder.platform_has_fast_fp16:
                print("FP16 is not supported natively on this platform/device")
            else:
                config.set_flag(trt.BuilderFlag.FP16)
        elif precision == "int8":
            if not builder.platform_has_fast_int8:
                print("INT8 is not supported natively on this platform/device")
            else:
                if builder.platform_has_fast_fp16:
                    # Also enable fp16, as some layers may be even more efficient in fp16 than int8
                    config.set_flag(trt.BuilderFlag.FP16)
                config.set_flag(trt.BuilderFlag.INT8)
        builder.max_batch_size = 1
        print('Loading ONNX file from path {}...'.format(model_file_path))
        with open(model_file_path, 'rb') as model:
            print('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print('Completed parsing of ONNX file')
        print('Building an engine from file {}; this may take a while...'.
              format(model_file_path))
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")
        with open(model_file_path, "wb") as f:
            f.write(engine.serialize())
        for i in range(network.num_layers):
            layer = network.get_layer(i)
            print(i, layer.name)
        return engine


# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(
            binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


def run_inference(context, bindings, inputs, outputs, stream):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


def main(args):
    onnx_model = args.model_path
    img_path = args.image_file
    num_class = len(CLASS_LABEL)
    repeat = 1000
    engine = get_engine(args.precision, onnx_model)

    model_all_names = []
    for idx in range(engine.num_bindings):
        is_input = engine.binding_is_input(idx)
        name = engine.get_binding_name(idx)
        op_type = engine.get_binding_dtype(idx)
        model_all_names.append(name)
        shape = engine.get_binding_shape(idx)
        print('input id:', idx, ' is input: ', is_input, ' binding name:',
              name, ' shape:', shape, 'type: ', op_type)

    context = engine.create_execution_context()
    print('Allocate buffers ...')
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    print("TRT set input ...")
    origin_img = cv2.imread(img_path)
    input_shape = [args.img_shape, args.img_shape]
    input_image, ratio = preprocess(origin_img, input_shape)
    inputs[0].host = np.expand_dims(input_image, axis=0)

    for _ in range(0, 50):
        trt_outputs = run_inference(
            context,
            bindings=bindings,
            inputs=inputs,
            outputs=outputs,
            stream=stream)
    time1 = time.time()
    for _ in range(0, repeat):
        trt_outputs = run_inference(
            context,
            bindings=bindings,
            inputs=inputs,
            outputs=outputs,
            stream=stream)
    time2 = time.time()
    # total time cost(ms)
    total_inference_cost = (time2 - time1) * 1000
    print("model path: ", onnx_model, " precision: ", args.precision)
    print("In TensorRT, ",
          "average latency is : {} ms".format(total_inference_cost / repeat))

    # Do postprocess
    output = trt_outputs[0]
    predictions = np.reshape(output, (1, -1, int(5 + num_class)))[0]
    dets = postprocess(predictions, ratio)
    # Draw rectangles and labels on the original image
    if dets is not None:
        final_boxes, final_scores, final_cls_inds = dets[:, :4], dets[:, 4], dets[:, 5]
        origin_img = draw_box(
            origin_img,
            final_boxes,
            final_scores,
            final_cls_inds,
            conf=0.5,
            class_names=CLASS_LABEL)
    cv2.imwrite('output.jpg', origin_img)
    print('The prediction results are saved in output.jpg.')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--model_path',
        type=str,
        default="quant_model.onnx",
        help="inference model filepath")
    parser.add_argument(
        '--image_file', type=str, default="bus.jpg", help="image path")
    parser.add_argument(
        '--precision', type=str, default='fp32', help="support fp32/fp16/int8.")
    parser.add_argument(
        '--img_shape', type=int, default=640, help="input_size")
    args = parser.parse_args()
    main(args)
```
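Once a quantized ONNX model has been exported, the script above builds a TensorRT engine, benchmarks it, and saves a visualization. A typical invocation, based on the script's argparse defaults:

```shell
# Runs 50 warm-up inferences plus 1000 timed runs, prints the average
# latency, and writes the drawn detections to output.jpg.
python onnx_trt_infer.py --model_path=yolov7_quant.onnx --image_file=bus.jpg --precision=int8
```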
example/auto_compression/pytorch_yolov7/post_quant.py (+1 −0)
@@ -22,6 +22,7 @@ from paddleslim.common import load_onnx_model
 from paddleslim.quant import quant_post_static
 from dataset import COCOTrainDataset

 def argsparser():
     parser = argparse.ArgumentParser(description=__doc__)
     parser.add_argument(
 ...
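post_quant.py drives PaddleSlim's post-training quantization, which produces the "offline quantization" rows in the Benchmark table. Roughly, the call it builds looks like the sketch below; argument names are assumptions based on the `quant_post_static` API, and post_quant.py itself is authoritative:

```python
# Sketch of a post-training-quantization call; assumes a Paddle inference
# model (converted from ONNX) and a calibration dataloader already exist.
import paddle
from paddleslim.quant import quant_post_static

paddle.enable_static()
exe = paddle.static.Executor(paddle.CUDAPlace(0))

calib_loader = ...  # yields feed dicts of calibration batches (see dataset.py)

quant_post_static(
    executor=exe,
    model_dir="./yolov7_paddle_model",   # hypothetical converted-model path
    quantize_model_path="./yolov7_ptq_out",
    data_loader=calib_loader,
    model_filename="model.pdmodel",      # hypothetical filenames
    params_filename="model.pdiparams",
    batch_nums=32,                       # number of calibration batches
    algo="avg")                          # calibration algorithm, e.g. KL/avg/hist
```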