Unverified commit 96d309a7, authored by: G Guanghua Yu, committed by: GitHub

Cherry pick some PR (#1354)

Parent 0c34d239
...@@ -4,40 +4,51 @@
</p>
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/Paddle?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.6.2+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSlim?color=9ea"></a>
<a href="https://pypi.org/project/PaddleSlim/"><img src="https://img.shields.io/pypi/dm/PaddleSlim?color=9cf"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSlim?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleSlim/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSlim?color=ccf"></a>
</p>

PaddleSlim是一个专注于深度学习模型压缩的工具库,提供**低比特量化、知识蒸馏、稀疏化和模型结构搜索**等模型压缩策略,帮助开发者快速实现模型的小型化。

## 产品动态
- 🔥 **2022.08.16:自动化压缩功能升级**
- 支持直接加载ONNX模型和Paddle模型导出至ONNX
- 发布量化分析工具试用版,发布[YOLO系列离线量化工具](example/post_training_quantization/pytorch_yolo_series/)
- 更新[YOLO-Series自动化压缩模型库](example/auto_compression/pytorch_yolo_series)
| 模型 | Base mAP<sup>val<br>0.5:0.95 | ACT量化mAP<sup>val<br>0.5:0.95 | 模型体积压缩比 | 预测时延<sup><small>FP32</small><sup><br><sup> | 预测时延<sup><small>INT8</small><sup><br><sup> | 预测加速比 |
| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: |
| PPYOLOE-s | 43.1 | 42.6 | 3.9倍 | 6.51ms | 2.12ms | 3.1倍 |
| YOLOv5s | 37.4 | 36.9 | 3.8倍 | 5.95ms | 1.87ms | 3.2倍 |
| YOLOv6s | 42.4 | 41.3 | 3.9倍 | 9.06ms | 1.83ms | 5.0倍 |
| YOLOv7 | 51.1 | 50.9 | 3.9倍 | 26.84ms | 4.55ms | 5.9倍 |
| YOLOv7-Tiny | 37.3 | 37.0 | 3.9倍 | 5.06ms | 1.68ms | 3.0倍 |
- 🔥 **2022.07.01: 发布[v2.3.0版本](https://github.com/PaddlePaddle/PaddleSlim/releases/tag/v2.3.0)**
  - 发布[自动化压缩功能](example/auto_compression)
    - 支持代码无感知压缩:开发者只需提供推理模型文件和数据,即可进行离线量化(PTQ)、量化训练(QAT)、稀疏训练等压缩任务。
    - 支持自动策略选择,根据任务特点和部署环境特性:自动搜索合适的离线量化方法,自动搜索最佳的压缩策略组合方式。
    - 发布[自然语言处理](example/auto_compression/nlp)、[图像语义分割](example/auto_compression/semantic_segmentation)、[图像目标检测](example/auto_compression/detection)三个方向的自动化压缩示例。
    - 发布`X2Paddle`模型自动化压缩方案:[YOLOv5](example/auto_compression/pytorch_yolo_series)、[YOLOv6](example/auto_compression/pytorch_yolo_series)、[YOLOv7](example/auto_compression/pytorch_yolo_series)、[HuggingFace](example/auto_compression/pytorch_huggingface)、[MobileNet](example/auto_compression/tensorflow_mobilenet)。
  - 升级量化功能
    - 统一量化模型格式;离线量化支持while op;修复BERT大模型量化训练过慢的问题。
    - 新增7种[离线量化方法](docs/zh_cn/tutorials/quant/post_training_quantization.md),包括HIST, AVG, EMD, Bias Correction, AdaRound等。
  - 支持半结构化稀疏训练
  - 新增延时预估工具
    - 支持对稀疏化模型、低比特量化模型的性能预估;支持预估指定模型在特定部署环境下 (ARM CPU + Paddle Lite) 的推理性能;提供 SD625、SD710、RK3288 芯片 + Paddle Lite 的预估接口。
    - 提供部署环境自动扩展工具,可以自动增加在更多 ARM CPU 设备上的预估工具。

<details>
<summary>历史更新</summary>
- **2021.11.15: 发布v2.2.0版本**

...@@ -52,6 +63,7 @@ PaddleSlim是一个专注于深度学习模型压缩的工具库,提供**低

更多信息请参考:[release note](https://github.com/PaddlePaddle/PaddleSlim/releases)

</details>

## 基础压缩功能概览
......
...@@ -65,6 +65,8 @@ add_arg('use_pact', bool, True, ...@@ -65,6 +65,8 @@ add_arg('use_pact', bool, True,
"Whether to use PACT or not.") "Whether to use PACT or not.")
add_arg('analysis', bool, False, add_arg('analysis', bool, False,
"Whether analysis variables distribution.") "Whether analysis variables distribution.")
add_arg('onnx_format', bool, False,
"Whether use onnx format or not.")
add_arg('ce_test', bool, False, "Whether to CE test.") add_arg('ce_test', bool, False, "Whether to CE test.")
# yapf: enable # yapf: enable
...@@ -257,6 +259,8 @@ def compress(args): ...@@ -257,6 +259,8 @@ def compress(args):
'window_size': 10000, 'window_size': 10000,
# The decay coefficient of moving average, default is 0.9 # The decay coefficient of moving average, default is 0.9
'moving_rate': 0.9, 'moving_rate': 0.9,
# Whether use onnx format or not
'onnx_format': args.onnx_format,
} }
# 2. quantization transform programs (training aware) # 2. quantization transform programs (training aware)
...@@ -298,9 +302,9 @@ def compress(args): ...@@ -298,9 +302,9 @@ def compress(args):
places, places,
quant_config, quant_config,
scope=None, scope=None,
act_preprocess_func=act_preprocess_func, act_preprocess_func=None,
optimizer_func=optimizer_func, optimizer_func=None,
executor=executor, executor=None,
for_test=True) for_test=True)
compiled_train_prog = quant_aware( compiled_train_prog = quant_aware(
train_prog, train_prog,
...@@ -425,29 +429,23 @@ def compress(args): ...@@ -425,29 +429,23 @@ def compress(args):
# 3. Freeze the graph after training by adjusting the quantize # 3. Freeze the graph after training by adjusting the quantize
# operators' order for the inference. # operators' order for the inference.
# The dtype of float_program's weights is float32, but in int8 range. # The dtype of float_program's weights is float32, but in int8 range.
float_program, int8_program = convert(val_program, places, quant_config, \ model_path = os.path.join(quantization_model_save_dir, args.model)
scope=None, \ if not os.path.isdir(model_path):
save_int8=True) os.makedirs(model_path)
float_program = convert(val_program, places, quant_config)
_logger.info("eval best_model after convert") _logger.info("eval best_model after convert")
final_acc1 = test(best_epoch, float_program) final_acc1 = test(best_epoch, float_program)
_logger.info("final acc:{}".format(final_acc1)) _logger.info("final acc:{}".format(final_acc1))
# 4. Save inference model # 4. Save inference model
model_path = os.path.join(quantization_model_save_dir, args.model,
'act_' + quant_config['activation_quantize_type']
+ '_w_' + quant_config['weight_quantize_type'])
float_path = os.path.join(model_path, 'float')
if not os.path.isdir(model_path):
os.makedirs(model_path)
paddle.fluid.io.save_inference_model( paddle.fluid.io.save_inference_model(
dirname=float_path, dirname=model_path,
feeded_var_names=[image.name], feeded_var_names=[image.name],
target_vars=[out], target_vars=[out],
executor=exe, executor=exe,
main_program=float_program, main_program=float_program,
model_filename=float_path + '/model', model_filename=model_path + '/model.pdmodel',
params_filename=float_path + '/params') params_filename=model_path + '/model.pdiparams')
def main(): def main():
......
...@@ -126,6 +126,8 @@ def compress(args): ...@@ -126,6 +126,8 @@ def compress(args):
'window_size': 10000, 'window_size': 10000,
# The decay coefficient of moving average, default is 0.9 # The decay coefficient of moving average, default is 0.9
'moving_rate': 0.9, 'moving_rate': 0.9,
# Whether use onnx format or not
'onnx_format': args.onnx_format,
} }
pretrain = True pretrain = True
...@@ -294,10 +296,7 @@ def compress(args): ...@@ -294,10 +296,7 @@ def compress(args):
# operators' order for the inference. # operators' order for the inference.
# The dtype of float_program's weights is float32, but in int8 range. # The dtype of float_program's weights is float32, but in int8 range.
############################################################################################################ ############################################################################################################
float_program, int8_program = convert(val_program, places, quant_config, \ float_program = convert(val_program, places, quant_config)
scope=None, \
save_int8=True,
onnx_format=args.onnx_format)
print("eval best_model after convert") print("eval best_model after convert")
final_acc1 = test(best_epoch, float_program) final_acc1 = test(best_epoch, float_program)
############################################################################################################ ############################################################################################################
......
...@@ -21,8 +21,7 @@ import functools ...@@ -21,8 +21,7 @@ import functools
import paddle import paddle
sys.path[0] = os.path.join( sys.path[0] = os.path.join(
os.path.dirname("__file__"), os.path.pardir, os.path.pardir) os.path.dirname("__file__"), os.path.pardir, os.path.pardir)
sys.path[1] = os.path.join( sys.path[1] = os.path.join(os.path.dirname("__file__"), os.path.pardir)
os.path.dirname("__file__"), os.path.pardir)
import imagenet_reader as reader import imagenet_reader as reader
from utility import add_arguments, print_arguments from utility import add_arguments, print_arguments
...@@ -31,8 +30,8 @@ parser = argparse.ArgumentParser(description=__doc__) ...@@ -31,8 +30,8 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser) add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('use_gpu', bool, True, "Whether to use GPU or not.") add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('model_path', str, "./pruning/checkpoints/resnet50/2/eval_model/", "Whether to use pretrained model.") add_arg('model_path', str, "./pruning/checkpoints/resnet50/2/eval_model/", "Whether to use pretrained model.")
add_arg('model_name', str, '__model__', "model filename for inference model") add_arg('model_name', str, 'model.pdmodel', "model filename for inference model")
add_arg('params_name', str, '__params__', "params filename for inference model") add_arg('params_name', str, 'model.pdiparams', "params filename for inference model")
add_arg('batch_size', int, 64, "Minibatch size.") add_arg('batch_size', int, 64, "Minibatch size.")
# yapf: enable # yapf: enable
......
...@@ -3,19 +3,19 @@ AutoCompression自动压缩功能

AutoCompression
---------------

.. py:class:: paddleslim.auto_compression.AutoCompression(model_dir, train_dataloader, model_filename, params_filename, save_dir, strategy_config, train_config, eval_callback, devices='gpu')

`源代码 <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/auto_compression.py#L49>`_

根据指定的配置对使用 ``paddle.jit.save`` 接口或者 ``paddle.static.save_inference_model`` 接口保存的推理模型进行压缩。

**参数:**

- **model_dir(str)** - 需要压缩的推理模型所在的目录。
- **train_dataloader(paddle.io.DataLoader)** - 训练数据迭代器。注意:如果选择离线量化超参搜索策略的话, ``train_dataloader`` 和 ``eval_callback`` 设置相同的数据读取即可。
- **model_filename(str)** - 需要压缩的推理模型文件名称。
- **params_filename(str)** - 需要压缩的推理模型参数文件名称。
- **save_dir(str)** - 压缩后模型的所保存的目录。
- **train_config(dict)** - 训练配置。可以配置的参数请参考: `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L103>`_ 。注意:如果选择离线量化超参搜索策略的话, ``train_config`` 直接设置为 ``None`` 即可。
- **strategy_config(dict, list(dict), 可选)** - 使用的压缩策略,可以通过设置多个单种策略来并行使用这些压缩方式。字典的关键字必须在:
  ``Quantization`` (量化配置, 可配置的参数参考 `<https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24>`_ ),

...@@ -82,13 +82,13 @@ AutoCompression
eval_dataloader = Cifar10(mode='eval')

ac = AutoCompression(model_path, train_dataloader, model_filename, params_filename, save_dir, \
            strategy_config={"Quantization": Quantization(**default_ptq_config),
                  "Distillation": HyperParameterOptimization(**default_distill_config)}, \
            train_config=None, eval_callback=eval_dataloader, devices='gpu')
```
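
以上示例仅构造了 ``AutoCompression`` 对象。作为补充示意(非本文档原文,按 PaddleSlim 示例脚本的常见用法假设),压缩一般通过调用其 ``compress`` 方法启动,压缩后的推理模型会保存到构造时传入的 ``save_dir`` 目录中:

```
# 启动自动压缩(示意):产出的推理模型保存在 save_dir 中
ac.compress()
```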
......
...@@ -118,7 +118,7 @@ quant_post_dynamic ...@@ -118,7 +118,7 @@ quant_post_dynamic
quant_post_static quant_post_static
--------------- ---------------
.. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='__model__', save_params_filename='__params__', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False) .. py:function:: paddleslim.quant.quant_post_static(executor,model_dir, quantize_model_path, batch_generator=None, sample_generator=None, model_filename=None, params_filename=None, save_model_filename='model.pdmodel', save_params_filename='model.pdiparams', batch_size=16, batch_nums=None, scope=None, algo='KL', round_type='round', quantizable_op_type=["conv2d","depthwise_conv2d","mul"], is_full_quantize=False, weight_bits=8, activation_bits=8, activation_quantize_type='range_abs_max', weight_quantize_type='channel_wise_abs_max', onnx_format=False, skip_tensor_list=None, optimize_model=False)
`源代码 <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py>`_ `源代码 <https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/quant/quanter.py>`_
...@@ -217,15 +217,15 @@ quant_post_static ...@@ -217,15 +217,15 @@ quant_post_static
target_vars=[out], target_vars=[out],
main_program=val_prog, main_program=val_prog,
executor=exe, executor=exe,
model_filename='__model__', model_filename='model.pdmodel',
params_filename='__params__') params_filename='model.pdiparams')
quant_post_static( quant_post_static(
executor=exe, executor=exe,
model_dir='./model_path', model_dir='./model_path',
quantize_model_path='./save_path', quantize_model_path='./save_path',
sample_generator=val_reader, sample_generator=val_reader,
model_filename='__model__', model_filename='model.pdmodel',
params_filename='__params__', params_filename='model.pdiparams',
batch_size=16, batch_size=16,
batch_nums=10) batch_nums=10)
......
...@@ -82,15 +82,15 @@ ACT相比传统的模型压缩方法,

| [语义分割](./semantic_segmentation) | UNet | 65.00 | 64.93 | 15.29 | 10.23 | **1.49** | NVIDIA Tesla T4 |
| NLP | PP-MiniLM | 72.81 | 72.44 | 128.01 | 17.97 | **7.12** | NVIDIA Tesla T4 |
| NLP | ERNIE 3.0-Medium | 73.09 | 72.40 | 29.25(fp16) | 19.61 | **1.49** | NVIDIA Tesla T4 |
| [目标检测](./pytorch_yolo_series) | YOLOv5s<br/>(PyTorch) | 37.40 | 36.9 | 5.95 | 1.87 | **3.18** | NVIDIA Tesla T4 |
| [目标检测](./pytorch_yolo_series) | YOLOv6s<br/>(PyTorch) | 42.4 | 41.3 | 9.06 | 1.83 | **4.95** | NVIDIA Tesla T4 |
| [目标检测](./pytorch_yolo_series) | YOLOv7<br/>(PyTorch) | 51.1 | 50.8 | 26.84 | 4.55 | **5.89** | NVIDIA Tesla T4 |
| [目标检测](./detection) | PP-YOLOE-s | 43.1 | 42.6 | 6.51 | 2.12 | **3.07** | NVIDIA Tesla T4 |
| [图像分类](./image_classification) | MobileNetV1<br/>(TensorFlow) | 71.0 | 70.22 | 30.45 | 15.86 | **1.92** | SDM865(骁龙865) |

- 备注:目标检测精度指标为mAP(0.5:0.95)精度测量结果。图像分割精度指标为IoU精度测量结果。
- 更多飞桨模型应用示例及Benchmark可以参考:[图像分类](./image_classification)、[目标检测](./detection)、[语义分割](./semantic_segmentation)、[自然语言处理](./nlp)
- 更多其它框架应用示例及Benchmark可以参考:[YOLOv5(PyTorch)](./pytorch_yolo_series)、[YOLOv6(PyTorch)](./pytorch_yolo_series)、[YOLOv7(PyTorch)](./pytorch_yolo_series)、[HuggingFace(PyTorch)](./pytorch_huggingface)、[MobileNet(TensorFlow)](./tensorflow_mobilenet)

## **环境准备**
......
...@@ -12,6 +12,7 @@ Distillation: ...@@ -12,6 +12,7 @@ Distillation:
loss: soft_label loss: soft_label
Quantization: Quantization:
onnx_format: true
use_pact: true use_pact: true
activation_quantize_type: 'moving_average_abs_max' activation_quantize_type: 'moving_average_abs_max'
quantize_op_types: quantize_op_types:
......
Global:
reader_config: configs/yolo_reader.yml
input_list: ['image']
arch: PPYOLOE # When export exclude_nms=True, need set arch: PPYOLOE
Evaluation: True
model_dir: ./ppyoloe_crn_s_300e_coco
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation:
alpha: 1.0
loss: soft_label
Quantization:
onnx_format: true
use_pact: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 6000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 4.0e-05
...@@ -20,7 +20,7 @@ import paddle ...@@ -20,7 +20,7 @@ import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from keypoint_utils import keypoint_post_process from keypoint_utils import keypoint_post_process
......
...@@ -20,7 +20,7 @@ import paddle ...@@ -20,7 +20,7 @@ import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
from keypoint_utils import keypoint_post_process from keypoint_utils import keypoint_post_process
...@@ -121,7 +121,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): ...@@ -121,7 +121,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
def main(): def main():
global global_config global global_config
all_config = load_slim_config(FLAGS.config_path) all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
all_config)
global_config = all_config["Global"] global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config']) reader_cfg = load_config(global_config['reader_config'])
......
# 1. ACT超参详细教程

## 1.1 各压缩方法超参解析

### 1.1.1 量化(quantization)

量化参数主要设置量化比特数和量化op类型,其中量化op包含卷积层(conv2d, depthwise_conv2d)和全连接层(mul, matmul_v2)。以下为只量化卷积层的示例:

```yaml
...@@ -20,69 +20,148 @@ Quantization:
  moving_rate: 0.9 # 'moving_average_abs_max' 量化方式的衰减系数,默认 0.9。
  for_tensorrt: false # 量化后的模型是否使用 TensorRT 进行预测。如果是的话,量化op类型为: TENSORRT_OP_TYPES 。默认值为False.
  is_full_quantize: false # 是否全量化
  onnx_format: false # 是否采用ONNX量化标准格式
```

以上配置项说明如下:
- use_pact: 是否开启PACT。一般情况下,开启PACT后,量化产出的模型精度会更高。算法原理请参考:[PACT: Parameterized Clipping Activation for Quantized Neural Networks](https://arxiv.org/abs/1805.06085)
- activation_bits: 激活量化bit数,可选1~8。默认为8。
- weight_bits: 参数量化bit数,可选1~8。默认为8。
- activation_quantize_type: 激活量化方式,可选 'abs_max' , 'range_abs_max' , 'moving_average_abs_max' 。如果使用 TensorRT 加载量化后的模型来预测,请使用 'range_abs_max' 或 'moving_average_abs_max' 。默认为 'moving_average_abs_max'。
- weight_quantize_type: 参数量化方式。可选 'abs_max' , 'channel_wise_abs_max' , 'range_abs_max' , 'moving_average_abs_max' 。如果使用 TensorRT 加载量化后的模型来预测,请使用 'channel_wise_abs_max' 。 默认 'channel_wise_abs_max' 。
- not_quant_pattern: 所有 `name_scope` 包含 'not_quant_pattern' 字符串的 op ,都不量化。 `name_scope` 设置方式请参考 [paddle.static.name_scope](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/name_scope_cn.html#name-scope)
- quantize_op_types:需要进行量化的OP类型。通过以下代码输出所有支持量化的OP类型:
```
from paddleslim.quant.quanter import TRANSFORM_PASS_OP_TYPES,QUANT_DEQUANT_PASS_OP_TYPES
print(TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES)
```
- dtype: 量化后的参数类型,默认 int8 , 目前仅支持 int8
- window_size: 'range_abs_max' 量化方式的 window size ,默认10000。
- moving_rate: 'moving_average_abs_max' 量化方式的衰减系数,默认 0.9。
- for_tensorrt: 量化后的模型是否使用 TensorRT 进行预测。默认值为False. 通过以下代码,输出for_tensorrt=True时会量化到的OP:
```
from paddleslim.quant.quanter import TENSORRT_OP_TYPES
print(TENSORRT_OP_TYPES)
```
- is_full_quantize: 是否量化所有可支持op类型。默认值为False.
### 1.1.2 知识蒸馏(knowledge distillation)
蒸馏参数主要设置蒸馏节点(`node`)和教师预测模型路径,如下所示:

```yaml
Distillation:
  alpha: 1.0
  loss: l2
  node:
  - relu_30.tmp_0
  teacher_model_dir: ./inference_model
  # teacher_model_filename: 预测模型文件,格式为 *.pdmodel 或 __model__
  teacher_model_filename: model.pdmodel
  # teacher_params_filename: 预测模型参数文件,格式为 *.pdiparams 或 __params__
  teacher_params_filename: model.pdiparams
```
以上配置项说明如下:
- alpha: 蒸馏loss所占权重;可输入多个数值,支持不同节点之间使用不同的alpha值。
- loss: 蒸馏loss算法;可输入多个loss,支持不同节点之间使用不同的loss算法。 可选"soft_label"、“l2”或“fsp”。也可自定义loss。具体定义和使用可参考[知识蒸馏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html)
- node: 蒸馏节点,即某层输出的变量名称。该选项设置方式分两种情况:
- 自蒸馏:教师模型为压缩前的推理模型,学生模型为压缩后的推理模型。‘node’ 可设置为变量名称的列表,ACT会自动在该列表中的变量上依次添加知识蒸馏loss。示例如下:
```
node:
- relu_30.tmp_0
- relu_31.tmp_0
```
上述示例,会添加两个知识蒸馏loss。第一个loss的输入为教师模型和学生模型的 'relu_30.tmp_0',第二个loss的输入为教师模型和学生模型的'relu_31.tmp_0'。
- 普通蒸馏:教师模型为任意模型,学生模型为压缩后的推理模型。‘node’ 可设置为变量名称的列表,列表中元素数量必须为偶数。示例如下:
```
node:
- teacher_relu_0.tmp_0
- student_relu_0.tmp_0
- teacher_relu_1.tmp_0
- student_relu_1.tmp_0
```
上述示例,会添加两个知识蒸馏loss。第一个loss的输入为教师模型的变量“teacher_relu_0.tmp_0”和学生模型的变量“student_relu_0.tmp_0”,第二个loss的输入为教师模型的变量“teacher_relu_1.tmp_0”和学生模型的“student_relu_1.tmp_0”。
如果不设置`node`,则分别取教师模型和学生模型的最后一个带参数的层的输出,组成知识蒸馏loss。
- teacher_model_dir: 用于监督压缩后模型训练的教师模型所在的路径。如果不设置该选项,则使用压缩前的模型做为教师模型。
- teacher_model_filename: 教师模型的模型文件名称,格式为 *.pdmodel 或 __model__。仅当设置`teacher_model_dir`后生效。
- teacher_params_filename: 教师模型的参数文件名称,格式为 *.pdiparams 或 __params__。仅当设置`teacher_model_dir`后生效。
### 1.1.3 结构化稀疏(sparsity)
结构化稀疏参数设置如下所示:

```yaml
ChannelPrune:
  pruned_ratio: 0.25
  prune_params_name:
  - conv1_weights
  criterion: l1_norm
```
- pruned_ratio: 每个卷积层的通道数被剪裁的比例。
- prune_params_name: 待剪裁的卷积层的权重名称。通过以下脚本获得推理模型中所有卷积层的权重名称:
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(model_dir, exe))
for var_ in inference_program.list_vars():
if var_.persistable and "conv2d" in var_.name:
print(f"{var_.name}")
```
或者,使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。
- criterion: 评估卷积通道重要性的指标。可选 “l1_norm” , “bn_scale” , “geometry_median”。具体定义和使用可参考[结构化稀疏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/prune/prune_api.html)
### 1.1.4 ASP半结构化稀疏
半结构化稀疏参数设置如下所示:

```yaml
ASPPrune:
  prune_params_name:
  - conv1_weights
```
- prune_params_name: 待剪裁的卷积层的权重名称。通过以下脚本获得推理模型中所有卷积层的权重名称:
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(model_dir, exe))
for var_ in inference_program.list_vars():
if var_.persistable and "conv2d" in var_.name:
print(f"{var_.name}")
```
或者,使用[Netron工具](https://netron.app/) 可视化`*.pdmodel`模型文件,选择合适的卷积层进行剪裁。
### 1.1.5 Transformer结构化剪枝
针对Transformer结构的结构化剪枝参数设置如下所示:

```yaml
TransformerPrune:
  pruned_ratio: 0.25
```
- pruned_ratio: 每个全连接层被剪裁的比例。

### 1.1.6 非结构化稀疏策略
非结构化稀疏参数设置如下所示:

```yaml
...@@ -122,7 +201,7 @@ UnstructurePrune:
- local_sparsity 表示剪裁比例(ratio)应用的范围,仅在 'ratio' 模式生效。local_sparsity 开启时意味着每个参与剪裁的参数矩阵稀疏度均为 'ratio', 关闭时表示只保证模型整体稀疏度达到'ratio',但是每个参数矩阵的稀疏度可能存在差异。各个矩阵稀疏度保持一致时,稀疏加速更显著。
- 更多非结构化稀疏的参数含义详见[非结构化稀疏API文档](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/pruners/unstructured_pruner.rst)
### 1.1.7 训练超参

训练参数主要设置学习率、训练次数(epochs)和优化器等。

```yaml
...@@ -143,12 +222,69 @@ TrainConfig:
    boundaries: [4500] # 设置策略参数
    values: [0.005, 0.0005] # 设置策略参数
```
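
上面 `learning_rate` 与 `optimizer_builder` 中的 `type` 字段,一般对应 `paddle.optimizer.lr` 和 `paddle.optimizer` 下的同名类(此处为补充示意,非本文档原文)。例如上述 PiecewiseDecay 式的学习率配置,大致等价于:

```
import paddle

# boundaries/values 与配置文件中的取值一致:前 4500 步学习率为 0.005,之后为 0.0005
lr = paddle.optimizer.lr.PiecewiseDecay(boundaries=[4500], values=[0.005, 0.0005])

# optimizer_builder 中 type: SGD、weight_decay: 4.0e-05 的一种等价写法(linear 仅为占位网络)
linear = paddle.nn.Linear(4, 4)
opt = paddle.optimizer.SGD(learning_rate=lr, weight_decay=4.0e-05, parameters=linear.parameters())
```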
## 1.2 FAQ

### 1.自动蒸馏效果不理想,怎么自主选择蒸馏节点?

首先使用[Netron工具](https://netron.app/) 可视化`model.pdmodel`模型文件,选择模型中某些层输出Tensor名称,对蒸馏节点进行配置。(一般选择Backbone或网络的输出等层进行蒸馏)
<div align="center">
    <img src="../../docs/images/dis_node.png" width="600">
</div>
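
除了用Netron查看,也可以直接用脚本打印推理模型中各OP的输出变量名称,从中挑选蒸馏节点(以下脚本为补充示意,`./inference_model` 为假设的推理模型目录):

```
import paddle
paddle.enable_static()

model_dir = "./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
    paddle.static.load_inference_model(model_dir, exe))

# 打印每个OP的类型及其输出变量名称,便于挑选蒸馏节点(一般选择Backbone或网络输出附近的变量)
for block in inference_program.blocks:
    for op in block.ops:
        print(op.type, op.output_arg_names)
```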
### 2.如何获得推理模型中的OP类型
执行以下代码获取推理模型中的OP类型,其中`model_dir`为推理模型存储路径。
```
import paddle
paddle.enable_static()
model_dir="./inference_model"
exe = paddle.static.Executor(paddle.CPUPlace())
inference_program, _, _ = (
paddle.static.load_inference_model(model_dir, exe))
op_types = {}
for block in inference_program.blocks:
for op in block.ops:
op_types[op.type] = 1
print(f"Operators in inference model:\n{op_types.keys()}")
```
所用飞桨框架接口:
- [load_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/load_inference_model_cn.html#load-inference-model)
- [Program](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/Program_cn.html#program)
- [Executor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/static/Executor_cn.html#executor)
### 3. 量化支持对哪些OP进行量化
执行以下代码,查看当前PaddlePaddle版本的量化功能所支持的OP类型:
```
from paddle.fluid.contrib.slim.quantization.utils import _weight_supported_quantizable_op_type, _act_supported_quantizable_op_type
print(f"_supported_quantizable_op_type:\n{_weight_supported_quantizable_op_type}")
print(f"_supported_quantizable_op_type:\n{_act_supported_quantizable_op_type}")
```
### 4. 如何设置推理模型中OP的‘name_scope’属性
以下代码,将输出变量为`conv2d_52.tmp_0`的OP的`name_scope`设置为'skip_quant':
```
import paddle
paddle.enable_static()
model_dir="./original_model"
exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(model_dir, exe))
skips = ['conv2d_52.tmp_0']
for block in inference_program.blocks:
for op in block.ops:
if op.output_arg_names[0] in skips:
op._set_attr("name_scope", "skip_quant")
feed_vars = []
for var_ in inference_program.list_vars():
if var_.name in feed_target_names:
feed_vars.append(var_)
paddle.static.save_inference_model("./infer_model", feed_vars, fetch_targets, exe, program=inference_program)
```
...@@ -21,28 +21,30 @@ ...@@ -21,28 +21,30 @@
### PaddleClas模型 ### PaddleClas模型
| 模型 | 策略 | Top-1 Acc | GPU 耗时(ms) | ARM CPU 耗时(ms) | | 模型 | 策略 | Top-1 Acc | GPU 耗时(ms) | ARM CPU 耗时(ms) | 配置文件 | Inference模型 |
|:------:|:------:|:------:|:------:|:------:| |:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| MobileNetV1 | Baseline | 70.90 | - | 33.15 | | MobileNetV1 | Baseline | 70.90 | - | 33.15 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar) |
| MobileNetV1 | 量化+蒸馏 | 70.57 | - | 13.64 | | MobileNetV1 | 量化+蒸馏 | 70.57 | - | 13.64 | [Config](./configs/MobileNetV1/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV1_QAT.tar) |
| ResNet50_vd | Baseline | 79.12 | 3.19 | - | | ResNet50_vd | Baseline | 79.12 | 3.19 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar) |
| ResNet50_vd | 量化+蒸馏 | 78.74 | 0.92 | - | | ResNet50_vd | 量化+蒸馏 | 78.74 | 0.92 | - | [Config](./configs/ResNet50_vd/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/ResNet50_vd_QAT.tar) |
| ShuffleNetV2_x1_0 | Baseline | 68.65 | - | 10.43 | | ShuffleNetV2_x1_0 | Baseline | 68.65 | - | 10.43 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x1_0_infer.tar) |
| ShuffleNetV2_x1_0 | 量化+蒸馏 | 68.32 | - | 5.51 | | ShuffleNetV2_x1_0 | 量化+蒸馏 | 68.32 | - | 5.51 | [Config](./configs/ShuffleNetV2_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/ShuffleNetV2_x1_0_QAT.tar) |
| SqueezeNet1_0_infer | Baseline | 59.60 | - | 35.98 | | SqueezeNet1_0 | Baseline | 59.60 | - | 35.98 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SqueezeNet1_0_infer.tar) |
| SqueezeNet1_0_infer | 量化+蒸馏 | 59.45 | - | 16.96 | | SqueezeNet1_0 | 量化+蒸馏 | 59.45 | - | 16.96 | [Config](./configs/SqueezeNet1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/SqueezeNet1_0_QAT.tar) |
| PPLCNetV2_base | Baseline | 76.86 | - | 36.50 | | PPLCNetV2_base | Baseline | 76.86 | - | 36.50 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_infer.tar) |
| PPLCNetV2_base | 量化+蒸馏 | 76.43 | - | 15.79 | | PPLCNetV2_base | 量化+蒸馏 | 76.43 | - | 15.79 | [Config](./configs/PPLCNetV2_base/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/PPLCNetV2_base_QAT.tar) |
| PPHGNet_tiny | Baseline | 79.59 | 2.82 | - | | PPHGNet_tiny | Baseline | 79.59 | 2.82 | - | - |[Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPHGNet_tiny_infer.tar) |
| PPHGNet_tiny | 量化+蒸馏 | 79.20 | 0.98 | - | | PPHGNet_tiny | 量化+蒸馏 | 79.20 | 0.98 | - | [Config](./configs/PPHGNet_tiny/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/PPHGNet_tiny_QAT.tar) |
| InceptionV3 | Baseline | 79.14 | 4.79 | - | | InceptionV3 | Baseline | 79.14 | 4.79 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/InceptionV3_infer.tar) |
| InceptionV3 | 量化+蒸馏 | 78.32 | 1.47 | - | | InceptionV3 | 量化+蒸馏 | 78.32 | 1.47 | - | [Config](./configs/InceptionV3/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/InceptionV3_QAT.tar) |
| EfficientNetB0 | Baseline | 77.02 | 1.95 | - | | EfficientNetB0 | Baseline | 77.02 | 1.95 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB0_infer.tar) |
| EfficientNetB0 | 量化+蒸馏 | 75.39 | 1.44 | - | | EfficientNetB0 | 量化+蒸馏 | 75.39 | 1.44 | - | [Config](./configs/EfficientNetB0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/EfficientNetB0_QAT.tar) |
| GhostNet_x1_0 | Baseline | 74.02 | 2.93 | - | | GhostNet_x1_0 | Baseline | 74.02 | 2.93 | - | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_0_infer.tar) |
| GhostNet_x1_0 | 量化+蒸馏 | 72.62 | 1.03 | - | | GhostNet_x1_0 | 量化+蒸馏 | 72.62 | 1.03 | - | [Config](./configs/GhostNet_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/GhostNet_x1_0_QAT.tar) |
| MobileNetV3_large_x1_0 | Baseline | 75.32 | - | 16.62 | | MobileNetV3_large_x1_0 | Baseline | 75.32 | - | 16.62 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar) |
| MobileNetV3_large_x1_0 | 量化+蒸馏 | 70.93 | - | 9.85 | | MobileNetV3_large_x1_0 | 量化+蒸馏 | 74.41 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_QAT.tar) |
| MobileNetV3_large_x1_0_ssld | Baseline | 78.96 | - | 16.62 | - | [Model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_ssld_infer.tar) |
| MobileNetV3_large_x1_0_ssld | 量化+蒸馏 | 77.17 | - | 9.85 | [Config](./configs/MobileNetV3_large_x1_0/qat_dis.yaml) | [Model](https://paddle-slim-models.bj.bcebos.com/act/MobileNetV3_large_x1_0_ssld_QAT.tar) |
- ARM CPU 测试环境:`SDM865(4xA77+4xA55)` - ARM CPU 测试环境:`SDM865(4xA77+4xA55)`
- Nvidia GPU 测试环境: - Nvidia GPU 测试环境:
......
...@@ -23,7 +23,7 @@ import paddle ...@@ -23,7 +23,7 @@ import paddle
import paddle.nn as nn import paddle.nn as nn
from paddle.io import DataLoader from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset from imagenet_reader import ImageNetDataset
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
def argsparser(): def argsparser():
......
...@@ -22,7 +22,7 @@ import yaml ...@@ -22,7 +22,7 @@ import yaml
from utils import preprocess, postprocess from utils import preprocess, postprocess
import paddle import paddle
from paddle.inference import create_predictor from paddle.inference import create_predictor
from paddleslim.auto_compression.config_helpers import load_config from paddleslim.common import load_config
def argsparser(): def argsparser():
......
...@@ -24,7 +24,7 @@ import paddle ...@@ -24,7 +24,7 @@ import paddle
import paddle.nn as nn import paddle.nn as nn
from paddle.io import DataLoader from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset from imagenet_reader import ImageNetDataset
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
...@@ -46,6 +46,11 @@ def argsparser(): ...@@ -46,6 +46,11 @@ def argsparser():
type=int, type=int,
default=1281167, default=1281167,
help="the number of total training images.") help="the number of total training images.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser return parser
...@@ -122,7 +127,12 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): ...@@ -122,7 +127,12 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
def main(): def main():
rank_id = paddle.distributed.get_rank() rank_id = paddle.distributed.get_rank()
place = paddle.CUDAPlace(rank_id) if args.devices == 'gpu':
place = paddle.CUDAPlace(rank_id)
paddle.set_device('gpu')
else:
place = paddle.CPUPlace()
paddle.set_device('cpu')
global global_config global global_config
all_config = load_slim_config(args.config_path) all_config = load_slim_config(args.config_path)
......
...@@ -15,7 +15,7 @@ from paddlenlp.datasets import load_dataset ...@@ -15,7 +15,7 @@ from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.data.sampler import SamplerHelper from paddlenlp.data.sampler import SamplerHelper
from paddlenlp.metrics import Mcc, PearsonAndSpearman from paddlenlp.metrics import Mcc, PearsonAndSpearman
from paddleslim.auto_compression.config_helpers import load_config from paddleslim.common import load_config
from paddleslim.auto_compression.compressor import AutoCompression from paddleslim.auto_compression.compressor import AutoCompression
......
...@@ -27,7 +27,7 @@ from paddlenlp.transformers import AutoModelForTokenClassification, AutoTokenize ...@@ -27,7 +27,7 @@ from paddlenlp.transformers import AutoModelForTokenClassification, AutoTokenize
from paddlenlp.datasets import load_dataset from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman from paddlenlp.metrics import AccuracyAndF1, Mcc, PearsonAndSpearman
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression.compressor import AutoCompression from paddleslim.auto_compression.compressor import AutoCompression
......
# YOLO系列模型自动压缩示例
目录:
- [1.简介](#1简介)
- [2.Benchmark](#2Benchmark)
- [3.自动压缩流程](#3-自动压缩流程)
  - [3.1 准备环境](#31-准备环境)
  - [3.2 准备数据集](#32-准备数据集)
  - [3.3 准备预测模型](#33-准备预测模型)
  - [3.4 自动压缩并产出模型](#34-自动压缩并产出模型)
  - [3.5 测试模型精度](#35-测试模型精度)
- [4.预测部署](#4预测部署)
- [5.FAQ](#5faq)
## 1. 简介
本示例将以[ultralytics/yolov5](https://github.com/ultralytics/yolov5)、[meituan/YOLOv6](https://github.com/meituan/YOLOv6)、[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) 目标检测模型为例,借助[X2Paddle](https://github.com/PaddlePaddle/X2Paddle)的能力,将PyTorch框架模型转换为Paddle框架模型,再使用ACT自动压缩功能进行模型压缩,压缩后的模型可使用Paddle Inference或者导出至ONNX,利用TensorRT部署。
## 2.Benchmark
| 模型 | 策略 | 输入尺寸 | mAP<sup>val<br>0.5:0.95 | 模型体积 | 预测时延<sup><small>FP32</small><sup><br><sup> |预测时延<sup><small>FP16</small><sup><br><sup> | 预测时延<sup><small>INT8</small><sup><br><sup> | 配置文件 | Inference模型 |
| :-------- |:-------- |:--------: | :--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv5s | Base模型 | 640*640 | 37.4 | 28.1MB | 5.95ms | 2.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx) |
| YOLOv5s | 离线量化 | 640*640 | 36.0 | 7.4MB | - | - | 1.87ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv5s | ACT量化训练 | 640*640 | **36.9** | 7.4MB | - | - | **1.87ms** | [config](./configs/yolov5s_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.onnx) |
| | | | | | | | | |
| YOLOv6s | Base模型 | 640*640 | 42.4 | 65.9MB | 9.06ms | 2.90ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx) |
| YOLOv6s | KL离线量化 | 640*640 | 30.3 | 16.8MB | - | - | 1.83ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv6s | 量化蒸馏训练 | 640*640 | **41.3** | 16.8MB | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.onnx) |
| | | | | | | | | |
| YOLOv7 | Base模型 | 640*640 | 51.1 | 141MB | 26.84ms | 7.44ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
| YOLOv7 | 离线量化 | 640*640 | 50.2 | 36MB | - | - | 4.55ms | [config](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series) | - |
| YOLOv7 | ACT量化训练 | 640*640 | **50.9** | 36MB | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.onnx) |
| | | | | | | | | |
| YOLOv7-Tiny | Base模型 | 640*640 | 37.3 | 24MB | 5.06ms | 2.32ms | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx) |
| YOLOv7-Tiny | 离线量化 | 640*640 | 35.8 | 6.1MB | - | - | 1.68ms | - | - |
| YOLOv7-Tiny | ACT量化训练 | 640*640 | **37.0** | 6.1MB | - | - | **1.68ms** | [config](./configs/yolov7_tiny_qat_dis.yaml) | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.tar) &#124; [ONNX Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_tiny_quant.onnx) |
说明:
- mAP的指标均在COCO val2017数据集中评测得到。
- YOLOv7模型在Tesla T4的GPU环境下开启TensorRT 8.4.1,batch_size=1, 测试脚本是[cpp_infer](./cpp_infer)
## 3. 自动压缩流程
#### 3.1 准备环境
- PaddlePaddle >= 2.3.2版本 (可从[Paddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)根据相应环境的安装指令进行安装)
- PaddleSlim develop 版本
(1)安装paddlepaddle
```
# CPU
pip install paddlepaddle==2.3.2
# GPU
pip install paddlepaddle-gpu==2.3.2
```
(2)安装paddleslim:
```shell
git clone https://github.com/PaddlePaddle/PaddleSlim.git && cd PaddleSlim
python setup.py install
```
#### 3.2 准备数据集
本示例默认以COCO数据进行自动压缩实验,可以从[MS COCO官网](https://cocodataset.org)下载[Train](http://images.cocodataset.org/zips/train2017.zip)[Val](http://images.cocodataset.org/zips/val2017.zip)[annotation](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
目录格式如下:
```
dataset/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
```
如果是自定义数据集,请按照如上COCO数据格式准备数据。
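
准备好数据后,可以先用 pycocotools 做一次简单检查,确认标注文件和图片数量能够正确读取(以下脚本为补充示意,路径按上面的目录格式假设):

```
import os
from pycocotools.coco import COCO

dataset_dir = "dataset/coco/"
anno_path = "annotations/instances_val2017.json"

# 加载COCO标注文件,打印图片与标注数量,确认数据集组织方式正确
coco = COCO(os.path.join(dataset_dir, anno_path))
print("images:", len(coco.imgs), "annotations:", len(coco.anns))
```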
#### 3.3 准备预测模型
(1)准备ONNX模型:
- YOLOv5:
本示例模型使用[ultralytics/yolov5](https://github.com/ultralytics/yolov5)的master分支导出,要求v6.1之后的ONNX模型,可以根据官方的[导出教程](https://github.com/ultralytics/yolov5/issues/251)来准备ONNX模型。也可以下载准备好的[yolov5s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx)
```shell
python export.py --weights yolov5s.pt --include onnx
```
- YOLOv6:
可通过[meituan/YOLOv6](https://github.com/meituan/YOLOv6)官方的[导出教程](https://github.com/meituan/YOLOv6/blob/main/deploy/ONNX/README.md)来准备ONNX模型。也可以下载已经准备好的[yolov6s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx)
- YOLOv7: 可通过[WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7)的导出脚本来准备ONNX模型,具体步骤如下:
```shell
git clone https://github.com/WongKinYiu/yolov7.git
python export.py --weights yolov7-tiny.pt --grid
```
**注意**:目前ACT支持**不带NMS**模型,使用如上命令导出即可。也可以直接下载我们已经准备好的[yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx)
#### 3.4 自动压缩并产出模型
蒸馏量化自动压缩示例通过run.py脚本启动,会使用接口```paddleslim.auto_compression.AutoCompression```对模型进行自动压缩。配置config文件中模型路径、蒸馏、量化和训练等部分的参数,配置完成后便可对模型进行量化和蒸馏。
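
run.py 会先通过 `paddleslim.common.load_config` 读取上述配置文件,再将其中 `Global` 段的模型信息与其余策略段交给 `AutoCompression` 完成压缩。下面的示意脚本只演示配置加载与检查(训练数据与评估函数的构造请以示例脚本为准):

```
from paddleslim.common import load_config

# 加载自动压缩配置文件,检查其中的各个配置段
all_config = load_config("./configs/yolov7_tiny_qat_dis.yaml")
print(all_config.keys())                   # 预期包含 Global、Distillation、Quantization、TrainConfig 等
print(all_config["Global"]["model_dir"])   # 本示例中为待压缩的 ONNX 模型路径
```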
本示例启动自动压缩以YOLOv7-Tiny为例,如果想要更换模型,可修改`--config_path`路径即可,具体运行命令为:
- 单卡训练:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
- 多卡训练:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov7_tiny_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 测试模型精度
修改[yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml)`model_dir`字段为模型存储路径,然后使用eval.py脚本得到模型的mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_tiny_qat_dis.yaml
```
## 4.预测部署
#### 导出至ONNX使用TensorRT部署
执行完自动压缩后会默认在`save_dir`中生成`quant_model.onnx`的ONNX模型文件,可以直接使用TensorRT测试脚本进行验证。
- 进行测试:
```shell
python yolov7_onnx_trt.py --model_path=output/quant_model.onnx --image_file=images/000000570688.jpg --precision=int8
```
#### Paddle-TensorRT部署
- C++部署
进入[cpp_infer](./cpp_infer)文件夹内,请按照[C++ TensorRT Benchmark测试教程](./cpp_infer/README.md)进行准备环境及编译,然后开始测试:
```shell
# 编译
bash complie.sh
# 执行
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
- Python部署:
首先安装带有TensorRT的[Paddle安装包](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
然后使用[paddle_trt_infer.py](./paddle_trt_infer.py)进行部署:
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5.FAQ
- 如果想对模型进行离线量化,可进入[YOLO系列模型离线量化示例](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/post_training_quantization/pytorch_yolo_series)中进行实验。
Global: Global:
reader_config: configs/yolov5_reader.yml model_dir: ./yolov5s.onnx
input_list: {'image': 'x2paddle_images'} dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
Evaluation: True Evaluation: True
arch: 'YOLOv5' arch: YOLOv5
model_dir: ./yolov5s_infer
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation: Distillation:
alpha: 1.0 alpha: 1.0
loss: soft_label loss: soft_label
Quantization: Quantization:
onnx_format: true
use_pact: true use_pact: true
activation_quantize_type: 'moving_average_abs_max' activation_quantize_type: 'moving_average_abs_max'
quantize_op_types: quantize_op_types:
......
Global: Global:
reader_config: configs/yolov6_reader.yml model_dir: ./yolov6s.onnx
input_list: {'image': 'x2paddle_image_arrays'} dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
Evaluation: True Evaluation: True
arch: 'YOLOv6' arch: YOLOv6
model_dir: ./yolov6s_infer
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation: Distillation:
alpha: 1.0 alpha: 1.0
loss: soft_label loss: soft_label
Quantization: Quantization:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max' activation_quantize_type: 'moving_average_abs_max'
quantize_op_types: quantize_op_types:
- conv2d - conv2d
......
Global: Global:
reader_config: configs/yolov7_reader.yaml model_dir: ./yolov7.onnx
input_list: {'image': 'x2paddle_images'} dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
Evaluation: True Evaluation: True
model_dir: ./yolov7_infer arch: YOLOv7
model_filename: model.pdmodel
params_filename: model.pdiparams
Distillation: Distillation:
alpha: 1.0 alpha: 1.0
loss: soft_label loss: soft_label
Quantization: Quantization:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max' activation_quantize_type: 'moving_average_abs_max'
quantize_op_types: quantize_op_types:
- conv2d - conv2d
- depthwise_conv2d - depthwise_conv2d
TrainConfig: TrainConfig:
train_iter: 8000 train_iter: 5000
eval_iter: 1000 eval_iter: 1000
learning_rate: learning_rate:
type: CosineAnnealingDecay type: CosineAnnealingDecay
learning_rate: 0.00003 learning_rate: 0.00003
T_max: 8000 T_max: 8000
......
Global:
model_dir: ./yolov7-tiny.onnx
dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
Evaluation: True
arch: YOLOv7
Distillation:
alpha: 1.0
loss: soft_label
Quantization:
onnx_format: true
activation_quantize_type: 'moving_average_abs_max'
quantize_op_types:
- conv2d
- depthwise_conv2d
TrainConfig:
train_iter: 5000
eval_iter: 1000
learning_rate:
type: CosineAnnealingDecay
learning_rate: 0.00003
T_max: 8000
optimizer_builder:
optimizer:
type: SGD
weight_decay: 0.00004
# YOLOv7 TensorRT Benchmark测试(Linux)

## 环境准备

...@@ -22,21 +22,37 @@ CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Paddle TensorRT测试

- YOLOv5

```
# FP32
./build/trt_run --model_file yolov5s_infer/model.pdmodel --params_file yolov5s_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --model_file yolov5s_infer/model.pdmodel --params_file yolov5s_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --model_file yolov5s_quant/model.pdmodel --params_file yolov5s_quant/model.pdiparams --run_mode=trt_int8
```

- YOLOv6

```
# FP32
./build/trt_run --arch=YOLOv6 --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --arch=YOLOv6 --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --arch=YOLOv6 --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```

- YOLOv7

```
# FP32
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
# FP16
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
# INT8
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
## 原生TensorRT测试

...@@ -49,6 +65,7 @@ trtexec --onnx=yolov5s.onnx --workspace=1024 --avgRuns=1000 --inputIOFormats=fp1
# INT8
trtexec --onnx=yolov5s.onnx --workspace=1024 --avgRuns=1000 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --int8
```
- 注:可把--onnx=yolov5s.onnx替换成yolov6s.onnx和yolov7.onnx模型
## 性能对比

...@@ -56,6 +73,12 @@ trtexec --onnx=yolov5s.onnx --workspace=1024 --avgRuns=1000 --inputIOFormats=fp1
| :--------: | :--------: |:-------- |:--------: | :---------------------: |
| Paddle TensorRT | yolov5s | 5.95ms | 2.44ms | 1.87ms |
| TensorRT | yolov5s | 6.16ms | 2.58ms | 2.07ms |
| | | | | |
| Paddle TensorRT | YOLOv6s | 9.06ms | 2.90ms | 1.83ms |
| TensorRT | YOLOv6s | 8.59ms | 2.83ms | 1.87ms |
| | | | | |
| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
环境:
- Tesla T4,TensorRT 8.4.1,CUDA 11.2
......
...@@ -19,6 +19,7 @@ using phi::dtype::float16; ...@@ -19,6 +19,7 @@ using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model."); DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file."); DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file."); DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(arch, "YOLOv5", "Architectures name, can be: YOLOv5, YOLOv6, YOLOv7.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8"); DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size."); DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num."); DEFINE_int32(gpu_id, 0, "GPU card ID num.");
...@@ -106,11 +107,15 @@ int main(int argc, char *argv[]) { ...@@ -106,11 +107,15 @@ int main(int argc, char *argv[]) {
using dtype = float16; using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0)); std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
int out_box_shape = 25200;
if (FLAGS_arch == "YOLOv6"){
out_box_shape = 8400;
}
dtype *out_data; dtype *out_data;
int out_data_size = FLAGS_batch_size * 25200 * 85; int out_data_size = FLAGS_batch_size * out_box_shape * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped); cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85}; std::vector<int> out_shape{ FLAGS_batch_size, 1, out_box_shape, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape); run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0; return 0;
} }
from pycocotools.coco import COCO
import cv2
import os
import numpy as np
import paddle
class COCOValDataset(paddle.io.Dataset):
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
img_size=[640, 640],
input_name='x2paddle_images'):
self.dataset_dir = dataset_dir
self.image_dir = image_dir
self.img_size = img_size
self.input_name = input_name
self.ann_file = os.path.join(dataset_dir, anno_path)
self.coco = COCO(self.ann_file)
ori_ids = list(sorted(self.coco.imgs.keys()))
# check gt bbox
clean_ids = []
for idx in ori_ids:
ins_anno_ids = self.coco.getAnnIds(imgIds=[idx], iscrowd=False)
instances = self.coco.loadAnns(ins_anno_ids)
num_bbox = 0
for inst in instances:
if inst.get('ignore', False):
continue
if 'bbox' not in inst.keys():
continue
elif not any(np.array(inst['bbox'])):
continue
else:
num_bbox += 1
if num_bbox > 0:
clean_ids.append(idx)
self.ids = clean_ids
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {
'image': img,
'im_id': np.array([img_id]),
'scale_factor': scale_factor
}
def __len__(self):
return len(self.ids)
def _get_img_data_from_img_id(self, img_id):
img_info = self.coco.loadImgs(img_id)[0]
img_path = os.path.join(self.dataset_dir, self.image_dir,
img_info['file_name'])
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
return img
def _generate_scale(self, im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(self, img, target_shape):
# Resize image
im_scale_y, im_scale_x = self._generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = np.transpose(img / 255, [2, 0, 1])
scale_factor = np.array([im_scale_y, im_scale_x])
return img.astype(np.float32), scale_factor
class COCOTrainDataset(COCOValDataset):
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {self.input_name: img}
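These dataset classes plug directly into `paddle.io.DataLoader`; a minimal usage sketch with placeholder paths (the directory layout below assumes a standard COCO setup):
```python
import paddle
from dataset import COCOValDataset

val_dataset = COCOValDataset(
    dataset_dir='dataset/coco',
    image_dir='val2017',
    anno_path='annotations/instances_val2017.json')
val_loader = paddle.io.DataLoader(val_dataset, batch_size=1, drop_last=True)

for data in val_loader:
    # each batch is a dict with 'image', 'im_id' and 'scale_factor'
    print(data['image'].shape, data['im_id'].numpy(), data['scale_factor'].numpy())
    break
```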
...@@ -16,14 +16,12 @@ import os ...@@ -16,14 +16,12 @@ import os
import sys import sys
import numpy as np import numpy as np
import argparse import argparse
from tqdm import tqdm
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from paddleslim.common import load_config
from ppdet.core.workspace import create from paddleslim.common import load_inference_model
from ppdet.metrics import COCOMetric, VOCMetric from post_process import YOLOPostProcess, coco_metric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from dataset import COCOValDataset
from paddleslim.quant import quant_post_static
from post_process import YOLOv6PostProcess
def argsparser(): def argsparser():
...@@ -35,64 +33,62 @@ def argsparser(): ...@@ -35,64 +33,62 @@ def argsparser():
help="path of compression strategy config.", help="path of compression strategy config.",
required=True) required=True)
parser.add_argument( parser.add_argument(
'--save_dir', '--batch_size', type=int, default=1, help="Batch size of model input.")
type=str,
default='ptq_out',
help="directory to save compressed model.")
parser.add_argument( parser.add_argument(
'--devices', '--devices',
type=str, type=str,
default='gpu', default='gpu',
help="which device used to compress.") help="which device used to compress.")
parser.add_argument(
'--algo', type=str, default='KL', help="post quant algo.")
return parser return parser
def reader_wrapper(reader, input_list): def eval():
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = load_inference_model(
global_config["model_dir"], exe)
bboxes_list, bbox_nums_list, image_id_list = [], [], []
with tqdm(
total=len(val_loader),
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
for data in val_loader:
data_all = {k: np.array(v) for k, v in data.items()}
outs = exe.run(val_program,
feed={feed_target_names[0]: data_all['image']},
fetch_list=fetch_targets,
return_numpy=False)
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
bboxes_list.append(res['bbox'])
bbox_nums_list.append(res['bbox_num'])
image_id_list.append(np.array(data_all['im_id']))
t.update()
coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
def main(): def main():
global global_config global global_config
all_config = load_slim_config(FLAGS.config_path) all_config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"] global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() global val_loader
exe = paddle.static.Executor(place) dataset = COCOValDataset(
quant_post_static( dataset_dir=global_config['dataset_dir'],
executor=exe, image_dir=global_config['val_image_dir'],
model_dir=global_config["model_dir"], anno_path=global_config['val_anno_path'])
quantize_model_path=FLAGS.save_dir, global anno_file
data_loader=train_loader, anno_file = dataset.ann_file
model_filename=global_config["model_filename"], val_loader = paddle.io.DataLoader(
params_filename=global_config["params_filename"], dataset, batch_size=FLAGS.batch_size, drop_last=True)
batch_size=32,
batch_nums=10, eval()
algo=FLAGS.algo,
hist_percent=0.999,
is_full_quantize=False,
bias_correction=False,
onnx_format=False)
if __name__ == '__main__': if __name__ == '__main__':
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import os
import time
import random
import argparse
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
EXPLICIT_PRECISION = 1 << (
int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_PRECISION)
# load coco labels
CLASS_LABEL = [
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
"truck", "boat", "traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
"baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
"bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
"hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
"bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
"refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"
]
def preprocess(image, input_size, mean=None, std=None, swap=(2, 0, 1)):
if len(image.shape) == 3:
padded_img = np.ones((input_size[0], input_size[1], 3)) * 114.0
else:
padded_img = np.ones(input_size) * 114.0
img = np.array(image)
r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
resized_img = cv2.resize(
img,
(int(img.shape[1] * r), int(img.shape[0] * r)),
interpolation=cv2.INTER_LINEAR, ).astype(np.float32)
padded_img[:int(img.shape[0] * r), :int(img.shape[1] * r)] = resized_img
padded_img = padded_img[:, :, ::-1]
padded_img /= 255.0
if mean is not None:
padded_img -= mean
if std is not None:
padded_img /= std
padded_img = padded_img.transpose(swap)
padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
return padded_img, r
def postprocess(predictions, ratio):
boxes = predictions[:, :4]
scores = predictions[:, 4:5] * predictions[:, 5:]
boxes_xyxy = np.ones_like(boxes)
boxes_xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.
boxes_xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.
boxes_xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2.
boxes_xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2.
boxes_xyxy /= ratio
dets = multiclass_nms(boxes_xyxy, scores, nms_thr=0.45, score_thr=0.1)
return dets
def nms(boxes, scores, nms_thr):
"""Single class NMS implemented in Numpy."""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= nms_thr)[0]
order = order[inds + 1]
return keep
def multiclass_nms(boxes, scores, nms_thr, score_thr):
"""Multiclass NMS implemented in Numpy"""
final_dets = []
num_classes = scores.shape[1]
for cls_ind in range(num_classes):
cls_scores = scores[:, cls_ind]
valid_score_mask = cls_scores > score_thr
if valid_score_mask.sum() == 0:
continue
else:
valid_scores = cls_scores[valid_score_mask]
valid_boxes = boxes[valid_score_mask]
keep = nms(valid_boxes, valid_scores, nms_thr)
if len(keep) > 0:
cls_inds = np.ones((len(keep), 1)) * cls_ind
dets = np.concatenate(
[valid_boxes[keep], valid_scores[keep, None], cls_inds], 1)
final_dets.append(dets)
if len(final_dets) == 0:
return None
return np.concatenate(final_dets, 0)
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(img, boxes, scores, cls_ids, conf=0.5, class_names=None):
color_list = get_color_map_list(len(class_names))
for i in range(len(boxes)):
box = boxes[i]
cls_id = int(cls_ids[i])
color = tuple(color_list[cls_id])
score = scores[i]
if score < conf:
continue
x0 = int(box[0])
y0 = int(box[1])
x1 = int(box[2])
y1 = int(box[3])
text = '{}:{:.1f}%'.format(class_names[cls_id], score * 100)
font = cv2.FONT_HERSHEY_SIMPLEX
txt_size = cv2.getTextSize(text, font, 0.4, 1)[0]
cv2.rectangle(img, (x0, y0), (x1, y1), color, 2)
cv2.rectangle(img, (x0, y0 + 1),
(x0 + txt_size[0] + 1, y0 + int(1.5 * txt_size[1])),
color, -1)
cv2.putText(
img,
text, (x0, y0 + txt_size[1]),
font,
0.8, (0, 255, 0),
thickness=2)
return img
def get_engine(precision, model_file_path):
# TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
TRT_LOGGER = trt.Logger()
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()
if precision == 'int8':
network = builder.create_network(EXPLICIT_BATCH | EXPLICIT_PRECISION)
else:
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)
runtime = trt.Runtime(TRT_LOGGER)
if model_file_path.endswith('.trt'):
# If a serialized engine exists, use it instead of building an engine.
print("Reading engine from file {}".format(model_file_path))
with open(model_file_path,
"rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
engine = runtime.deserialize_cuda_engine(f.read())
for i in range(network.num_layers):
layer = network.get_layer(i)
print(i, layer.name)
return engine
else:
config.max_workspace_size = 1 << 30
if precision == "fp16":
if not builder.platform_has_fast_fp16:
print("FP16 is not supported natively on this platform/device")
else:
config.set_flag(trt.BuilderFlag.FP16)
elif precision == "int8":
if not builder.platform_has_fast_int8:
print("INT8 is not supported natively on this platform/device")
else:
if builder.platform_has_fast_fp16:
# Also enable fp16, as some layers may be even more efficient in fp16 than int8
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
builder.max_batch_size = 1
print('Loading ONNX file from path {}...'.format(model_file_path))
with open(model_file_path, 'rb') as model:
print('Beginning ONNX file parsing')
if not parser.parse(model.read()):
print('ERROR: Failed to parse the ONNX file.')
for error in range(parser.num_errors):
print(parser.get_error(error))
return None
print('Completed parsing of ONNX file')
print('Building an engine from file {}; this may take a while...'.
format(model_file_path))
plan = builder.build_serialized_network(network, config)
engine = runtime.deserialize_cuda_engine(plan)
print("Completed creating Engine")
        # write the serialized engine to a separate .trt file so the original
        # ONNX model file is not overwritten; `plan` is already serialized
        engine_file_path = model_file_path.rsplit('.', 1)[0] + '.trt'
        with open(engine_file_path, "wb") as f:
            f.write(plan)
for i in range(network.num_layers):
layer = network.get_layer(i)
print(i, layer.name)
return engine
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
def __init__(self, host_mem, device_mem):
self.host = host_mem
self.device = device_mem
def __str__(self):
return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
def __repr__(self):
return self.__str__()
def allocate_buffers(engine):
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()
for binding in engine:
size = trt.volume(engine.get_binding_shape(
binding)) * engine.max_batch_size
dtype = trt.nptype(engine.get_binding_dtype(binding))
# Allocate host and device buffers
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
# Append the device buffer to device bindings.
bindings.append(int(device_mem))
# Append to the appropriate list.
if engine.binding_is_input(binding):
inputs.append(HostDeviceMem(host_mem, device_mem))
else:
outputs.append(HostDeviceMem(host_mem, device_mem))
return inputs, outputs, bindings, stream
def run_inference(context, bindings, inputs, outputs, stream):
# Transfer input data to the GPU.
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
# Run inference.
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# Transfer predictions back from the GPU.
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
# Synchronize the stream
stream.synchronize()
# Return only the host outputs.
return [out.host for out in outputs]
def main(args):
onnx_model = args.model_path
img_path = args.image_file
num_class = len(CLASS_LABEL)
repeat = 1000
engine = get_engine(args.precision, onnx_model)
model_all_names = []
for idx in range(engine.num_bindings):
is_input = engine.binding_is_input(idx)
name = engine.get_binding_name(idx)
op_type = engine.get_binding_dtype(idx)
model_all_names.append(name)
shape = engine.get_binding_shape(idx)
print('input id:', idx, ' is input: ', is_input, ' binding name:',
name, ' shape:', shape, 'type: ', op_type)
context = engine.create_execution_context()
print('Allocate buffers ...')
inputs, outputs, bindings, stream = allocate_buffers(engine)
print("TRT set input ...")
origin_img = cv2.imread(img_path)
input_shape = [args.img_shape, args.img_shape]
input_image, ratio = preprocess(origin_img, input_shape)
inputs[0].host = np.expand_dims(input_image, axis=0)
for _ in range(0, 50):
trt_outputs = run_inference(
context,
bindings=bindings,
inputs=inputs,
outputs=outputs,
stream=stream)
time1 = time.time()
for _ in range(0, repeat):
trt_outputs = run_inference(
context,
bindings=bindings,
inputs=inputs,
outputs=outputs,
stream=stream)
time2 = time.time()
# total time cost(ms)
total_inference_cost = (time2 - time1) * 1000
print("model path: ", onnx_model, " precision: ", args.precision)
print("In TensorRT, ",
"average latency is : {} ms".format(total_inference_cost / repeat))
# Do postprocess
output = trt_outputs[0]
predictions = np.reshape(output, (1, -1, int(5 + num_class)))[0]
dets = postprocess(predictions, ratio)
# Draw rectangles and labels on the original image
if dets is not None:
final_boxes, final_scores, final_cls_inds = dets[:, :
4], dets[:, 4], dets[:,
5]
origin_img = draw_box(
origin_img,
final_boxes,
final_scores,
final_cls_inds,
conf=0.5,
class_names=CLASS_LABEL)
cv2.imwrite('output.jpg', origin_img)
print('The prediction results are saved in output.jpg.')
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_path',
type=str,
default="quant_model.onnx",
help="inference model filepath")
parser.add_argument(
'--image_file', type=str, default="bus.jpg", help="image path")
parser.add_argument(
'--precision', type=str, default='fp32', help="support fp32/fp16/int8.")
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
main(args)
...@@ -244,8 +244,9 @@ def predict_image(predictor, ...@@ -244,8 +244,9 @@ def predict_image(predictor,
threshold=0.5, threshold=0.5,
arch='YOLOv5'): arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape) img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {} if arch == 'YOLOv6':
if arch == 'YOLOv5': inputs['x2paddle_image_arrays'] = img
else:
inputs['x2paddle_images'] = img inputs['x2paddle_images'] = img
input_names = predictor.get_input_names() input_names = predictor.get_input_names()
for i in range(len(input_names)): for i in range(len(input_names)):
...@@ -306,6 +307,8 @@ if __name__ == '__main__': ...@@ -306,6 +307,8 @@ if __name__ == '__main__':
default='GPU', default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU" help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
) )
parser.add_argument(
'--arch', type=str, default='YOLOv5', help="architectures name.")
parser.add_argument('--img_shape', type=int, default=640, help="input_size") parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args() args = parser.parse_args()
...@@ -319,4 +322,5 @@ if __name__ == '__main__': ...@@ -319,4 +322,5 @@ if __name__ == '__main__':
args.image_file, args.image_file,
image_shape=[args.img_shape, args.img_shape], image_shape=[args.img_shape, args.img_shape],
warmup=warmup, warmup=warmup,
repeats=repeats) repeats=repeats,
arch=args.arch)
...@@ -14,6 +14,8 @@ ...@@ -14,6 +14,8 @@
import numpy as np import numpy as np
import cv2 import cv2
import json
import sys
def box_area(boxes): def box_area(boxes):
...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold): ...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold):
return keep return keep
class YOLOv7PostProcess(object): class YOLOPostProcess(object):
""" """
Post process of YOLOv6 network. Post process of YOLO-series network.
args: args:
score_threshold(float): Threshold to filter out bounding boxes with low score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes. confidence score. If not provided, consider all boxes.
...@@ -157,8 +159,8 @@ class YOLOv7PostProcess(object): ...@@ -157,8 +159,8 @@ class YOLOv7PostProcess(object):
if len(pred.shape) == 1: if len(pred.shape) == 1:
pred = pred[np.newaxis, :] pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4] pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2)) scale = np.tile(scale_factor[i][::-1], (2))
pred_bboxes /= scale_factor pred_bboxes /= scale
bbox = np.concatenate( bbox = np.concatenate(
[ [
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis], pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
...@@ -171,3 +173,59 @@ class YOLOv7PostProcess(object): ...@@ -171,3 +173,59 @@ class YOLOv7PostProcess(object):
bboxs = np.concatenate(bboxs, axis=0) bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums) box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums} return {'bbox': bboxs, 'bbox_num': box_nums}
def coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list):
try:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
except:
print(
"[ERROR] Not found pycocotools, please install by `pip install pycocotools`"
)
sys.exit(1)
coco_gt = COCO(anno_file)
cats = coco_gt.loadCats(coco_gt.getCatIds())
clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)}
results = []
for bboxes, bbox_nums, image_id in zip(bboxes_list, bbox_nums_list,
image_id_list):
results += _get_det_res(bboxes, bbox_nums, image_id, clsid2catid)
output = "bbox.json"
with open(output, 'w') as f:
json.dump(results, f)
coco_dt = coco_gt.loadRes(output)
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval.stats
def _get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
w = xmax - xmin
h = ymax - ymin
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
det_res.append(dt_res)
return det_res
...@@ -16,14 +16,12 @@ import os ...@@ -16,14 +16,12 @@ import os
import sys import sys
import numpy as np import numpy as np
import argparse import argparse
from tqdm import tqdm
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from paddleslim.common import load_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
from dataset import COCOValDataset, COCOTrainDataset
from post_process import YOLOv6PostProcess from post_process import YOLOPostProcess, coco_metric
def argsparser(): def argsparser():
...@@ -50,124 +48,72 @@ def argsparser(): ...@@ -50,124 +48,72 @@ def argsparser():
return parser return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric'] bboxes_list, bbox_nums_list, image_id_list = [], [], []
for batch_id, data in enumerate(val_loader): with tqdm(
data_all = convert_numpy_data(data, metric) total=len(val_loader),
data_input = {} bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
for k, v in data.items(): ncols=80) as t:
if isinstance(global_config['input_list'], list): for data in val_loader:
if k in test_feed_names: data_all = {k: np.array(v) for k, v in data.items()}
data_input[k] = np.array(v) outs = exe.run(compiled_test_program,
elif isinstance(global_config['input_list'], dict): feed={test_feed_names[0]: data_all['image']},
if k in global_config['input_list'].keys(): fetch_list=test_fetch_list,
data_input[global_config['input_list'][k]] = np.array(v) return_numpy=False)
outs = exe.run(compiled_test_program, res = {}
feed=data_input, postprocess = YOLOPostProcess(
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv6':
postprocess = YOLOv6PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True) score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor']) res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else: bboxes_list.append(res['bbox'])
for out in outs: bbox_nums_list.append(res['bbox_num'])
v = np.array(out) image_id_list.append(np.array(data_all['im_id']))
if len(v.shape) > 1: t.update()
res['bbox'] = v map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
else: return map_res[0]
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main(): def main():
global global_config global global_config
all_config = load_slim_config(FLAGS.config_path) all_config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"] global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config']) input_name = 'x2paddle_image_arrays' if global_config[
'arch'] == 'YOLOv6' else 'x2paddle_images'
train_loader = create('EvalReader')(reader_cfg['TrainDataset'], dataset = COCOTrainDataset(
reader_cfg['worker_num'], dataset_dir=global_config['dataset_dir'],
return_list=True) image_dir=global_config['train_image_dir'],
train_loader = reader_wrapper(train_loader, global_config['input_list']) anno_path=global_config['train_anno_path'],
input_name=input_name)
train_loader = paddle.io.DataLoader(
dataset, batch_size=1, shuffle=True, drop_last=True, num_workers=0)
if 'Evaluation' in global_config.keys() and global_config[ if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0: 'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader global val_loader
_eval_batch_sampler = paddle.io.BatchSampler( dataset = COCOValDataset(
dataset, batch_size=reader_cfg['EvalReader']['batch_size']) dataset_dir=global_config['dataset_dir'],
val_loader = create('EvalReader')(dataset, image_dir=global_config['val_image_dir'],
reader_cfg['worker_num'], anno_path=global_config['val_anno_path'])
batch_sampler=_eval_batch_sampler, global anno_file
return_list=True) anno_file = dataset.ann_file
metric = None val_loader = paddle.io.DataLoader(
if reader_cfg['metric'] == 'COCO': dataset,
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()} batch_size=1,
anno_file = dataset.get_anno() shuffle=False,
metric = COCOMetric( drop_last=False,
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox') num_workers=0)
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else: else:
eval_func = None eval_func = None
ac = AutoCompression( ac = AutoCompression(
model_dir=global_config["model_dir"], model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"], train_dataloader=train_loader,
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir, save_dir=FLAGS.save_dir,
config=all_config, config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func) eval_callback=eval_func)
ac.compress() ac.compress()
ac.export_onnx()
if __name__ == '__main__': if __name__ == '__main__':
......
# Auto Compression Example for the YOLOv5 Object Detection Model
Contents:
- [1. Overview](#1-overview)
- [2. Benchmark](#2-benchmark)
- [3. Auto compression workflow](#3-auto-compression-workflow)
- [3.1 Prepare the environment](#31-prepare-the-environment)
- [3.2 Prepare the dataset](#32-prepare-the-dataset)
- [3.3 Prepare the inference model](#33-prepare-the-inference-model)
- [3.4 Run auto compression and export the model](#34-run-auto-compression-and-export-the-model)
- [3.5 Evaluate model accuracy](#35-evaluate-model-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)
## 1. Overview
The PaddlePaddle model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) converts ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, inference models from these frameworks can conveniently use PaddleSlim's auto compression capability.
This example takes the [ultralytics/yolov5](https://github.com/ultralytics/yolov5) object detection model, converts it from PyTorch to a Paddle model, and then compresses it with the ACT auto compression feature. The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv5s | Base model | 640*640 | 37.4 | 5.95ms | 2.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov5s_infer.tar) |
| YOLOv5s | KL post-training quantization | 640*640 | 36.0 | - | - | 1.87ms | - | - |
| YOLOv5s | Quantization-aware training with distillation | 640*640 | **36.9** | - | - | **1.87ms** | [config](./configs/yolov5s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv5s latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 enabled and batch_size=1; the benchmark script is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Prepare the environment
- PaddlePaddle >= 2.3 (can be installed from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim >= 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only so that the Dataloader components from PaddleDetection can be reused directly.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Prepare the dataset
This example runs the auto compression experiment on COCO data by default and relies on the data loading modules in PaddleDetection. For custom COCO data, or data in other formats, prepare it by following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
If the dataset is already prepared, simply set the `dataset_dir` field of `EvalDataset` in [./configs/yolov6_reader.yml] to the path of your own dataset, as sketched below.
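A minimal sketch of the fields in question, following the reader config shipped with this example (the paths are placeholders; point `dataset_dir` at your own data):
```yaml
EvalDataset:
  !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco/   # change this to your own dataset root
```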
#### 3.3 Prepare the inference model
(1) Prepare the ONNX model:
The ONNX model can be exported by following the official [export tutorial](https://github.com/ultralytics/yolov5/issues/251) of [ultralytics/yolov5](https://github.com/ultralytics/yolov5), or you can download the prepared [yolov5s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx).
```shell
python export.py --weights yolov5s.pt --include onnx
```
(2) Convert the model:
```shell
x2paddle --framework=onnx --model=yolov5s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov5s_infer
```
This yields the YOLOv5s inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can directly download the YOLOv5s [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/detection/yolov5s_infer.tar) from the table above.
The inference model consists of two files: `model.pdmodel` holds the model structure and `model.pdiparams` holds the weights. A quick sanity check of the converted model is sketched below.
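A minimal sketch for loading the converted model in static-graph mode and inspecting its inputs and outputs, assuming the `yolov5s_infer` directory produced above:
```python
import paddle

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
# directory produced by the x2paddle conversion step above
program, feed_names, fetch_targets = paddle.static.load_inference_model(
    'yolov5s_infer', exe,
    model_filename='model.pdmodel',
    params_filename='model.pdiparams')
print('inputs :', feed_names)          # expected: ['x2paddle_images']
print('outputs:', len(fetch_targets))  # number of output tensors
```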
#### 3.4 Run auto compression and export the model
The quantization-with-distillation auto compression example is started through the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model. Configure the model path, distillation, quantization, and training parameters in the config file; once that is done, the model can be quantized and distilled. The commands are:
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov5s_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov5s_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Evaluate model accuracy
Use the eval.py script to obtain the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov5s_qat_dis.yaml
```
**Note**: To evaluate the quantized model, change the `model_dir` field in the config file so that it points to the quantized model, as sketched below.
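A sketch of the relevant `Global` section; the output path below is an assumption based on the `--save_dir` used in section 3.4:
```yaml
Global:
  model_dir: ./output/           # directory of the compressed model
  model_filename: model.pdmodel
  params_filename: model.pdiparams
```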
## 4. Deployment
#### Paddle-TensorRT C++ deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build by following the [C++ TensorRT benchmark tutorial](./cpp_infer/README.md), and then run the test:
```shell
# build
bash compile.sh
# run
./build/trt_run --model_file yolov5s_quant/model.pdmodel --params_file yolov5s_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python deployment:
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT.
Then use [paddle_trt_infer.py](./paddle_trt_infer.py) for deployment:
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5. FAQ
- To evaluate the accuracy of the post-training (offline) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov5s_qat_dis.yaml
```
metric: COCO
num_classes: 80
# Datset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_size: 1
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cuda_runtime.h>
#include "paddle/include/paddle_inference_api.h"
#include "paddle/include/experimental/phi/common/float16.h"
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;
using paddle_infer::PrecisionType;
using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
DEFINE_int32(warmup, 50, "warmup");
DEFINE_int32(repeats, 1000, "repeats");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::shared_ptr<Predictor> InitPredictor() {
Config config;
std::string model_path;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
}
// enable tune
std::cout << "model_path: " << model_path << std::endl;
config.EnableUseGpu(256, FLAGS_gpu_id);
if (FLAGS_run_mode == "trt_fp32") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kFloat32, false, false);
} else if (FLAGS_run_mode == "trt_fp16") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kHalf, false, false);
} else if (FLAGS_run_mode == "trt_int8") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kInt8, false, false);
}
config.EnableMemoryOptim();
config.SwitchIrOptim(true);
return CreatePredictor(config);
}
template <typename type>
void run(Predictor *predictor, const std::vector<type> &input,
const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
// prepare input
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
for (int i = 0; i < FLAGS_warmup; ++i)
CHECK(predictor->Run());
auto st = time();
for (int i = 0; i < FLAGS_repeats; ++i) {
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
CHECK(predictor->Run());
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputHandle(output_names[0]);
std::vector<int> output_shape = output_t->shape();
output_t -> ShareExternalData<type>(out_data, out_shape, paddle_infer::PlaceType::kGPU);
}
LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
<< " ms";
}
int main(int argc, char *argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = InitPredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
// float16
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
dtype *out_data;
int out_data_size = FLAGS_batch_size * 25200 * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 25200, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0;
}
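For reference, a minimal invocation of the binary built from this source, using only the flags defined above (paths are placeholders):
```shell
./build/trt_run --model_file yolov5s_infer/model.pdmodel \
                --params_file yolov5s_infer/model.pdiparams \
                --run_mode=trt_fp16 --batch_size=1
```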
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from post_process import YOLOv5PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval():
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"],
exe,
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"])
print('Loaded model from: {}'.format(global_config["model_dir"]))
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in global_config['input_list']:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(val_program,
feed=data_input,
fetch_list=fetch_targets,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv5':
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
metric.reset()
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
dataset = reader_cfg['EvalDataset']
global val_loader
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
reader_cfg['worker_num'],
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
eval()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
from paddle.inference import Config
from paddle.inference import create_predictor
from post_process import YOLOv5PostProcess
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
img = cv2.imread(img_path)
# Resize
im_scale_y, im_scale_x = generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
if arch == 'YOLOv5':
inputs['x2paddle_images'] = img
input_names = predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.
time_min = float("inf")
time_max = float('-inf')
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np_boxes, scale_factor)
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
warmup, repeats = 1, 1
if args.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
warmup=warmup,
repeats=repeats)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
def box_area(boxes):
"""
Args:
boxes(np.ndarray): [N, 4]
return: [N]
"""
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
def box_iou(box1, box2):
"""
Args:
box1(np.ndarray): [N, 4]
box2(np.ndarray): [M, 4]
return: [N, M]
"""
area1 = box_area(box1)
area2 = box_area(box2)
lt = np.maximum(box1[:, np.newaxis, :2], box2[:, :2])
rb = np.minimum(box1[:, np.newaxis, 2:], box2[:, 2:])
wh = rb - lt
wh = np.maximum(0, wh)
inter = wh[:, :, 0] * wh[:, :, 1]
iou = inter / (area1[:, np.newaxis] + area2 - inter)
return iou
def nms(boxes, scores, iou_threshold):
"""
Non Max Suppression numpy implementation.
args:
boxes(np.ndarray): [N, 4]
scores(np.ndarray): [N, 1]
iou_threshold(float): Threshold of IoU.
"""
idxs = scores.argsort()
keep = []
while idxs.size > 0:
max_score_index = idxs[-1]
max_score_box = boxes[max_score_index][None, :]
keep.append(max_score_index)
if idxs.size == 1:
break
idxs = idxs[:-1]
other_boxes = boxes[idxs]
ious = box_iou(max_score_box, other_boxes)
idxs = idxs[ious[0] <= iou_threshold]
keep = np.array(keep)
return keep
class YOLOv5PostProcess(object):
"""
Post process of YOLOv5 network.
args:
score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes.
nms_threshold(float): The threshold to be used in NMS.
multi_label(bool): Whether keep multi label in boxes.
keep_top_k(int): Number of total bboxes to be kept per image after NMS
step. -1 means keeping all bboxes after NMS step.
"""
def __init__(self,
score_threshold=0.25,
nms_threshold=0.5,
multi_label=False,
keep_top_k=300):
self.score_threshold = score_threshold
self.nms_threshold = nms_threshold
self.multi_label = multi_label
self.keep_top_k = keep_top_k
def _xywh2xyxy(self, x):
# Convert from [x, y, w, h] to [x1, y1, x2, y2]
y = np.copy(x)
y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y
y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x
y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y
return y
def _non_max_suppression(self, prediction):
max_wh = 4096 # (pixels) minimum and maximum box width and height
nms_top_k = 30000
cand_boxes = prediction[..., 4] > self.score_threshold # candidates
output = [np.zeros((0, 6))] * prediction.shape[0]
for batch_id, boxes in enumerate(prediction):
# Apply constraints
boxes = boxes[cand_boxes[batch_id]]
if not boxes.shape[0]:
continue
# Compute conf (conf = obj_conf * cls_conf)
boxes[:, 5:] *= boxes[:, 4:5]
# Box (center x, center y, width, height) to (x1, y1, x2, y2)
convert_box = self._xywh2xyxy(boxes[:, :4])
# Detections matrix nx6 (xyxy, conf, cls)
if self.multi_label:
i, j = (boxes[:, 5:] > self.score_threshold).nonzero()
boxes = np.concatenate(
(convert_box[i], boxes[i, j + 5, None],
j[:, None].astype(np.float32)),
axis=1)
else:
conf = np.max(boxes[:, 5:], axis=1)
j = np.argmax(boxes[:, 5:], axis=1)
re = np.array(conf.reshape(-1) > self.score_threshold)
conf = conf.reshape(-1, 1)
j = j.reshape(-1, 1)
boxes = np.concatenate((convert_box, conf, j), axis=1)[re]
num_box = boxes.shape[0]
if not num_box:
continue
elif num_box > nms_top_k:
boxes = boxes[boxes[:, 4].argsort()[::-1][:nms_top_k]]
# Batched NMS
c = boxes[:, 5:6] * max_wh
clean_boxes, scores = boxes[:, :4] + c, boxes[:, 4]
keep = nms(clean_boxes, scores, self.nms_threshold)
# limit detection box num
if keep.shape[0] > self.keep_top_k:
keep = keep[:self.keep_top_k]
output[batch_id] = boxes[keep]
return output
def __call__(self, outs, scale_factor):
preds = self._non_max_suppression(outs)
bboxs, box_nums = [], []
for i, pred in enumerate(preds):
if len(pred.shape) > 2:
pred = np.squeeze(pred)
if len(pred.shape) == 1:
pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2))
pred_bboxes /= scale_factor
bbox = np.concatenate(
[
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
pred_bboxes
],
axis=-1)
bboxs.append(bbox)
box_num = bbox.shape[0]
box_nums.append(box_num)
bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums}
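# Minimal usage sketch (illustrative only, using random data; the real inputs come from
# the exported YOLOv5-style head). `outs` has shape [batch, num_anchors, 5 + num_classes]
# and `scale_factor` holds the per-image (scale_y, scale_x) used during preprocessing.
if __name__ == '__main__':
    np.random.seed(0)
    dummy_outs = np.random.rand(1, 100, 85).astype(np.float32)
    dummy_scale = np.array([[1.0, 1.0]], dtype=np.float32)
    post = YOLOv5PostProcess(score_threshold=0.25, nms_threshold=0.5)
    result = post(dummy_outs, dummy_scale)
    print(result['bbox'].shape, result['bbox_num'])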
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import YOLOv5PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
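# Note on reader_wrapper (explanatory comment): `input_list` comes from the Global
# section of the yaml config. As a list, e.g. ['image'], each field is fed to the model
# input of the same name; as a dict, e.g. {'image': 'x2paddle_images'}, the dataloader
# field named by the key is fed to the model input named by the value, which is how an
# x2paddle-exported input name is matched to the PaddleDetection reader output.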
def convert_numpy_data(data, metric):
data_all = {}
data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv5':
postprocess = YOLOv5PostProcess(
score_threshold=0.001, nms_threshold=0.6, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else:
for out in outs:
v = np.array(out)
if len(v.shape) > 1:
res['bbox'] = v
else:
res['bbox_num'] = v
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else:
eval_func = None
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
# YOLOv6 Auto Compression Example
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
  - [3.1 Prepare the Environment](#31-prepare-the-environment)
  - [3.2 Prepare the Dataset](#32-prepare-the-dataset)
  - [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
  - [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
  - [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Inference Deployment](#4-inference-deployment)
- [5. FAQ](#5-faq)
## 1. Introduction
The model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) can convert ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, inference models from these frameworks can easily use PaddleSlim's auto compression capability.
This example takes the [meituan/YOLOv6](https://github.com/meituan/YOLOv6) object detection model, converts it from PyTorch to a Paddle inference model, and then compresses it with the auto compression toolkit (ACT). The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv6s | Baseline | 640*640 | 42.4 | 9.06ms | 2.90ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) |
| YOLOv6s | KL post-training quantization | 640*640 | 30.3 | - | - | 1.83ms | - | - |
| YOLOv6s | Quant-aware training + distillation | 640*640 | **41.3** | - | - | **1.83ms** | [config](./configs/yolov6s_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv6s latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1; the benchmark program is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Prepare the Environment
- PaddlePaddle >= 2.3 (install from the [official Paddle site](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only to reuse the Dataloader components from PaddleDetection.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Prepare the Dataset
By default this example runs auto compression on the COCO dataset and relies on the data loading module of PaddleDetection. If you use custom COCO-format data or data in another format, prepare it following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
Once the dataset is ready, simply set the `dataset_dir` field of `EvalDataset` in [./configs/yolov6_reader.yml](./configs/yolov6_reader.yml) to your dataset path.
#### 3.3 Prepare the Inference Model
(1) Prepare the ONNX model:
Export an ONNX model following the official [export guide](https://github.com/meituan/YOLOv6/blob/main/deploy/ONNX/README.md) of [meituan/YOLOv6](https://github.com/meituan/YOLOv6), or download the prepared [yolov6s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx).
(2) Convert the model:
```
x2paddle --framework=onnx --model=yolov6s.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov6s_infer
```
This produces the YOLOv6s inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can instead download the YOLOv6s [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_infer.tar) listed in the table above.
An inference model consists of two files: `model.pdmodel` (the model graph) and `model.pdiparams` (the weights).
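To quickly confirm that the conversion succeeded, a small check like the one below can be used. This is an illustrative snippet only, not part of the example scripts; it assumes the converted model was copied to `yolov6s_infer` as in the command above.
```python
# Optional sanity check for the converted inference model (illustrative only).
from paddle.inference import Config, create_predictor

config = Config('yolov6s_infer/model.pdmodel', 'yolov6s_infer/model.pdiparams')
predictor = create_predictor(config)
print('inputs :', predictor.get_input_names())   # the x2paddle-exported input name
print('outputs:', predictor.get_output_names())
```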
#### 3.4 Run Auto Compression and Export the Model
The distillation + quantization auto compression example is launched via the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model. Set the model path, distillation, quantization and training parameters in the config file; once configured, the model can be quantized and distilled. Run one of the following commands (a minimal sketch of the underlying API call is shown after the commands):
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov6s_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov6s_qat_dis.yaml --save_dir='./output/'
```
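For reference, the following is a minimal sketch of what run.py does with the config. It is illustrative only: it assumes the `Global` section of [yolov6s_qat_dis.yaml](./configs/yolov6s_qat_dis.yaml) provides `model_dir`, `model_filename`, `params_filename`, `reader_config` and `input_list`, and it omits the evaluation callback and the `reader_wrapper` input-name mapping that the real script adds.
```python
# Minimal sketch of the AutoCompression call made by run.py (illustrative only).
import paddle
from ppdet.core.workspace import load_config, create
from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config

paddle.enable_static()
all_config = load_slim_config('./configs/yolov6s_qat_dis.yaml')
global_config = all_config['Global']

reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
                                    reader_cfg['worker_num'],
                                    return_list=True)

ac = AutoCompression(
    model_dir=global_config['model_dir'],
    model_filename=global_config['model_filename'],
    params_filename=global_config['params_filename'],
    save_dir='./output/',
    config=all_config,
    train_dataloader=train_loader,  # run.py wraps this with reader_wrapper() first
    eval_callback=None)             # run.py passes its eval_function here
ac.compress()
```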
#### 3.5 Evaluate Model Accuracy
Set the `model_dir` field in [yolov6s_qat_dis.yaml](./configs/yolov6s_qat_dis.yaml) to the path where the model is stored, then run the eval.py script to get the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov6s_qat_dis.yaml
```
## 4. Inference Deployment
#### Paddle-TensorRT C++ Deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# build
bash compile.sh
# run
./build/trt_run --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python Deployment
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT support.
Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
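The core of what paddle_trt_infer.py sets up for `--run_mode=trt_int8` is roughly the following. This is a simplified sketch that mirrors `load_predictor` in the script; dynamic-shape, MKLDNN and benchmarking options are omitted, and the model path is the `output` directory produced by compression.
```python
# Simplified sketch of the TensorRT INT8 predictor setup (see load_predictor in paddle_trt_infer.py).
from paddle.inference import Config, create_predictor

config = Config('output/model.pdmodel', 'output/model.pdiparams')
config.enable_use_gpu(200, 0)            # initial GPU memory (MB), device id
config.switch_ir_optim(True)
config.enable_tensorrt_engine(
    workspace_size=1 << 25,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=Config.Precision.Int8,
    use_static=False,
    use_calib_mode=False)
config.enable_memory_optim()
config.switch_use_feed_fetch_ops(False)  # required for zero-copy input/output handles
predictor = create_predictor(config)
```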
## 5. FAQ
- To evaluate the accuracy of the post-training (offline) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov6s_qat_dis.yaml
```
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_ROCM "Compile demo with rocm." OFF)
option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
option(WITH_ARM "Compile demo with ARM" OFF)
option(WITH_MIPS "Compile demo with MIPS" OFF)
option(WITH_SW "Compile demo with SW" OFF)
option(WITH_XPU "Compile demo with XPU" OFF)
option(WITH_NPU "Compile demo with NPU" OFF)
if(NOT WITH_STATIC_LIB)
add_definitions("-DPADDLE_WITH_SHARED_LIB")
else()
# PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
# Set it to empty in static library mode to avoid compilation issues.
add_definitions("/DPD_INFER_DECL=")
endif()
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}/")
set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
if (MSVC_STATIC_CRT)
if (WITH_MKL)
set(FLAG_OPENMP "/openmp")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
safe_set_static_flag()
if (WITH_STATIC_LIB)
add_definitions(-DSTATIC_LIB)
endif()
endif()
else()
if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
endif()
if(WITH_GPU)
if(NOT WIN32)
include_directories("/usr/local/cuda/include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
endif()
else()
include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
endif()
endif(NOT WIN32)
endif()
if (USE_TENSORRT AND WITH_GPU)
set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
if("${TENSORRT_ROOT}" STREQUAL "")
message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
endif()
set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
endif()
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
message(SEND_ERROR "Failed to detect TensorRT version.")
endif()
string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
"Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
include_directories("${TENSORRT_INCLUDE_DIR}")
link_directories("${TENSORRT_LIB_DIR}")
endif()
if(WITH_MKL)
set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
include_directories("${MATH_LIB_PATH}/include")
if(WIN32)
set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif(WIN32)
endif()
elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
if(WIN32)
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if(WIN32)
set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (WITH_ONNXRUNTIME)
if(WIN32)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
elseif(APPLE)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
else()
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf xxhash cryptopp
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
set(DEPS ${DEPS} shlwapi.lib)
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
if(USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
endif()
endif()
if(WITH_ROCM AND NOT WIN32)
set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_XPU AND NOT WIN32)
set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_NPU AND NOT WIN32)
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
  set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libacl_op_compiler${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
target_link_libraries(${DEMO_NAME} ${DEPS})
if(WIN32)
if(USE_TENSORRT)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
endif()
endif()
if(WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
)
else()
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
)
endif()
if(WITH_ONNXRUNTIME)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
if(NOT WITH_STATIC_LIB)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
endif()
# YOLOv6 TensorRT Benchmark (Linux)
## Environment Setup
- CUDA and cuDNN: make sure CUDA and cuDNN are installed and note their installation paths in advance.
- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) (or another version) from the NVIDIA website.
- Paddle Inference C++ library: to build the develop branch, follow the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build, a `paddle_inference_install_dir` folder is generated under the build directory; this is the C++ inference library needed here.
## Build the Executable
- (1) Edit the dependency paths in `compile.sh`, mainly the following:
```shell
# Path to the Paddle Inference library
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
# Path to cuDNN
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
# Path to CUDA
CUDA_LIB=/usr/local/cuda/lib64
# Path to the extracted TensorRT package (absolute path containing the `lib` and `include` folders)
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Run the Tests
- FP32
```
./build/trt_run --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp32
```
- FP16
```
./build/trt_run --model_file yolov6s_infer/model.pdmodel --params_file yolov6s_infer/model.pdiparams --run_mode=trt_fp16
```
- INT8
```
./build/trt_run --model_file yolov6s_quant/model.pdmodel --params_file yolov6s_quant/model.pdiparams --run_mode=trt_int8
```
## Performance Comparison
| Model | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) |
| :-------- |:-------- |:--------: | :---------------------: |
| YOLOv6s | 9.06ms | 2.90ms | 1.83ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
- batch_size=1
#!/bin/bash
set +x
set -e
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
DEMO_NAME=trt_run
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
WITH_ROCM=OFF
ROCM_LIB=/opt/rocm/lib
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DWITH_ROCM=${WITH_ROCM} \
-DROCM_LIB=${ROCM_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cuda_runtime.h>
#include "paddle/include/paddle_inference_api.h"
#include "paddle/include/experimental/phi/common/float16.h"
using paddle_infer::Config;
using paddle_infer::Predictor;
using paddle_infer::CreatePredictor;
using paddle_infer::PrecisionType;
using phi::dtype::float16;
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(run_mode, "trt_fp32", "run_mode which can be: trt_fp32, trt_fp16 and trt_int8");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(gpu_id, 0, "GPU card ID num.");
DEFINE_int32(trt_min_subgraph_size, 3, "tensorrt min_subgraph_size");
DEFINE_int32(warmup, 50, "warmup");
DEFINE_int32(repeats, 1000, "repeats");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::shared_ptr<Predictor> InitPredictor() {
Config config;
std::string model_path;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
model_path = FLAGS_model_dir.substr(0, FLAGS_model_dir.find_last_of("/"));
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
model_path = FLAGS_model_file.substr(0, FLAGS_model_file.find_last_of("/"));
}
// enable tune
std::cout << "model_path: " << model_path << std::endl;
config.EnableUseGpu(256, FLAGS_gpu_id);
if (FLAGS_run_mode == "trt_fp32") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kFloat32, false, false);
} else if (FLAGS_run_mode == "trt_fp16") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kHalf, false, false);
} else if (FLAGS_run_mode == "trt_int8") {
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, FLAGS_trt_min_subgraph_size,
PrecisionType::kInt8, false, false);
}
config.EnableMemoryOptim();
config.SwitchIrOptim(true);
return CreatePredictor(config);
}
template <typename type>
void run(Predictor *predictor, const std::vector<type> &input,
const std::vector<int> &input_shape, type* out_data, std::vector<int> out_shape) {
// prepare input
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
for (int i = 0; i < FLAGS_warmup; ++i)
CHECK(predictor->Run());
auto st = time();
for (int i = 0; i < FLAGS_repeats; ++i) {
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());
CHECK(predictor->Run());
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputHandle(output_names[0]);
std::vector<int> output_shape = output_t->shape();
output_t -> ShareExternalData<type>(out_data, out_shape, paddle_infer::PlaceType::kGPU);
}
LOG(INFO) << "[" << FLAGS_run_mode << " bs-" << FLAGS_batch_size << " ] run avg time is " << time_diff(st, time()) / FLAGS_repeats
<< " ms";
}
int main(int argc, char *argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = InitPredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 640, 640};
// float16
using dtype = float16;
std::vector<dtype> input_data(FLAGS_batch_size * 3 * 640 * 640, dtype(1.0));
dtype *out_data;
int out_data_size = FLAGS_batch_size * 8400 * 85;
cudaHostAlloc((void**)&out_data, sizeof(float) * out_data_size, cudaHostAllocMapped);
std::vector<int> out_shape{ FLAGS_batch_size, 1, 8400, 85};
run<dtype>(predictor.get(), input_data, input_shape, out_data, out_shape);
return 0;
}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import argparse
import time
from paddle.inference import Config
from paddle.inference import create_predictor
from post_process import YOLOv6PostProcess
CLASS_LABEL = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'
]
def generate_scale(im, target_shape, keep_ratio=True):
"""
    Args:
        im (np.ndarray): input image
        target_shape (list|tuple): target size [h, w] after resizing
        keep_ratio (bool): whether to keep the aspect ratio when resizing
    Returns:
        im_scale_y: the resize ratio of Y
        im_scale_x: the resize ratio of X
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(img_path, target_shape):
img = cv2.imread(img_path)
# Resize
im_scale_y, im_scale_x = generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = np.transpose(img, [2, 0, 1]) / 255
img = np.expand_dims(img, 0)
scale_factor = np.array([[im_scale_y, im_scale_x]])
return img.astype(np.float32), scale_factor
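# For the default 640x640 target shape, image_preprocess returns an array of shape
# (1, 3, 640, 640) in RGB order scaled to [0, 1], plus a (1, 2) scale_factor holding
# (im_scale_y, im_scale_x); the post-processing step uses scale_factor to map the
# predicted boxes back to the original image resolution. (Explanatory note only.)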
def get_color_map_list(num_classes):
color_map = num_classes * [0, 0, 0]
for i in range(0, num_classes):
j = 0
lab = i
while lab:
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
j += 1
lab >>= 3
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
return color_map
def draw_box(image_file, results, class_label, threshold=0.5):
srcimg = cv2.imread(image_file, 1)
for i in range(len(results)):
color_list = get_color_map_list(len(class_label))
clsid2color = {}
classid, conf = int(results[i, 0]), results[i, 1]
if conf < threshold:
continue
xmin, ymin, xmax, ymax = int(results[i, 2]), int(results[i, 3]), int(
results[i, 4]), int(results[i, 5])
if classid not in clsid2color:
clsid2color[classid] = color_list[classid]
color = tuple(clsid2color[classid])
cv2.rectangle(srcimg, (xmin, ymin), (xmax, ymax), color, thickness=2)
print(class_label[classid] + ': ' + str(round(conf, 3)))
cv2.putText(
srcimg,
class_label[classid] + ':' + str(round(conf, 3)), (xmin, ymin - 10),
cv2.FONT_HERSHEY_SIMPLEX,
0.8, (0, 255, 0),
thickness=2)
return srcimg
def load_predictor(model_dir,
run_mode='paddle',
batch_size=1,
device='CPU',
min_subgraph_size=3,
use_dynamic_shape=False,
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
trt_calib_mode=False,
cpu_threads=1,
enable_mkldnn=False,
enable_mkldnn_bfloat16=False,
delete_shuffle_pass=False):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
        model_dir (str): directory that contains model.pdmodel and model.pdiparams
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
delete_shuffle_pass (bool): whether to remove shuffle_channel_detect_pass in TensorRT.
Used by action model.
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
ValueError: predict by TensorRT need device == 'GPU'.
"""
if device != 'GPU' and run_mode != 'paddle':
raise ValueError(
"Predict by TensorRT mode: {}, expect device=='GPU', but device == {}"
.format(run_mode, device))
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
os.path.join(model_dir, 'model.pdiparams'))
if device == 'GPU':
# initial GPU memory(M), device ID
config.enable_use_gpu(200, 0)
# optimize graph and fuse op
config.switch_ir_optim(True)
elif device == 'XPU':
config.enable_lite_engine()
config.enable_xpu(10 * 1024 * 1024)
else:
config.disable_gpu()
config.set_cpu_math_library_num_threads(cpu_threads)
if enable_mkldnn:
try:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
if enable_mkldnn_bfloat16:
config.enable_mkldnn_bfloat16()
except Exception as e:
print(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
precision_map = {
'trt_int8': Config.Precision.Int8,
'trt_fp32': Config.Precision.Float32,
'trt_fp16': Config.Precision.Half
}
if run_mode in precision_map.keys():
config.enable_tensorrt_engine(
workspace_size=(1 << 25) * batch_size,
max_batch_size=batch_size,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
use_calib_mode=trt_calib_mode)
if use_dynamic_shape:
min_input_shape = {
'image': [batch_size, 3, trt_min_shape, trt_min_shape]
}
max_input_shape = {
'image': [batch_size, 3, trt_max_shape, trt_max_shape]
}
opt_input_shape = {
'image': [batch_size, 3, trt_opt_shape, trt_opt_shape]
}
config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape,
opt_input_shape)
print('trt set dynamic shape done!')
# disable print log when predict
config.disable_glog_info()
# enable shared memory
config.enable_memory_optim()
# disable feed, fetch OP, needed by zero_copy_run
config.switch_use_feed_fetch_ops(False)
if delete_shuffle_pass:
config.delete_pass("shuffle_channel_detect_pass")
predictor = create_predictor(config)
return predictor
def predict_image(predictor,
image_file,
image_shape=[640, 640],
warmup=1,
repeats=1,
threshold=0.5,
arch='YOLOv5'):
img, scale_factor = image_preprocess(image_file, image_shape)
inputs = {}
if arch == 'YOLOv5':
inputs['x2paddle_images'] = img
input_names = predictor.get_input_names()
for i in range(len(input_names)):
input_tensor = predictor.get_input_handle(input_names[i])
input_tensor.copy_from_cpu(inputs[input_names[i]])
for i in range(warmup):
predictor.run()
np_boxes = None
predict_time = 0.
time_min = float("inf")
time_max = float('-inf')
for i in range(repeats):
start_time = time.time()
predictor.run()
output_names = predictor.get_output_names()
boxes_tensor = predictor.get_output_handle(output_names[0])
np_boxes = boxes_tensor.copy_to_cpu()
end_time = time.time()
timed = end_time - start_time
time_min = min(time_min, timed)
time_max = max(time_max, timed)
predict_time += timed
time_avg = predict_time / repeats
print('Inference time(ms): min={}, max={}, avg={}'.format(
round(time_min * 1000, 2),
round(time_max * 1000, 1), round(time_avg * 1000, 1)))
postprocess = YOLOv6PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np_boxes, scale_factor)
res_img = draw_box(
image_file, res['bbox'], CLASS_LABEL, threshold=threshold)
cv2.imwrite('result.jpg', res_img)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--image_file', type=str, default=None, help="image path")
parser.add_argument(
'--model_path', type=str, help="inference model filepath")
parser.add_argument(
'--benchmark',
type=bool,
default=False,
help="Whether run benchmark or not.")
parser.add_argument(
'--run_mode',
type=str,
default='paddle',
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
'--device',
type=str,
default='GPU',
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is GPU"
)
parser.add_argument('--img_shape', type=int, default=640, help="input_size")
args = parser.parse_args()
predictor = load_predictor(
args.model_path, run_mode=args.run_mode, device=args.device)
warmup, repeats = 1, 1
if args.benchmark:
warmup, repeats = 50, 100
predict_image(
predictor,
args.image_file,
image_shape=[args.img_shape, args.img_shape],
warmup=warmup,
repeats=repeats)
# YOLOv7 Auto Compression Example
Contents:
- [1. Introduction](#1-introduction)
- [2. Benchmark](#2-benchmark)
- [3. Auto Compression Workflow](#3-auto-compression-workflow)
  - [3.1 Prepare the Environment](#31-prepare-the-environment)
  - [3.2 Prepare the Dataset](#32-prepare-the-dataset)
  - [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
  - [3.4 Run Auto Compression and Export the Model](#34-run-auto-compression-and-export-the-model)
  - [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
- [4. Inference Deployment](#4-inference-deployment)
- [5. FAQ](#5-faq)
## 1. Introduction
The model conversion tool [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) can convert ```Caffe/TensorFlow/ONNX/PyTorch``` models into PaddlePaddle inference models with a single command. With X2Paddle, inference models from these frameworks can easily use PaddleSlim's auto compression capability.
This example takes the [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection model, converts it from PyTorch to a Paddle inference model, and then compresses it with the auto compression toolkit (ACT). The compression strategy used in this example is quantization-aware training.
## 2. Benchmark
| Model | Strategy | Input Size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference Model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv7 | Baseline | 640*640 | 51.1 | 26.84ms | 7.44ms | - | - | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) |
| YOLOv7 | KL post-training quantization | 640*640 | 50.2 | - | - | 4.55ms | - | - |
| YOLOv7 | Quant-aware training + distillation | 640*640 | **50.8** | - | - | **4.55ms** | [config](./configs/yolov7_qat_dis.yaml) | [Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_quant.tar) |
Notes:
- All mAP values are evaluated on the COCO val2017 dataset.
- YOLOv7 latency is measured on a Tesla T4 GPU with TensorRT 8.4.1 and batch_size=1; the benchmark program is [cpp_infer](./cpp_infer).
## 3. Auto Compression Workflow
#### 3.1 Prepare the Environment
- PaddlePaddle >= 2.3 (install from the [official Paddle site](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- PaddleDet >= 2.4
- [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) >= 1.3.6
- opencv-python
(1) Install paddlepaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install paddleslim:
```shell
pip install paddleslim
```
(3) Install paddledet:
```shell
pip install paddledet
```
Note: PaddleDet is installed only to reuse the Dataloader components from PaddleDetection.
(4) Install X2Paddle 1.3.6 or later:
```shell
pip install x2paddle sympy onnx
```
#### 3.2 Prepare the Dataset
By default this example runs auto compression on the COCO dataset and relies on the data loading module of PaddleDetection. If you use custom COCO-format data or data in another format, prepare it following the [PaddleDetection data preparation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareDataSet.md).
Once the dataset is ready, simply set the `dataset_dir` field of `EvalDataset` in [./configs/yolov7_reader.yml](./configs/yolov7_reader.yml) to your dataset path (a short illustrative snippet showing how this reader config is consumed follows below).
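As an illustration of how the reader config is consumed, the snippet below mirrors what run.py does via PaddleDetection; it assumes the COCO data is laid out under `dataset/coco/` as configured in [./configs/yolov7_reader.yml](./configs/yolov7_reader.yml), and the printed field names are typical rather than guaranteed.
```python
# Illustrative only: building the evaluation dataloader from the reader config,
# the same way run.py does it via PaddleDetection.
from ppdet.core.workspace import load_config, create

reader_cfg = load_config('./configs/yolov7_reader.yml')
val_loader = create('EvalReader')(reader_cfg['EvalDataset'],
                                  reader_cfg['worker_num'],
                                  return_list=True)
for data in val_loader:
    print(sorted(data.keys()))  # e.g. 'im_id', 'image', 'scale_factor'
    break
```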
#### 3.3 Prepare the Inference Model
(1) Prepare the ONNX model:
Export an ONNX model with the export script from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7), as follows:
```shell
git clone https://github.com/WongKinYiu/yolov7.git
# switch to the u5 branch so the exported ONNX model keeps YOLOv5-style post-processing
git checkout u5
# after downloading the yolov7.pt weights, run:
python export.py --weights yolov7.pt --include onnx
```
Alternatively, download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) directly.
(2) Convert the model:
```
x2paddle --framework=onnx --model=yolov7.onnx --save_dir=pd_model
cp -r pd_model/inference_model/ yolov7_infer
```
This produces the YOLOv7 inference model (`model.pdmodel` and `model.pdiparams`). For a quick start, you can instead download the YOLOv7 [Paddle inference model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov7_infer.tar) listed in the table above.
An inference model consists of two files: `model.pdmodel` (the model graph) and `model.pdiparams` (the weights).
#### 3.4 Run Auto Compression and Export the Model
The distillation + quantization auto compression example is launched via the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to compress the model. Set the model path, distillation, quantization and training parameters in the config file; once configured, the model can be quantized and distilled. Run one of the following commands:
- Single-GPU training:
```
export CUDA_VISIBLE_DEVICES=0
python run.py --config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
- Multi-GPU training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log --gpus 0,1,2,3 run.py \
--config_path=./configs/yolov7_qat_dis.yaml --save_dir='./output/'
```
#### 3.5 Evaluate Model Accuracy
Set the `model_dir` field in [yolov7_qat_dis.yaml](./configs/yolov7_qat_dis.yaml) to the path where the model is stored, then run the eval.py script to get the model's mAP:
```
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov7_qat_dis.yaml
```
## 4. Inference Deployment
#### Paddle-TensorRT C++ Deployment
Enter the [cpp_infer](./cpp_infer) directory, prepare the environment and build following the [C++ TensorRT benchmark guide](./cpp_infer/README.md), then run the test:
```shell
# build
bash compile.sh
# run
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
#### Paddle-TensorRT Python Deployment
First install a [Paddle package](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python) built with TensorRT support.
Then deploy with [paddle_trt_infer.py](./paddle_trt_infer.py):
```shell
python paddle_trt_infer.py --model_path=output --image_file=images/000000570688.jpg --benchmark=True --run_mode=trt_int8
```
## 5. FAQ
- To evaluate the accuracy of the post-training (offline) quantized model, run:
```shell
python post_quant.py --config_path=./configs/yolov7_qat_dis.yaml
```
metric: COCO
num_classes: 80
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/
worker_num: 0
# preprocess reader in test
EvalReader:
sample_transforms:
- Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True}
- Permute: {}
batch_size: 1
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
option(WITH_ROCM "Compile demo with rocm." OFF)
option(WITH_ONNXRUNTIME "Compile demo with ONNXRuntime" OFF)
option(WITH_ARM "Compile demo with ARM" OFF)
option(WITH_MIPS "Compile demo with MIPS" OFF)
option(WITH_SW "Compile demo with SW" OFF)
option(WITH_XPU "Compile demo with XPU" OFF)
option(WITH_NPU "Compile demo with NPU" OFF)
if(NOT WITH_STATIC_LIB)
add_definitions("-DPADDLE_WITH_SHARED_LIB")
else()
# PD_INFER_DECL is mainly used to set the dllimport/dllexport attribute in dynamic library mode.
# Set it to empty in static library mode to avoid compilation issues.
add_definitions("/DPD_INFER_DECL=")
endif()
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}/")
set(PADDLE_LIB_THIRD_PARTY_PATH "${PADDLE_LIB}/third_party/install/")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/include")
include_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/include")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}protobuf/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}glog/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}gflags/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}xxhash/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}cryptopp/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib")
link_directories("${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib")
if (WIN32)
add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
if (MSVC_STATIC_CRT)
if (WITH_MKL)
set(FLAG_OPENMP "/openmp")
endif()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd ${FLAG_OPENMP}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT ${FLAG_OPENMP}")
safe_set_static_flag()
if (WITH_STATIC_LIB)
add_definitions(-DSTATIC_LIB)
endif()
endif()
else()
if(WITH_MKL)
set(FLAG_OPENMP "-fopenmp")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 ${FLAG_OPENMP}")
endif()
if(WITH_GPU)
if(NOT WIN32)
include_directories("/usr/local/cuda/include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "/usr/local/cuda/lib64/" CACHE STRING "CUDA Library")
endif()
else()
include_directories("C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\include")
if(CUDA_LIB STREQUAL "")
set(CUDA_LIB "C:\\Program\ Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\lib\\x64")
endif()
endif(NOT WIN32)
endif()
if (USE_TENSORRT AND WITH_GPU)
set(TENSORRT_ROOT "" CACHE STRING "The root directory of TensorRT library")
if("${TENSORRT_ROOT}" STREQUAL "")
message(FATAL_ERROR "The TENSORRT_ROOT is empty, you must assign it a value with CMake command. Such as: -DTENSORRT_ROOT=TENSORRT_ROOT_PATH ")
endif()
set(TENSORRT_INCLUDE_DIR ${TENSORRT_ROOT}/include)
set(TENSORRT_LIB_DIR ${TENSORRT_ROOT}/lib)
file(READ ${TENSORRT_INCLUDE_DIR}/NvInfer.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
file(READ ${TENSORRT_INCLUDE_DIR}/NvInferVersion.h TENSORRT_VERSION_FILE_CONTENTS)
string(REGEX MATCH "define NV_TENSORRT_MAJOR +([0-9]+)" TENSORRT_MAJOR_VERSION
"${TENSORRT_VERSION_FILE_CONTENTS}")
endif()
if("${TENSORRT_MAJOR_VERSION}" STREQUAL "")
message(SEND_ERROR "Failed to detect TensorRT version.")
endif()
string(REGEX REPLACE "define NV_TENSORRT_MAJOR +([0-9]+)" "\\1"
TENSORRT_MAJOR_VERSION "${TENSORRT_MAJOR_VERSION}")
message(STATUS "Current TensorRT header is ${TENSORRT_INCLUDE_DIR}/NvInfer.h. "
"Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
include_directories("${TENSORRT_INCLUDE_DIR}")
link_directories("${TENSORRT_LIB_DIR}")
endif()
if(WITH_MKL)
set(MATH_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mklml")
include_directories("${MATH_LIB_PATH}/include")
if(WIN32)
set(MATH_LIB ${MATH_LIB_PATH}/lib/mklml${CMAKE_STATIC_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5md${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${MATH_LIB_PATH}/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${MATH_LIB_PATH}/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(MKLDNN_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
if(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/mkldnn.lib)
else(WIN32)
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif(WIN32)
endif()
elseif((NOT WITH_MIPS) AND (NOT WITH_SW))
set(OPENBLAS_LIB_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}openblas")
include_directories("${OPENBLAS_LIB_PATH}/include/openblas")
if(WIN32)
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(MATH_LIB ${OPENBLAS_LIB_PATH}/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if(WITH_STATIC_LIB)
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
if(WIN32)
set(DEPS ${PADDLE_LIB}/paddle/lib/paddle_inference${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS ${PADDLE_LIB}/paddle/lib/libpaddle_inference${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
endif()
if (WITH_ONNXRUNTIME)
if(WIN32)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.lib paddle2onnx)
elseif(APPLE)
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.1.10.0.dylib paddle2onnx)
else()
set(DEPS ${DEPS} ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/libonnxruntime.so.1.10.0 paddle2onnx)
endif()
endif()
if (NOT WIN32)
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf xxhash cryptopp
${EXTERNAL_LIB})
else()
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags_static libprotobuf xxhash cryptopp-static ${EXTERNAL_LIB})
set(DEPS ${DEPS} shlwapi.lib)
endif(NOT WIN32)
if(WITH_GPU)
if(NOT WIN32)
if (USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
else()
if(USE_TENSORRT)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_STATIC_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_STATIC_LIBRARY_SUFFIX})
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
set(DEPS ${DEPS} ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/cudart${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cublas${CMAKE_STATIC_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/cudnn${CMAKE_STATIC_LIBRARY_SUFFIX} )
endif()
endif()
if(WITH_ROCM AND NOT WIN32)
set(DEPS ${DEPS} ${ROCM_LIB}/libamdhip64${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_XPU AND NOT WIN32)
set(XPU_INSTALL_PATH "${PADDLE_LIB_THIRD_PARTY_PATH}xpu")
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpuapi${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${XPU_INSTALL_PATH}/lib/libxpurt${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
if(WITH_NPU AND NOT WIN32)
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libgraph${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libge_runner${CMAKE_SHARED_LIBRARY_SUFFIX})
  set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libacl_op_compiler${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
target_link_libraries(${DEMO_NAME} ${DEPS})
if(WIN32)
if(USE_TENSORRT)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/nvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
if(${TENSORRT_MAJOR_VERSION} GREATER_EQUAL 7)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${TENSORRT_LIB_DIR}/myelin64_1${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE})
endif()
endif()
if(WITH_MKL)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/mklml.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MATH_LIB_PATH}/lib/libiomp5md.dll ${CMAKE_BINARY_DIR}/Release
COMMAND ${CMAKE_COMMAND} -E copy ${MKLDNN_PATH}/lib/mkldnn.dll ${CMAKE_BINARY_DIR}/Release
)
else()
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${OPENBLAS_LIB_PATH}/lib/openblas.dll ${CMAKE_BINARY_DIR}/Release
)
endif()
if(WITH_ONNXRUNTIME)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}onnxruntime/lib/onnxruntime.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
COMMAND ${CMAKE_COMMAND} -E copy ${PADDLE_LIB_THIRD_PARTY_PATH}paddle2onnx/lib/paddle2onnx.dll
${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
if(NOT WITH_STATIC_LIB)
add_custom_command(TARGET ${DEMO_NAME} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy "${PADDLE_LIB}/paddle/lib/paddle_inference.dll" ${CMAKE_BINARY_DIR}/${CMAKE_BUILD_TYPE}
)
endif()
endif()
# YOLOv7 TensorRT Benchmark (Linux)
## Environment Setup
- CUDA and cuDNN: make sure CUDA and cuDNN are installed and note their installation paths in advance.
- TensorRT: download [TensorRT 8.4.1.5](https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz) (or another version) from the NVIDIA website.
- Paddle Inference C++ library: to build the develop branch, follow the [build guide](https://www.paddlepaddle.org.cn/inference/user_guides/source_compile.html). After the build, a `paddle_inference_install_dir` folder is generated under the build directory; this is the C++ inference library needed here.
## Build the Executable
- (1) Edit the dependency paths in `compile.sh`, mainly the following:
```shell
# Path to the Paddle Inference library
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
# Path to cuDNN
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
# Path to CUDA
CUDA_LIB=/usr/local/cuda/lib64
# Path to the extracted TensorRT package (absolute path containing the `lib` and `include` folders)
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
```
## Run the Tests
- FP32
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp32
```
- FP16
```
./build/trt_run --model_file yolov7_infer/model.pdmodel --params_file yolov7_infer/model.pdiparams --run_mode=trt_fp16
```
- INT8
```
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
## Performance Comparison
| Inference Library | Model | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) |
| :--------: | :--------: |:-------- |:--------: | :---------------------: |
| Paddle TensorRT | YOLOv7 | 26.84ms | 7.44ms | 4.55ms |
| TensorRT | YOLOv7 | 28.25ms | 7.23ms | 4.67ms |
Environment:
- Tesla T4, TensorRT 8.4.1, CUDA 11.2
- batch_size=1
#!/bin/bash
set +x
set -e
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
DEMO_NAME=trt_run
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/root/auto_compress/Paddle/build/paddle_inference_install_dir/
CUDNN_LIB=/usr/lib/x86_64-linux-gnu/
CUDA_LIB=/usr/local/cuda/lib64
TENSORRT_ROOT=/root/auto_compress/trt/trt8.4/
WITH_ROCM=OFF
ROCM_LIB=/opt/rocm/lib
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DWITH_ROCM=${WITH_ROCM} \
-DROCM_LIB=${ROCM_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression
from post_process import YOLOv7PostProcess
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--save_dir',
type=str,
default='output',
help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
parser.add_argument(
'--eval', type=bool, default=False, help="whether to run evaluation.")
return parser
def reader_wrapper(reader, input_list):
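    """Wrap a PaddleDetection reader so that each batch is yielded as a feed
    dict keyed by the input names (or the name mapping) given in `input_list`."""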
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def convert_numpy_data(data, metric):
    data_all = {k: np.array(v) for k, v in data.items()}
if isinstance(metric, VOCMetric):
for k, v in data_all.items():
if not isinstance(v[0], np.ndarray):
tmp_list = []
for t in v:
tmp_list.append(np.array(t))
data_all[k] = np.array(tmp_list)
else:
data_all = {k: np.array(v) for k, v in data.items()}
return data_all
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
metric = global_config['metric']
for batch_id, data in enumerate(val_loader):
data_all = convert_numpy_data(data, metric)
data_input = {}
for k, v in data.items():
if isinstance(global_config['input_list'], list):
if k in test_feed_names:
data_input[k] = np.array(v)
elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(compiled_test_program,
feed=data_input,
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
postprocess = YOLOv7PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
metric.update(data_all, res)
if batch_id % 100 == 0:
print('Eval iter:', batch_id)
metric.accumulate()
metric.log()
map_res = metric.get_results()
metric.reset()
return map_res['bbox'][0]
def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'],
reader_cfg['worker_num'],
return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list'])
if 'Evaluation' in global_config.keys() and global_config[
'Evaluation'] and paddle.distributed.get_rank() == 0:
eval_func = eval_function
dataset = reader_cfg['EvalDataset']
global val_loader
_eval_batch_sampler = paddle.io.BatchSampler(
dataset, batch_size=reader_cfg['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
reader_cfg['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True)
metric = None
if reader_cfg['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno()
metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC':
metric = VOCMetric(
label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type'])
else:
raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
else:
eval_func = None
ac = AutoCompression(
model_dir=global_config["model_dir"],
model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"],
save_dir=FLAGS.save_dir,
config=all_config,
train_dataloader=train_loader,
eval_callback=eval_func)
ac.compress()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
...@@ -71,10 +71,10 @@ git clone https://github.com/PaddlePaddle/PaddleSlim.git ...@@ -71,10 +71,10 @@ git clone https://github.com/PaddlePaddle/PaddleSlim.git
安装paddleseg 安装paddleseg
```shell ```shell
pip install paddleseg pip install paddleseg==2.5.0
``` ```
注:安装[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的目的只是为了直接使用PaddleSeg中的Dataloader组件,不涉及模型组网等。 注:安装[PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)的目的只是为了直接使用PaddleSeg中的Dataloader组件,不涉及模型组网等。推荐安装PaddleSeg 2.5.0, 不同版本的PaddleSeg的Dataloader返回数据的格式略有不同.
#### 3.2 准备数据集 #### 3.2 准备数据集
......
...@@ -21,7 +21,7 @@ from paddleseg.cvlibs import Config as PaddleSegDataConfig ...@@ -21,7 +21,7 @@ from paddleseg.cvlibs import Config as PaddleSegDataConfig
from paddleseg.utils import worker_init_fn from paddleseg.utils import worker_init_fn
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from paddleseg.core.infer import reverse_transform from paddleseg.core.infer import reverse_transform
from paddleseg.utils import metrics from paddleseg.utils import metrics
...@@ -38,6 +38,11 @@ def argsparser(): ...@@ -38,6 +38,11 @@ def argsparser():
type=str, type=str,
default=None, default=None,
help="directory to save compressed model.") help="directory to save compressed model.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser return parser
...@@ -123,7 +128,12 @@ def main(args): ...@@ -123,7 +128,12 @@ def main(args):
config = all_config["Global"] config = all_config["Global"]
rank_id = paddle.distributed.get_rank() rank_id = paddle.distributed.get_rank()
place = paddle.CUDAPlace(rank_id) if args.devices == 'gpu':
place = paddle.CUDAPlace(rank_id)
paddle.set_device('gpu')
else:
place = paddle.CPUPlace()
paddle.set_device('cpu')
# step1: load dataset config and create dataloader # step1: load dataset config and create dataloader
data_cfg = PaddleSegDataConfig(config['reader_config']) data_cfg = PaddleSegDataConfig(config['reader_config'])
train_dataset = data_cfg.train_dataset train_dataset = data_cfg.train_dataset
......
...@@ -23,7 +23,7 @@ import paddle ...@@ -23,7 +23,7 @@ import paddle
import paddle.nn as nn import paddle.nn as nn
from paddle.io import DataLoader from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset from imagenet_reader import ImageNetDataset
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
def argsparser(): def argsparser():
...@@ -93,7 +93,8 @@ def eval(): ...@@ -93,7 +93,8 @@ def eval():
def main(): def main():
global global_config global global_config
all_config = load_slim_config(args.config_path) all_config = load_slim_config(args.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
all_config)
global_config = all_config["Global"] global_config = all_config["Global"]
global data_dir global data_dir
data_dir = global_config['data_dir'] data_dir = global_config['data_dir']
......
...@@ -23,7 +23,7 @@ import paddle ...@@ -23,7 +23,7 @@ import paddle
import paddle.nn as nn import paddle.nn as nn
from paddle.io import DataLoader from paddle.io import DataLoader
from imagenet_reader import ImageNetDataset from imagenet_reader import ImageNetDataset
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
...@@ -107,7 +107,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list): ...@@ -107,7 +107,8 @@ def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
def main(): def main():
global global_config global global_config
all_config = load_slim_config(args.config_path) all_config = load_slim_config(args.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}" assert "Global" in all_config, "Key 'Global' not found in config file. \n{}".format(
all_config)
global_config = all_config["Global"] global_config = all_config["Global"]
global data_dir global data_dir
data_dir = global_config['data_dir'] data_dir = global_config['data_dir']
......
# Quantization Analysis Tool: Detailed Tutorial
## 1. What the Tool Does
1. Walk through every layer of the model, quantize only that layer, and evaluate the resulting accuracy. The accuracies of all single-layer-quantized models are then ranked and the layers unsuited to quantization are visualized, so that those layers can be selectively skipped during quantization (a minimal sketch of this per-layer sweep follows this list).
2. Plot the weight and activation histograms of the best- and worst-quantizing layers, to help analyze why quantization degrades accuracy.
3. (Planned) Given a target accuracy, directly produce a quantized model that meets it.
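The per-layer sweep in item 1 can be pictured with the minimal sketch below. `quantize_single_layer` and `evaluate` are hypothetical stand-ins for a PTQ routine and an accuracy evaluation, not PaddleSlim APIs; `AnalysisQuant` performs this loop (plus checkpointing and plotting) for you.
```python
def rank_layer_sensitivity(layer_names, quantize_single_layer, evaluate, fp32_acc):
    """Quantize one layer at a time and rank layers by the accuracy they lose."""
    loss_per_layer = {}
    for name in layer_names:
        quant_model = quantize_single_layer(name)      # only `name` is quantized
        loss_per_layer[name] = fp32_acc - evaluate(quant_model)
    # Layers with the largest loss are the best candidates for skip_tensor_list.
    return sorted(loss_per_layer.items(), key=lambda kv: kv[1], reverse=True)
```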
## 2. Arguments of paddleslim.quant.AnalysisQuant
```yaml
model_dir
model_filename: None
params_filename: None
eval_function: None
data_loader: None
save_dir: 'analysis_results'
checkpoint_name: 'analysis_checkpoint.pkl'
num_histogram_plots: 10
ptq_config
```
- model_dir: required; path to the model, which can be a directory. For an ONNX model, pass the '.onnx' file name directly.
- model_filename: defaults to None. If model_dir is a directory, the '.pdmodel' file name must be given; not needed when model_dir is an '.onnx' file.
- params_filename: defaults to None. If model_dir is a directory, the '.pdiparams' file name must be given; not needed when model_dir is an '.onnx' file.
- eval_function: None is currently not supported; a user-defined evaluation function must be passed in.
- data_loader: the data used for calibration, a DataLoader derived from `paddle.io.DataLoader`. You can reuse the DataLoader from a model suite, or build your own following [paddle.io.DataLoader](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#dataloader).
- save_dir: directory where the accuracy results and PDF plots are saved; defaults to `analysis_results`.
- checkpoint_name: since a model may contain many layers to analyze, intermediate results are checkpointed during the run and reloaded automatically if the program is interrupted; defaults to `analysis_checkpoint.pkl`.
- num_histogram_plots: number of histograms to plot, i.e. the weight and activation distributions of this many best- and worst-quantizing layers; defaults to 10. Set it to 0 to skip plotting.
- ptq_config: parameters forwarded to post-training quantization; see the [PTQ documentation](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/demo/quant/quant_post) for details.
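A minimal call, assuming a user-defined evaluation function and a calibration `paddle.io.DataLoader` are already available (`my_eval_function` and `my_data_loader` below are placeholders, and the `ptq_config` values are only illustrative):
```python
import paddle
from paddleslim.quant.analysis import AnalysisQuant

paddle.enable_static()

# `my_eval_function` and `my_data_loader` must be defined by the user,
# as described in the argument list above.
analyzer = AnalysisQuant(
    model_dir='./inference_model/',
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    eval_function=my_eval_function,
    data_loader=my_data_loader,
    save_dir='analysis_results',
    ptq_config={
        'quantizable_op_type': ['conv2d', 'depthwise_conv2d'],
        'weight_quantize_type': 'abs_max',
        'activation_quantize_type': 'moving_average_abs_max',
        'batch_size': 10,
        'batch_nums': 10,
    })
analyzer.analysis()
```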
## 3. Tool Outputs
By default the tool writes its results to the following directory:
```
analysis_results/
├── analysis.txt
├── best_weight_hist_result.pdf
├── best_act_hist_result.pdf
├── worst_weight_hist_result.pdf
├── worst_act_hist_result.pdf
```
- The accuracy ranking of all single-layer-quantized models is saved to `./analysis_results/analysis.txt` by default.
- Depending on `num_histogram_plots`, the weight and activation histograms of that many best- and worst-quantizing layers are saved as PDFs under `./analysis_results`, named `best_weight_hist_result.pdf`, `best_act_hist_result.pdf`, `worst_weight_hist_result.pdf` and `worst_act_hist_result.pdf`, for side-by-side analysis.
## 4. Running Post-Training Quantization with the Analysis Results
After the analysis finishes, you can use the accuracy ranking in `analysis.txt` to drop the layers that quantize poorly: when calling `paddleslim.quant.quant_post_static`, pass the layers to be skipped through the `skip_tensor_list` argument.
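A hedged sketch of that call; the paths and the calibration loader `calib_loader` are placeholders, and the skipped layer names are examples of entries copied from an `analysis.txt` ranking:
```python
import paddle
from paddleslim.quant import quant_post_static

paddle.enable_static()
exe = paddle.static.Executor(paddle.CUDAPlace(0))  # or paddle.CPUPlace()

quant_post_static(
    executor=exe,
    model_dir='./inference_model/',
    quantize_model_path='./quant_model/',
    data_loader=calib_loader,              # user-provided calibration DataLoader
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    skip_tensor_list=['conv2d_2.w_0', 'conv2d_15.w_0'])  # layers kept in FP32
```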
...@@ -16,13 +16,14 @@ import os ...@@ -16,13 +16,14 @@ import os
import sys import sys
import numpy as np import numpy as np
import argparse import argparse
from tqdm import tqdm
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from keypoint_utils import keypoint_post_process
from post_process import PPYOLOEPostProcess
from post_process import YOLOv7PostProcess from paddleslim.quant.analysis import AnalysisQuant
def argsparser(): def argsparser():
...@@ -31,14 +32,13 @@ def argsparser(): ...@@ -31,14 +32,13 @@ def argsparser():
'--config_path', '--config_path',
type=str, type=str,
default=None, default=None,
help="path of compression strategy config.", help="path of analysis config.",
required=True) required=True)
parser.add_argument( parser.add_argument(
'--devices', '--devices',
type=str, type=str,
default='gpu', default='gpu',
help="which device used to compress.") help="which device used to compress.")
return parser return parser
...@@ -72,72 +72,100 @@ def convert_numpy_data(data, metric): ...@@ -72,72 +72,100 @@ def convert_numpy_data(data, metric):
return data_all return data_all
def eval(): def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
with tqdm(
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() total=len(val_loader),
exe = paddle.static.Executor(place) bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model( for batch_id, data in enumerate(val_loader):
global_config["model_dir"], data_all = convert_numpy_data(data, metric)
exe, data_input = {}
model_filename=global_config["model_filename"], for k, v in data.items():
params_filename=global_config["params_filename"]) if isinstance(config['input_list'], list):
print('Loaded model from: {}'.format(global_config["model_dir"])) if k in test_feed_names:
data_input[k] = np.array(v)
metric = global_config['metric'] elif isinstance(config['input_list'], dict):
for batch_id, data in enumerate(val_loader): if k in config['input_list'].keys():
data_all = convert_numpy_data(data, metric) data_input[config['input_list'][k]] = np.array(v)
data_input = {} outs = exe.run(compiled_test_program,
for k, v in data.items(): feed=data_input,
if isinstance(global_config['input_list'], list): fetch_list=test_fetch_list,
if k in global_config['input_list']: return_numpy=False)
data_input[k] = np.array(v) res = {}
elif isinstance(global_config['input_list'], dict): if 'arch' in config and config['arch'] == 'keypoint':
if k in global_config['input_list'].keys(): res = keypoint_post_process(data, data_input, exe,
data_input[global_config['input_list'][k]] = np.array(v) compiled_test_program,
outs = exe.run(val_program, test_fetch_list, outs)
feed=data_input, if 'arch' in config and config['arch'] == 'PPYOLOE':
fetch_list=fetch_targets, postprocess = PPYOLOEPostProcess(
return_numpy=False) score_threshold=0.01, nms_threshold=0.6)
res = {} res = postprocess(np.array(outs[0]), data_all['scale_factor'])
postprocess = YOLOv7PostProcess( else:
score_threshold=0.001, nms_threshold=0.65, multi_label=True) for out in outs:
res = postprocess(np.array(outs[0]), data_all['scale_factor']) v = np.array(out)
metric.update(data_all, res) if len(v.shape) > 1:
if batch_id % 100 == 0: res['bbox'] = v
print('Eval iter:', batch_id) else:
res['bbox_num'] = v
metric.update(data_all, res)
t.update()
metric.accumulate() metric.accumulate()
metric.log() metric.log()
map_res = metric.get_results()
metric.reset() metric.reset()
map_key = 'keypoint' if 'arch' in config and config[
'arch'] == 'keypoint' else 'bbox'
return map_res[map_key][0]
def main(): def main():
global global_config
all_config = load_slim_config(FLAGS.config_path)
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
dataset = reader_cfg['EvalDataset'] global config
config = load_config(FLAGS.config_path)
ptq_config = config['PTQ']
data_loader = create('EvalReader')(config['EvalDataset'],
config['worker_num'],
return_list=True)
data_loader = reader_wrapper(data_loader, config['input_list'])
dataset = config['EvalDataset']
global val_loader global val_loader
val_loader = create('EvalReader')(reader_cfg['EvalDataset'], _eval_batch_sampler = paddle.io.BatchSampler(
reader_cfg['worker_num'], dataset, batch_size=config['EvalReader']['batch_size'])
val_loader = create('EvalReader')(dataset,
config['worker_num'],
batch_sampler=_eval_batch_sampler,
return_list=True) return_list=True)
metric = None global metric
if reader_cfg['metric'] == 'COCO': if config['metric'] == 'COCO':
clsid2catid = {v: k for k, v in dataset.catid2clsid.items()} clsid2catid = {v: k for k, v in dataset.catid2clsid.items()}
anno_file = dataset.get_anno() anno_file = dataset.get_anno()
metric = COCOMetric( metric = COCOMetric(
anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox') anno_file=anno_file, clsid2catid=clsid2catid, IouType='bbox')
elif reader_cfg['metric'] == 'VOC': elif config['metric'] == 'VOC':
metric = VOCMetric( metric = VOCMetric(
label_list=dataset.get_label_list(), label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'], class_num=config['num_classes'],
map_type=reader_cfg['map_type']) map_type=config['map_type'])
elif config['metric'] == 'KeyPointTopDownCOCOEval':
anno_file = dataset.get_anno()
metric = KeyPointTopDownCOCOEval(anno_file,
len(dataset), 17, 'output_eval')
else: else:
raise ValueError("metric currently only supports COCO and VOC.") raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric
eval() analyzer = AnalysisQuant(
model_dir=config["model_dir"],
model_filename=config["model_filename"],
params_filename=config["params_filename"],
eval_function=eval_function,
data_loader=data_loader,
save_dir=config['save_dir'],
ptq_config=ptq_config)
analyzer.analysis()
if __name__ == '__main__': if __name__ == '__main__':
......
input_list: ['image', 'scale_factor']
model_dir: ./picodet_s_416_coco_lcnet/
model_filename: model.pdmodel
params_filename: model.pdiparams
save_dir: ./analysis_results
metric: COCO
num_classes: 80
PTQ:
quantizable_op_type: ["conv2d", "depthwise_conv2d"]
weight_quantize_type: 'abs_max'
activation_quantize_type: 'moving_average_abs_max'
is_full_quantize: False
batch_size: 10
batch_nums: 10
# Dataset configuration
TrainDataset:
!COCODataSet
image_dir: train2017
anno_path: annotations/instances_train2017.json
dataset_dir: /dataset/coco/
EvalDataset:
!COCODataSet
image_dir: val2017
anno_path: annotations/instances_val2017.json
dataset_dir: /dataset/coco/
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]
worker_num: 0
EvalReader:
inputs_def:
image_shape: [1, 3, *eval_height, *eval_width]
sample_transforms:
- Decode: {}
- Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
- NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- Permute: {}
batch_size: 32
input_list: ['image', 'scale_factor']
model_dir: ./picodet_s_416_coco_lcnet/
model_filename: model.pdmodel
params_filename: model.pdiparams
skip_tensor_list: None
metric: COCO metric: COCO
num_classes: 80 num_classes: 80
...@@ -6,22 +12,27 @@ TrainDataset: ...@@ -6,22 +12,27 @@ TrainDataset:
!COCODataSet !COCODataSet
image_dir: train2017 image_dir: train2017
anno_path: annotations/instances_train2017.json anno_path: annotations/instances_train2017.json
dataset_dir: dataset/coco/ dataset_dir: /dataset/coco/
EvalDataset: EvalDataset:
!COCODataSet !COCODataSet
image_dir: val2017 image_dir: val2017
anno_path: annotations/instances_val2017.json anno_path: annotations/instances_val2017.json
dataset_dir: dataset/coco/ dataset_dir: /dataset/coco/
eval_height: &eval_height 416
eval_width: &eval_width 416
eval_size: &eval_size [*eval_height, *eval_width]
worker_num: 0 worker_num: 0
# preprocess reader in test
EvalReader: EvalReader:
inputs_def:
image_shape: [1, 3, *eval_height, *eval_width]
sample_transforms: sample_transforms:
- Decode: {} - Decode: {}
- Resize: {target_size: [640, 640], keep_ratio: True} - Resize: {interp: 2, target_size: *eval_size, keep_ratio: False}
- Pad: {size: [640, 640], fill_value: [114., 114., 114.]} - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
- NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1], is_scale: True} - Permute: {}
- Permute: {} batch_size: 32
batch_size: 1
...@@ -19,10 +19,10 @@ import argparse ...@@ -19,10 +19,10 @@ import argparse
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric from ppdet.metrics import COCOMetric, VOCMetric, KeyPointTopDownCOCOEval
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config from paddleslim.common import load_config as load_slim_config
from keypoint_utils import keypoint_post_process
from post_process import YOLOv6PostProcess from post_process import PPYOLOEPostProcess
def argsparser(): def argsparser():
...@@ -78,7 +78,7 @@ def eval(): ...@@ -78,7 +78,7 @@ def eval():
exe = paddle.static.Executor(place) exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model( val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
global_config["model_dir"], global_config["model_dir"].rstrip('/'),
exe, exe,
model_filename=global_config["model_filename"], model_filename=global_config["model_filename"],
params_filename=global_config["params_filename"]) params_filename=global_config["params_filename"])
...@@ -95,14 +95,18 @@ def eval(): ...@@ -95,14 +95,18 @@ def eval():
elif isinstance(global_config['input_list'], dict): elif isinstance(global_config['input_list'], dict):
if k in global_config['input_list'].keys(): if k in global_config['input_list'].keys():
data_input[global_config['input_list'][k]] = np.array(v) data_input[global_config['input_list'][k]] = np.array(v)
outs = exe.run(val_program, outs = exe.run(val_program,
feed=data_input, feed=data_input,
fetch_list=fetch_targets, fetch_list=fetch_targets,
return_numpy=False) return_numpy=False)
res = {} res = {}
if 'arch' in global_config and global_config['arch'] == 'YOLOv6': if 'arch' in global_config and global_config['arch'] == 'keypoint':
postprocess = YOLOv6PostProcess( res = keypoint_post_process(data, data_input, exe, val_program,
score_threshold=0.001, nms_threshold=0.65, multi_label=True) fetch_targets, outs)
if 'arch' in global_config and global_config['arch'] == 'PPYOLOE':
postprocess = PPYOLOEPostProcess(
score_threshold=0.01, nms_threshold=0.6)
res = postprocess(np.array(outs[0]), data_all['scale_factor']) res = postprocess(np.array(outs[0]), data_all['scale_factor'])
else: else:
for out in outs: for out in outs:
...@@ -141,6 +145,10 @@ def main(): ...@@ -141,6 +145,10 @@ def main():
label_list=dataset.get_label_list(), label_list=dataset.get_label_list(),
class_num=reader_cfg['num_classes'], class_num=reader_cfg['num_classes'],
map_type=reader_cfg['map_type']) map_type=reader_cfg['map_type'])
elif reader_cfg['metric'] == 'KeyPointTopDownCOCOEval':
anno_file = dataset.get_anno()
metric = KeyPointTopDownCOCOEval(anno_file,
len(dataset), 17, 'output_eval')
else: else:
raise ValueError("metric currently only supports COCO and VOC.") raise ValueError("metric currently only supports COCO and VOC.")
global_config['metric'] = metric global_config['metric'] = metric
...@@ -152,7 +160,6 @@ if __name__ == '__main__': ...@@ -152,7 +160,6 @@ if __name__ == '__main__':
paddle.enable_static() paddle.enable_static()
parser = argsparser() parser = argsparser()
FLAGS = parser.parse_args() FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu'] assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices) paddle.set_device(FLAGS.devices)
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import math
import numpy as np
import cv2
import copy
from paddleslim.common import get_logger
logger = get_logger(__name__, level=logging.INFO)
__all__ = ['keypoint_post_process']
def flip_back(output_flipped, matched_parts):
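    """Flip heatmaps back along the width axis and swap the channels of the
    left/right joint pairs listed in `matched_parts`."""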
assert output_flipped.ndim == 4,\
'output_flipped should be [batch_size, num_joints, height, width]'
output_flipped = output_flipped[:, :, :, ::-1]
for pair in matched_parts:
tmp = output_flipped[:, pair[0], :, :].copy()
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
output_flipped[:, pair[1], :, :] = tmp
return output_flipped
def get_affine_transform(center,
input_size,
rot,
output_size,
shift=(0., 0.),
inv=False):
"""Get the affine transform matrix, given the center/scale/rot/output_size.
Args:
center (np.ndarray[2, ]): Center of the bounding box (x, y).
input_size (np.ndarray[2, ]): Size of input feature (width, height).
rot (float): Rotation angle (degree).
output_size (np.ndarray[2, ]): Size of the destination heatmaps.
shift (0-100%): Shift translation ratio wrt the width/height.
Default (0., 0.).
inv (bool): Option to inverse the affine transform direction.
(inv=False: src->dst or inv=True: dst->src)
Returns:
np.ndarray: The transform matrix.
"""
assert len(center) == 2
assert len(output_size) == 2
assert len(shift) == 2
if not isinstance(input_size, (np.ndarray, list)):
input_size = np.array([input_size, input_size], dtype=np.float32)
scale_tmp = input_size
shift = np.array(shift)
src_w = scale_tmp[0]
dst_w = output_size[0]
dst_h = output_size[1]
rot_rad = np.pi * rot / 180
src_dir = rotate_point([0., src_w * -0.5], rot_rad)
dst_dir = np.array([0., dst_w * -0.5])
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = center + scale_tmp * shift
src[1, :] = center + src_dir + scale_tmp * shift
src[2, :] = _get_3rd_point(src[0, :], src[1, :])
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def _get_3rd_point(a, b):
"""To calculate the affine matrix, three pairs of points are required. This
function is used to get the 3rd point, given 2D points a & b.
The 3rd point is defined by rotating vector `a - b` by 90 degrees
anticlockwise, using b as the rotation center.
Args:
a (np.ndarray): point(x,y)
b (np.ndarray): point(x,y)
Returns:
np.ndarray: The 3rd point.
"""
assert len(
a) == 2, 'input of _get_3rd_point should be point with length of 2'
assert len(
b) == 2, 'input of _get_3rd_point should be point with length of 2'
direction = a - b
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
return third_pt
def rotate_point(pt, angle_rad):
"""Rotate a point by an angle.
Args:
pt (list[float]): 2 dimensional point to be rotated
angle_rad (float): rotation angle by radian
Returns:
list[float]: Rotated point.
"""
assert len(pt) == 2
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
new_x = pt[0] * cs - pt[1] * sn
new_y = pt[0] * sn + pt[1] * cs
rotated_pt = [new_x, new_y]
return rotated_pt
def affine_transform(pt, t):
new_pt = np.array([pt[0], pt[1], 1.]).T
new_pt = np.dot(t, new_pt)
return new_pt[:2]
def transform_preds(coords, center, scale, output_size):
target_coords = np.zeros(coords.shape)
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
for p in range(coords.shape[0]):
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
return target_coords
class HRNetPostProcess(object):
def __init__(self, use_dark=True):
self.use_dark = use_dark
def get_max_preds(self, heatmaps):
'''get predictions from score maps
Args:
heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
'''
assert isinstance(heatmaps,
np.ndarray), 'heatmaps should be numpy.ndarray'
assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
batch_size = heatmaps.shape[0]
num_joints = heatmaps.shape[1]
width = heatmaps.shape[3]
heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
idx = np.argmax(heatmaps_reshaped, 2)
maxvals = np.amax(heatmaps_reshaped, 2)
maxvals = maxvals.reshape((batch_size, num_joints, 1))
idx = idx.reshape((batch_size, num_joints, 1))
preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
preds[:, :, 0] = (preds[:, :, 0]) % width
preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
pred_mask = pred_mask.astype(np.float32)
preds *= pred_mask
return preds, maxvals
def gaussian_blur(self, heatmap, kernel):
border = (kernel - 1) // 2
batch_size = heatmap.shape[0]
num_joints = heatmap.shape[1]
height = heatmap.shape[2]
width = heatmap.shape[3]
for i in range(batch_size):
for j in range(num_joints):
origin_max = np.max(heatmap[i, j])
dr = np.zeros((height + 2 * border, width + 2 * border))
dr[border:-border, border:-border] = heatmap[i, j].copy()
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
heatmap[i, j] = dr[border:-border, border:-border].copy()
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
return heatmap
def dark_parse(self, hm, coord):
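        """Refine one integer peak coordinate with a second-order Taylor
        expansion of the (log-smoothed) heatmap around the peak (DARK offset)."""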
heatmap_height = hm.shape[0]
heatmap_width = hm.shape[1]
px = int(coord[0])
py = int(coord[1])
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
+ hm[py-1][px-1])
dyy = 0.25 * (
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
derivative = np.matrix([[dx], [dy]])
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
if dxx * dyy - dxy**2 != 0:
hessianinv = hessian.I
offset = -hessianinv * derivative
offset = np.squeeze(np.array(offset.T), axis=0)
coord += offset
return coord
def dark_postprocess(self, hm, coords, kernelsize):
'''
        DARK post-processing, Zhang et al. Distribution-Aware Coordinate
Representation for Human Pose Estimation (CVPR 2020).
'''
hm = self.gaussian_blur(hm, kernelsize)
hm = np.maximum(hm, 1e-10)
hm = np.log(hm)
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
return coords
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
"""
The highest heatvalue location with a quarter offset in the
direction from the highest response to the second highest response.
Args:
heatmaps (numpy.ndarray): The predicted heatmaps
center (numpy.ndarray): The boxes center
scale (numpy.ndarray): The scale factor
Returns:
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
"""
coords, maxvals = self.get_max_preds(heatmaps)
heatmap_height = heatmaps.shape[2]
heatmap_width = heatmaps.shape[3]
if self.use_dark:
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
else:
for n in range(coords.shape[0]):
for p in range(coords.shape[1]):
hm = heatmaps[n][p]
px = int(math.floor(coords[n][p][0] + 0.5))
py = int(math.floor(coords[n][p][1] + 0.5))
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
diff = np.array([
hm[py][px + 1] - hm[py][px - 1],
hm[py + 1][px] - hm[py - 1][px]
])
coords[n][p] += np.sign(diff) * .25
preds = coords.copy()
# Transform back
for i in range(coords.shape[0]):
preds[i] = transform_preds(coords[i], center[i], scale[i],
[heatmap_width, heatmap_height])
return preds, maxvals
def __call__(self, output, center, scale):
preds, maxvals = self.get_final_preds(np.array(output), center, scale)
outputs = [[
np.concatenate(
(preds, maxvals), axis=-1), np.mean(
maxvals, axis=1)
]]
return outputs
def keypoint_post_process(data, data_input, exe, val_program, fetch_targets,
outs):
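    """Flip-test ensemble for top-down keypoint models: run the horizontally
    flipped image through the model, flip the predicted heatmaps back (swapping
    left/right joint pairs), average them with the original heatmaps, then
    decode the coordinates with HRNetPostProcess."""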
data_input['image'] = np.flip(data_input['image'], [3])
output_flipped = exe.run(val_program,
feed=data_input,
fetch_list=fetch_targets,
return_numpy=False)
output_flipped = np.array(output_flipped[0])
flip_perm = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14],
[15, 16]]
output_flipped = flip_back(output_flipped, flip_perm)
output_flipped[:, :, :, 1:] = copy.copy(output_flipped)[:, :, :, 0:-1]
hrnet_outputs = (np.array(outs[0]) + output_flipped) * 0.5
imshape = (
np.array(data['im_shape']))[:, ::-1] if 'im_shape' in data else None
center = np.array(data['center']) if 'center' in data else np.round(
imshape / 2.)
scale = np.array(data['scale']) if 'scale' in data else imshape / 200.
post_process = HRNetPostProcess()
outputs = post_process(hrnet_outputs, center, scale)
return {'keypoint': outputs}
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
"""
Args:
box_scores (N, 5): boxes in corner-form and probabilities.
iou_threshold: intersection over union threshold.
top_k: keep top_k results. If k <= 0, keep all the results.
candidate_size: only consider the candidates with the highest scores.
Returns:
picked: a list of indexes of the kept boxes
"""
scores = box_scores[:, -1]
boxes = box_scores[:, :-1]
picked = []
indexes = np.argsort(scores)
indexes = indexes[-candidate_size:]
while len(indexes) > 0:
current = indexes[-1]
picked.append(current)
if 0 < top_k == len(picked) or len(indexes) == 1:
break
current_box = boxes[current, :]
indexes = indexes[:-1]
rest_boxes = boxes[indexes, :]
iou = iou_of(
rest_boxes,
np.expand_dims(
current_box, axis=0), )
indexes = indexes[iou <= iou_threshold]
return box_scores[picked, :]
def iou_of(boxes0, boxes1, eps=1e-5):
"""Return intersection-over-union (Jaccard index) of boxes.
Args:
boxes0 (N, 4): ground truth boxes.
boxes1 (N or 1, 4): predicted boxes.
eps: a small number to avoid 0 as denominator.
Returns:
iou (N): IoU values.
"""
overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2])
overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
overlap_area = area_of(overlap_left_top, overlap_right_bottom)
area0 = area_of(boxes0[..., :2], boxes0[..., 2:])
area1 = area_of(boxes1[..., :2], boxes1[..., 2:])
return overlap_area / (area0 + area1 - overlap_area + eps)
def area_of(left_top, right_bottom):
"""Compute the areas of rectangles given two corners.
Args:
left_top (N, 2): left top corner.
right_bottom (N, 2): right bottom corner.
Returns:
area (N): return the area.
"""
hw = np.clip(right_bottom - left_top, 0.0, None)
return hw[..., 0] * hw[..., 1]
class PPYOLOEPostProcess(object):
"""
Args:
input_shape (int): network input image size
scale_factor (float): scale factor of ori image
"""
def __init__(self,
score_threshold=0.4,
nms_threshold=0.5,
nms_top_k=10000,
keep_top_k=300):
self.score_threshold = score_threshold
self.nms_threshold = nms_threshold
self.nms_top_k = nms_top_k
self.keep_top_k = keep_top_k
def _non_max_suppression(self, prediction, scale_factor):
batch_size = prediction.shape[0]
out_boxes_list = []
box_num_list = []
for batch_id in range(batch_size):
bboxes, confidences = prediction[batch_id][..., :4], prediction[
batch_id][..., 4:]
# nms
picked_box_probs = []
picked_labels = []
for class_index in range(0, confidences.shape[1]):
probs = confidences[:, class_index]
mask = probs > self.score_threshold
probs = probs[mask]
if probs.shape[0] == 0:
continue
subset_boxes = bboxes[mask, :]
box_probs = np.concatenate(
[subset_boxes, probs.reshape(-1, 1)], axis=1)
box_probs = hard_nms(
box_probs,
iou_threshold=self.nms_threshold,
top_k=self.nms_top_k)
picked_box_probs.append(box_probs)
picked_labels.extend([class_index] * box_probs.shape[0])
if len(picked_box_probs) == 0:
out_boxes_list.append(np.empty((0, 4)))
else:
picked_box_probs = np.concatenate(picked_box_probs)
# resize output boxes
picked_box_probs[:, 0] /= scale_factor[batch_id][1]
picked_box_probs[:, 2] /= scale_factor[batch_id][1]
picked_box_probs[:, 1] /= scale_factor[batch_id][0]
picked_box_probs[:, 3] /= scale_factor[batch_id][0]
                # concatenate into [class id, score, x1, y1, x2, y2]
out_box = np.concatenate(
[
np.expand_dims(
np.array(picked_labels), axis=-1), np.expand_dims(
picked_box_probs[:, 4], axis=-1),
picked_box_probs[:, :4]
],
axis=1)
if out_box.shape[0] > self.keep_top_k:
out_box = out_box[out_box[:, 1].argsort()[::-1]
[:self.keep_top_k]]
out_boxes_list.append(out_box)
box_num_list.append(out_box.shape[0])
out_boxes_list = np.concatenate(out_boxes_list, axis=0)
box_num_list = np.array(box_num_list)
return out_boxes_list, box_num_list
def __call__(self, outs, scale_factor):
out_boxes_list, box_num_list = self._non_max_suppression(outs,
scale_factor)
return {'bbox': out_boxes_list, 'bbox_num': box_num_list}
...@@ -19,8 +19,6 @@ import argparse ...@@ -19,8 +19,6 @@ import argparse
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from ppdet.core.workspace import load_config, merge_config
from ppdet.core.workspace import create from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.quant import quant_post_static from paddleslim.quant import quant_post_static
...@@ -64,33 +62,32 @@ def reader_wrapper(reader, input_list): ...@@ -64,33 +62,32 @@ def reader_wrapper(reader, input_list):
def main(): def main():
global global_config global config
all_config = load_slim_config(FLAGS.config_path) config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'], train_loader = create('EvalReader')(config['TrainDataset'],
reader_cfg['worker_num'], config['worker_num'],
return_list=True) return_list=True)
train_loader = reader_wrapper(train_loader, global_config['input_list']) train_loader = reader_wrapper(train_loader, config['input_list'])
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place) exe = paddle.static.Executor(place)
quant_post_static( quant_post_static(
executor=exe, executor=exe,
model_dir=global_config["model_dir"], model_dir=config["model_dir"],
quantize_model_path=FLAGS.save_dir, quantize_model_path=FLAGS.save_dir,
data_loader=train_loader, data_loader=train_loader,
model_filename=global_config["model_filename"], model_filename=config["model_filename"],
params_filename=global_config["params_filename"], params_filename=config["params_filename"],
batch_size=32, batch_size=4,
batch_nums=10, batch_nums=64,
algo=FLAGS.algo, algo=FLAGS.algo,
hist_percent=0.999, hist_percent=0.999,
is_full_quantize=False, is_full_quantize=False,
bias_correction=False, bias_correction=False,
onnx_format=False) onnx_format=False,
skip_tensor_list=config['skip_tensor_list']
if 'skip_tensor_list' in config else None)
if __name__ == '__main__': if __name__ == '__main__':
......
# Post-Training Quantization Example for YOLO Series Models
Contents:
- [1. Overview](#1-overview)
- [2. Benchmark](#2-benchmark)
- [3. Post-Training Quantization Workflow](#3-post-training-quantization-workflow)
  - [3.1 Prepare the Environment](#31-prepare-the-environment)
  - [3.2 Prepare the Dataset](#32-prepare-the-dataset)
  - [3.3 Prepare the Inference Model](#33-prepare-the-inference-model)
  - [3.4 Run Post-Training Quantization](#34-run-post-training-quantization)
  - [3.5 Evaluate Model Accuracy](#35-evaluate-model-accuracy)
  - [3.6 Improve Post-Training Quantization Accuracy](#36-improve-post-training-quantization-accuracy)
- [4. Deployment](#4-deployment)
- [5. FAQ](#5-faq)
This example takes the [ultralytics/yolov5](https://github.com/ultralytics/yolov5), [meituan/YOLOv6](https://github.com/meituan/YOLOv6) and [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) object detection models, converts the inference models exported from PyTorch into Paddle inference models, compresses them with post-training quantization (PTQ), and uses the sensitivity analysis tool to recover PTQ accuracy. The quantized models can be deployed with Paddle Inference, or exported to ONNX and deployed with TensorRT.
## 2. Benchmark
| Model | Strategy | Input size | mAP<sup>val<br>0.5:0.95 | Latency<sup><small>FP32</small><sup><br><sup>(ms) | Latency<sup><small>FP16</small><sup><br><sup>(ms) | Latency<sup><small>INT8</small><sup><br><sup>(ms) | Config | Inference model |
| :-------- |:-------- |:--------: | :---------------------: | :----------------: | :----------------: | :---------------: | :-----------------------------: | :-----------------------------: |
| YOLOv5s | Base model | 640*640 | 37.4 | 5.95 | 2.44 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx) |
| YOLOv5s | KL PTQ | 640*640 | 36.0 | - | - | 1.87 | - | - |
| | | | | | | | | |
| YOLOv6s | Base model | 640*640 | 42.4 | 9.06 | 2.90 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx) |
| YOLOv6s | KL PTQ (before analysis) | 640*640 | 30.3 | - | - | 1.83 | - | - |
| YOLOv6s | KL PTQ (after analysis) | 640*640 | 39.7 | - | - | - | - | [Infer Model](https://bj.bcebos.com/v1/paddle-slim-models/act/yolov6s_analyzed_ptq.tar) |
| | | | | | | | | |
| YOLOv7 | Base model | 640*640 | 51.1 | 26.84 | 7.44 | - | - | [Model](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx) |
| YOLOv7 | KL PTQ | 640*640 | 50.2 | - | - | 4.55 | - | - |
Notes:
- All mAP values are measured on the COCO val2017 dataset.
## 3. Post-Training Quantization Workflow
#### 3.1 Prepare the Environment
- PaddlePaddle >= 2.3 (install from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim > 2.3
- opencv-python
(1) Install PaddlePaddle:
```shell
# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu
```
(2) Install PaddleSlim:
```shell
pip install paddleslim
```
#### 3.2 Prepare the Dataset
This example uses the COCO dataset by default. Download [Train](http://images.cocodataset.org/zips/train2017.zip), [Val](http://images.cocodataset.org/zips/val2017.zip) and the [annotations](http://images.cocodataset.org/annotations/annotations_trainval2017.zip) from the [MS COCO website](https://cocodataset.org).
The expected directory layout is:
```
dataset/coco/
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ | ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000580008.jpg
│ | ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
```
#### 3.3 Prepare the Inference Model
(1) Prepare the ONNX model:
- YOLOv5: follow the official [export tutorial](https://github.com/ultralytics/yolov5/issues/251) of [ultralytics/yolov5](https://github.com/ultralytics/yolov5) to produce the ONNX model, or download the prepared [yolov5s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx).
- YOLOv6: follow the official [export tutorial](https://github.com/meituan/YOLOv6/blob/main/deploy/ONNX/README.md) of [meituan/YOLOv6](https://github.com/meituan/YOLOv6) to produce the ONNX model, or download the prepared [yolov6s.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov6s.onnx).
- YOLOv7: use the export script of [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) to produce the ONNX model, or download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7.onnx).
#### 3.4 Run Post-Training Quantization
Post-training quantization is launched through the post_quant.py script, which quantizes the model via the ```paddleslim.quant.quant_post_static``` API. Set the model path, data path and quantization-related parameters in the config file, then run one of the commands below (a simplified sketch of the underlying call follows the command list):
- YOLOv5
```shell
python post_quant.py --config_path=./configs/yolov5s_ptq.yaml --save_dir=./yolov5s_ptq_out
```
- YOLOv6
```shell
python post_quant.py --config_path=./configs/yolov6s_ptq.yaml --save_dir=./yolov6s_ptq_out
```
- YOLOv7
```shell
python post_quant.py --config_path=./configs/yolov7s_ptq.yaml --save_dir=./yolov7s_ptq_out
```
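For reference, the core of `post_quant.py` boils down to converting the ONNX file into a Paddle inference model and then calling `paddleslim.quant.quant_post_static` on it. The sketch below is a simplified outline for the YOLOv6 case; the paths, input name and `algo` choice are illustrative, not fixed settings:
```python
import paddle
from paddleslim.common import load_onnx_model
from paddleslim.quant import quant_post_static
from dataset import COCOTrainDataset  # shipped alongside post_quant.py

paddle.enable_static()
exe = paddle.static.Executor(paddle.CUDAPlace(0))  # or paddle.CPUPlace()

# Convert the ONNX model to a Paddle inference model; following analysis.py in
# this example, the result is expected under ./yolov6s_infer.
load_onnx_model('./yolov6s.onnx')

dataset = COCOTrainDataset(
    dataset_dir='dataset/coco/',
    image_dir='val2017',
    anno_path='annotations/instances_val2017.json',
    input_name='x2paddle_image_arrays')  # input name used by YOLOv6
loader = paddle.io.DataLoader(
    dataset, batch_size=1, shuffle=True, drop_last=True)

quant_post_static(
    executor=exe,
    model_dir='./yolov6s_infer',
    quantize_model_path='./yolov6s_ptq_out',
    data_loader=loader,
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    algo='KL')
```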
#### 3.5 Evaluate Model Accuracy
Set the `model_dir` field in [yolov5s_ptq.yaml](./configs/yolov5s_ptq.yaml) to the path of the saved model, then run eval.py to obtain the model's mAP:
```shell
export CUDA_VISIBLE_DEVICES=0
python eval.py --config_path=./configs/yolov5s_ptq.yaml
```
#### 3.6 Improve Post-Training Quantization Accuracy
This section shows how to use the quantization analysis tool to improve PTQ accuracy. Post-training quantization needs only a small amount of data, is easy to use and produces a quantized model quickly, but it often incurs a noticeable accuracy drop. PaddleSlim provides an analysis tool built on the ```paddleslim.quant.AnalysisQuant``` API that visualizes the layers unsuited to quantization; skipping those layers improves the accuracy of the quantized model.
Since YOLOv6 suffers the largest PTQ accuracy drop, it is used as the example. Run the analysis tool as follows:
```shell
python analysis.py --config_path=./configs/yolov6s_analysis.yaml
```
As shown below, the analysis reveals that the layers `conv2d_2.w_0`, `conv2d_11.w_0`, `conv2d_15.w_0`, `conv2d_46.w_0` and `conv2d_49.w_0` cause most of the accuracy loss.
<p align="center">
<img src="./images/sensitivity_rank.png" width=849 hspace='10'/> <br />
</p>
Comparing the weight histograms shows that layers with small quantization loss have relatively smooth value distributions spanning roughly -0.25 to 0.25, whereas layers with large quantization loss have extreme distributions: most values are close to 0 and the range is only about -0.1 to 0.1. Although both look roughly normal, having so many values at 0 is unfavorable for estimating the quantization scale.
<p align="center">
<img src="./images/hist_compare.png" width=849 hspace='10'/> <br />
</p>
Based on this analysis, the layers that cause the largest accuracy drops can be skipped during post-training quantization by using [yolov6s_analyzed_ptq.yaml](./configs/yolov6s_analyzed_ptq.yaml) and running PTQ again. Skipping these layers raises PTQ accuracy by 9.4 points.
```shell
python post_quant.py --config_path=./configs/yolov6s_analyzed_ptq.yaml --save_dir=./yolov6s_analyzed_ptq_out
```
## 4. Deployment
## 5. FAQ
- To apply automatic compression (ACT) instead, see the [YOLO series auto-compression example](https://github.com/PaddlePaddle/PaddleSlim/tree/develop/example/auto_compression/pytorch_yolo_series).
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
import paddle
from tqdm import tqdm
from post_process import YOLOv6PostProcess, coco_metric
from dataset import COCOValDataset, COCOTrainDataset
from paddleslim.common import load_config, load_onnx_model
from paddleslim.quant.analysis import AnalysisQuant
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of analysis config.",
required=True)
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def eval_function(exe, compiled_test_program, test_feed_names, test_fetch_list):
bboxes_list, bbox_nums_list, image_id_list = [], [], []
with tqdm(
total=len(val_loader),
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
for data in val_loader:
data_all = {k: np.array(v) for k, v in data.items()}
outs = exe.run(compiled_test_program,
feed={test_feed_names[0]: data_all['image']},
fetch_list=test_fetch_list,
return_numpy=False)
res = {}
postprocess = YOLOv6PostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
bboxes_list.append(res['bbox'])
bbox_nums_list.append(res['bbox_num'])
image_id_list.append(np.array(data_all['im_id']))
t.update()
map_res = coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
return map_res[0]
def main():
global config
config = load_config(FLAGS.config_path)
ptq_config = config['PTQ']
input_name = 'x2paddle_image_arrays' if config[
'arch'] == 'YOLOv6' else 'x2paddle_images'
dataset = COCOTrainDataset(
dataset_dir=config['dataset_dir'],
image_dir=config['val_image_dir'],
anno_path=config['val_anno_path'],
input_name=input_name)
data_loader = paddle.io.DataLoader(
dataset, batch_size=1, shuffle=True, drop_last=True, num_workers=0)
global val_loader
dataset = COCOValDataset(
dataset_dir=config['dataset_dir'],
image_dir=config['val_image_dir'],
anno_path=config['val_anno_path'])
global anno_file
anno_file = dataset.ann_file
val_loader = paddle.io.DataLoader(
dataset, batch_size=1, shuffle=False, drop_last=False, num_workers=0)
load_onnx_model(config["model_dir"])
inference_model_path = config["model_dir"].rstrip().rstrip(
'.onnx') + '_infer'
analyzer = AnalysisQuant(
model_dir=inference_model_path,
model_filename='model.pdmodel',
params_filename='model.pdiparams',
eval_function=eval_function,
data_loader=data_loader,
save_dir=config['save_dir'],
ptq_config=ptq_config)
analyzer.analysis()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
arch: YOLOv5
model_dir: ./yolov5s.onnx
dataset_dir: dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
skip_tensor_list: None # you can set it after analysis
arch: YOLOv6
model_dir: ./yolov6s.onnx
save_dir: ./analysis_results
dataset_dir: /dataset/coco/
val_image_dir: val2017
val_anno_path: annotations/instances_val2017.json
PTQ:
quantizable_op_type: ["conv2d", "depthwise_conv2d"]
weight_quantize_type: 'abs_max'
activation_quantize_type: 'moving_average_abs_max'
is_full_quantize: False
batch_size: 10
batch_nums: 10
arch: YOLOv6
model_dir: ./yolov6s.onnx
dataset_dir: /dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
skip_tensor_list: ['conv2d_2.w_0', 'conv2d_15.w_0', 'conv2d_46.w_0', 'conv2d_11.w_0', 'conv2d_49.w_0']
arch: YOLOv6
model_dir: ./yolov6s.onnx
dataset_dir: /dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
skip_tensor_list: None
arch: YOLOv7
model_dir: ./yolov7s.onnx
dataset_dir: /dataset/coco/
train_image_dir: train2017
val_image_dir: val2017
train_anno_path: annotations/instances_train2017.json
val_anno_path: annotations/instances_val2017.json
from pycocotools.coco import COCO
import cv2
import os
import numpy as np
import paddle
class COCOValDataset(paddle.io.Dataset):
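    """COCO evaluation dataset that keeps only images with at least one valid
    ground-truth box and yields letterbox-resized, /255-normalized CHW images
    together with `im_id` and `scale_factor`."""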
def __init__(self,
dataset_dir=None,
image_dir=None,
anno_path=None,
img_size=[640, 640],
input_name='x2paddle_images'):
self.dataset_dir = dataset_dir
self.image_dir = image_dir
self.img_size = img_size
self.input_name = input_name
self.ann_file = os.path.join(dataset_dir, anno_path)
self.coco = COCO(self.ann_file)
ori_ids = list(sorted(self.coco.imgs.keys()))
# check gt bbox
clean_ids = []
for idx in ori_ids:
ins_anno_ids = self.coco.getAnnIds(imgIds=[idx], iscrowd=False)
instances = self.coco.loadAnns(ins_anno_ids)
num_bbox = 0
for inst in instances:
if inst.get('ignore', False):
continue
if 'bbox' not in inst.keys():
continue
elif not any(np.array(inst['bbox'])):
continue
else:
num_bbox += 1
if num_bbox > 0:
clean_ids.append(idx)
self.ids = clean_ids
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {
'image': img,
'im_id': np.array([img_id]),
'scale_factor': scale_factor
}
def __len__(self):
return len(self.ids)
def _get_img_data_from_img_id(self, img_id):
img_info = self.coco.loadImgs(img_id)[0]
img_path = os.path.join(self.dataset_dir, self.image_dir,
img_info['file_name'])
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
return img
def _generate_scale(self, im, target_shape, keep_ratio=True):
"""
Args:
im (np.ndarray): image (np.ndarray)
Returns:
im_scale_x: the resize ratio of X
im_scale_y: the resize ratio of Y
"""
origin_shape = im.shape[:2]
if keep_ratio:
im_size_min = np.min(origin_shape)
im_size_max = np.max(origin_shape)
target_size_min = np.min(target_shape)
target_size_max = np.max(target_shape)
im_scale = float(target_size_min) / float(im_size_min)
if np.round(im_scale * im_size_max) > target_size_max:
im_scale = float(target_size_max) / float(im_size_max)
im_scale_x = im_scale
im_scale_y = im_scale
else:
resize_h, resize_w = target_shape
im_scale_y = resize_h / float(origin_shape[0])
im_scale_x = resize_w / float(origin_shape[1])
return im_scale_y, im_scale_x
def image_preprocess(self, img, target_shape):
# Resize image
im_scale_y, im_scale_x = self._generate_scale(img, target_shape)
img = cv2.resize(
img,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=cv2.INTER_LINEAR)
# Pad
im_h, im_w = img.shape[:2]
h, w = target_shape[:]
if h != im_h or w != im_w:
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array([114.0, 114.0, 114.0], dtype=np.float32)
canvas[0:im_h, 0:im_w, :] = img.astype(np.float32)
img = canvas
img = np.transpose(img / 255, [2, 0, 1])
scale_factor = np.array([im_scale_y, im_scale_x])
return img.astype(np.float32), scale_factor
class COCOTrainDataset(COCOValDataset):
def __getitem__(self, idx):
img_id = self.ids[idx]
img = self._get_img_data_from_img_id(img_id)
img, scale_factor = self.image_preprocess(img, self.img_size)
return {self.input_name: img}
\ No newline at end of file
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
import argparse
from tqdm import tqdm
import paddle
from paddleslim.common import load_config as load_slim_config
from paddleslim.common import load_inference_model
from post_process import YOLOPostProcess, coco_metric
from dataset import COCOValDataset
def argsparser():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--config_path',
type=str,
default=None,
help="path of compression strategy config.",
required=True)
parser.add_argument(
'--batch_size', type=int, default=1, help="Batch size of model input.")
parser.add_argument(
'--devices',
type=str,
default='gpu',
help="which device used to compress.")
return parser
def eval():
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place)
val_program, feed_target_names, fetch_targets = load_inference_model(
config["model_dir"], exe, "model.pdmodel", "model.pdiparams")
bboxes_list, bbox_nums_list, image_id_list = [], [], []
with tqdm(
total=len(val_loader),
bar_format='Evaluation stage, Run batch:|{bar}| {n_fmt}/{total_fmt}',
ncols=80) as t:
for data in val_loader:
data_all = {k: np.array(v) for k, v in data.items()}
outs = exe.run(val_program,
feed={feed_target_names[0]: data_all['image']},
fetch_list=fetch_targets,
return_numpy=False)
postprocess = YOLOPostProcess(
score_threshold=0.001, nms_threshold=0.65, multi_label=True)
res = postprocess(np.array(outs[0]), data_all['scale_factor'])
bboxes_list.append(res['bbox'])
bbox_nums_list.append(res['bbox_num'])
image_id_list.append(np.array(data_all['im_id']))
t.update()
coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list)
def main():
global config
config = load_slim_config(FLAGS.config_path)
global val_loader
dataset = COCOValDataset(
dataset_dir=config['dataset_dir'],
image_dir=config['val_image_dir'],
anno_path=config['val_anno_path'])
global anno_file
anno_file = dataset.ann_file
val_loader = paddle.io.DataLoader(
dataset, batch_size=FLAGS.batch_size, drop_last=True)
eval()
if __name__ == '__main__':
paddle.enable_static()
parser = argsparser()
FLAGS = parser.parse_args()
assert FLAGS.devices in ['cpu', 'gpu', 'xpu', 'npu']
paddle.set_device(FLAGS.devices)
main()
...@@ -14,6 +14,8 @@ ...@@ -14,6 +14,8 @@
import numpy as np import numpy as np
import cv2 import cv2
import json
import sys
def box_area(boxes): def box_area(boxes):
...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold): ...@@ -68,9 +70,9 @@ def nms(boxes, scores, iou_threshold):
return keep return keep
class YOLOv6PostProcess(object): class YOLOPostProcess(object):
""" """
    Post process of YOLOv6 network. Post process of YOLO series network.
args: args:
score_threshold(float): Threshold to filter out bounding boxes with low score_threshold(float): Threshold to filter out bounding boxes with low
confidence score. If not provided, consider all boxes. confidence score. If not provided, consider all boxes.
...@@ -157,8 +159,8 @@ class YOLOv6PostProcess(object): ...@@ -157,8 +159,8 @@ class YOLOv6PostProcess(object):
if len(pred.shape) == 1: if len(pred.shape) == 1:
pred = pred[np.newaxis, :] pred = pred[np.newaxis, :]
pred_bboxes = pred[:, :4] pred_bboxes = pred[:, :4]
scale_factor = np.tile(scale_factor[i][::-1], (1, 2)) scale = np.tile(scale_factor[i][::-1], (2))
pred_bboxes /= scale_factor pred_bboxes /= scale
bbox = np.concatenate( bbox = np.concatenate(
[ [
pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis], pred[:, -1][:, np.newaxis], pred[:, -2][:, np.newaxis],
...@@ -171,3 +173,59 @@ class YOLOv6PostProcess(object): ...@@ -171,3 +173,59 @@ class YOLOv6PostProcess(object):
bboxs = np.concatenate(bboxs, axis=0) bboxs = np.concatenate(bboxs, axis=0)
box_nums = np.array(box_nums) box_nums = np.array(box_nums)
return {'bbox': bboxs, 'bbox_num': box_nums} return {'bbox': bboxs, 'bbox_num': box_nums}
def coco_metric(anno_file, bboxes_list, bbox_nums_list, image_id_list):
try:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
    except ImportError:
        print(
            "[ERROR] pycocotools not found, please install it with `pip install pycocotools`"
        )
sys.exit(1)
coco_gt = COCO(anno_file)
cats = coco_gt.loadCats(coco_gt.getCatIds())
clsid2catid = {i: cat['id'] for i, cat in enumerate(cats)}
results = []
for bboxes, bbox_nums, image_id in zip(bboxes_list, bbox_nums_list,
image_id_list):
results += _get_det_res(bboxes, bbox_nums, image_id, clsid2catid)
output = "bbox.json"
with open(output, 'w') as f:
json.dump(results, f)
coco_dt = coco_gt.loadRes(output)
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
return coco_eval.stats
def _get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map):
det_res = []
k = 0
for i in range(len(bbox_nums)):
cur_image_id = int(image_id[i][0])
det_nums = bbox_nums[i]
for j in range(det_nums):
dt = bboxes[k]
k = k + 1
num_id, score, xmin, ymin, xmax, ymax = dt.tolist()
if int(num_id) < 0:
continue
category_id = label_to_cat_id_map[int(num_id)]
w = xmax - xmin
h = ymax - ymin
bbox = [xmin, ymin, w, h]
dt_res = {
'image_id': cur_image_id,
'category_id': category_id,
'bbox': bbox,
'score': score
}
det_res.append(dt_res)
return det_res
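
For clarity, `_get_det_res` above converts each detection row `[class_id, score, xmin, ymin, xmax, ymax]` into a COCO-style record with an `[x, y, width, height]` box. A tiny sketch with made-up numbers:

# Made-up detection to illustrate the conversion performed by _get_det_res.
det = [0.0, 0.91, 50.0, 40.0, 150.0, 120.0]  # [class_id, score, xmin, ymin, xmax, ymax]
num_id, score, xmin, ymin, xmax, ymax = det
coco_record = {
    'image_id': 42,                          # made-up image id
    'category_id': int(num_id),              # mapped through clsid2catid in practice
    'bbox': [xmin, ymin, xmax - xmin, ymax - ymin],  # -> [50.0, 40.0, 100.0, 80.0]
    'score': score,
}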
...@@ -17,11 +17,9 @@ import sys ...@@ -17,11 +17,9 @@ import sys
import numpy as np import numpy as np
import argparse import argparse
import paddle import paddle
from ppdet.core.workspace import load_config, merge_config from paddleslim.common import load_config, load_onnx_model
from ppdet.core.workspace import create
from ppdet.metrics import COCOMetric, VOCMetric
from paddleslim.auto_compression.config_helpers import load_config as load_slim_config
from paddleslim.quant import quant_post_static from paddleslim.quant import quant_post_static
from dataset import COCOTrainDataset
def argsparser(): def argsparser():
...@@ -30,7 +28,7 @@ def argsparser(): ...@@ -30,7 +28,7 @@ def argsparser():
'--config_path', '--config_path',
type=str, type=str,
default=None, default=None,
help="path of compression strategy config.", help="path of post training quantization config.",
required=True) required=True)
parser.add_argument( parser.add_argument(
'--save_dir', '--save_dir',
...@@ -48,49 +46,45 @@ def argsparser(): ...@@ -48,49 +46,45 @@ def argsparser():
return parser return parser
def reader_wrapper(reader, input_list):
def gen():
for data in reader:
in_dict = {}
if isinstance(input_list, list):
for input_name in input_list:
in_dict[input_name] = data[input_name]
elif isinstance(input_list, dict):
for input_name in input_list.keys():
in_dict[input_list[input_name]] = data[input_name]
yield in_dict
return gen
def main(): def main():
global global_config global config
all_config = load_slim_config(FLAGS.config_path) config = load_config(FLAGS.config_path)
assert "Global" in all_config, f"Key 'Global' not found in config file. \n{all_config}"
global_config = all_config["Global"]
reader_cfg = load_config(global_config['reader_config'])
train_loader = create('EvalReader')(reader_cfg['TrainDataset'], input_name = 'x2paddle_image_arrays' if config[
reader_cfg['worker_num'], 'arch'] == 'YOLOv6' else 'x2paddle_images'
return_list=True) dataset = COCOTrainDataset(
train_loader = reader_wrapper(train_loader, global_config['input_list']) dataset_dir=config['dataset_dir'],
image_dir=config['val_image_dir'],
anno_path=config['val_anno_path'],
input_name=input_name)
train_loader = paddle.io.DataLoader(
dataset, batch_size=1, shuffle=True, drop_last=True, num_workers=0)
place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace() place = paddle.CUDAPlace(0) if FLAGS.devices == 'gpu' else paddle.CPUPlace()
exe = paddle.static.Executor(place) exe = paddle.static.Executor(place)
    # since the type of model converted from pytorch is onnx,
    # use load_onnx_model first and rename the model_dir
load_onnx_model(config["model_dir"])
inference_model_path = config["model_dir"].rstrip().rstrip(
'.onnx') + '_infer'
quant_post_static( quant_post_static(
executor=exe, executor=exe,
model_dir=global_config["model_dir"], model_dir=inference_model_path,
quantize_model_path=FLAGS.save_dir, quantize_model_path=FLAGS.save_dir,
data_loader=train_loader, data_loader=train_loader,
model_filename=global_config["model_filename"], model_filename='model.pdmodel',
params_filename=global_config["params_filename"], params_filename='model.pdiparams',
batch_size=32, batch_size=32,
batch_nums=10, batch_nums=10,
algo=FLAGS.algo, algo=FLAGS.algo,
hist_percent=0.999, hist_percent=0.999,
is_full_quantize=False, is_full_quantize=False,
bias_correction=False, bias_correction=False,
onnx_format=False) onnx_format=True,
skip_tensor_list=config['skip_tensor_list']
if 'skip_tensor_list' in config else None)
if __name__ == '__main__': if __name__ == '__main__':
......
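
Like the evaluation script, this post-training quantization entry reads its settings from a YAML config. A minimal sketch of the keys consumed above (the values are illustrative assumptions; `skip_tensor_list` is optional):

# Illustrative PTQ config for the script above; adjust paths and arch to your model.
ptq_yaml = {
    'model_dir': './yolov6s.onnx',   # an ONNX model; converted to a Paddle inference model internally
    'arch': 'YOLOv6',                # selects 'x2paddle_image_arrays' as the input name, otherwise 'x2paddle_images'
    'dataset_dir': './dataset/coco',
    'val_image_dir': 'val2017',
    'val_anno_path': 'annotations/instances_val2017.json',
    # 'skip_tensor_list': ['conv2d_0.w_0'],  # optional; the tensor name here is hypothetical
}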
...@@ -135,8 +135,8 @@ def save_cls_model(model, input_shape, save_dir, data_type): ...@@ -135,8 +135,8 @@ def save_cls_model(model, input_shape, save_dir, data_type):
weight_bits=8, weight_bits=8,
activation_bits=8) activation_bits=8)
model_file = os.path.join(quantize_model_path, '__model__') model_file = os.path.join(quantize_model_path, 'model.pdmodel')
param_file = os.path.join(quantize_model_path, '__params__') param_file = os.path.join(quantize_model_path, 'model.pdiparams')
return model_file, param_file return model_file, param_file
......
...@@ -95,10 +95,10 @@ class TableLatencyPredictor(LatencyPredictor): ...@@ -95,10 +95,10 @@ class TableLatencyPredictor(LatencyPredictor):
raise NotImplementedError( raise NotImplementedError(
'latency predictor does NOT support running on Windows.') 'latency predictor does NOT support running on Windows.')
elif platform.system().lower() == 'darwin': elif platform.system().lower() == 'darwin':
py_verion = platform.python_version().split('.') py_version = platform.python_version().split('.')
if int(py_version[0]) != 3 or int(py_version[1]) != 9: if int(py_version[0]) != 3 or int(py_version[1]) != 9:
raise NotImplementedError( raise NotImplementedError(
'latency predictor does NOT support running on macOS when python version is not 3.9.' 'Latency predictor does NOT support running on macOS when python version is not 3.9.'
) )
_logger.info("pip install paddleslim-opt-tools") _logger.info("pip install paddleslim-opt-tools")
......
...@@ -19,8 +19,14 @@ from .config_helpers import * ...@@ -19,8 +19,14 @@ from .config_helpers import *
from .utils import * from .utils import *
__all__ = [ __all__ = [
"AutoCompression", "Quantization", "Distillation", "AutoCompression",
"MultiTeacherDistillation", "HyperParameterOptimization", "Prune", "Quantization",
"UnstructurePrune", "ProgramInfo", "TrainConfig", "save_config", "Distillation",
"load_config", "predict_compressed_model" "MultiTeacherDistillation",
"HyperParameterOptimization",
"Prune",
"UnstructurePrune",
"ProgramInfo",
"TrainConfig",
"predict_compressed_model",
] ]
...@@ -47,8 +47,10 @@ default_hpo_config = { ...@@ -47,8 +47,10 @@ default_hpo_config = {
# default quant config, can be used by ptq&hpo and qat&distillation # default quant config, can be used by ptq&hpo and qat&distillation
default_quant_config = { default_quant_config = {
'quantize_op_types': 'quantize_op_types': [
['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2'], 'conv2d', 'depthwise_conv2d', 'conv2d_transpose', 'mul', 'matmul',
'matmul_v2'
],
'weight_bits': 8, 'weight_bits': 8,
'activation_bits': 8, 'activation_bits': 8,
"is_full_quantize": False, "is_full_quantize": False,
......
...@@ -29,13 +29,15 @@ from ..quant.quanter import convert, quant_post ...@@ -29,13 +29,15 @@ from ..quant.quanter import convert, quant_post
from ..common.recover_program import recover_inference_program from ..common.recover_program import recover_inference_program
from ..common import get_logger from ..common import get_logger
from ..common.patterns import get_patterns from ..common.patterns import get_patterns
from ..common.load_model import load_inference_model, get_model_dir, export_onnx
from ..common.dataloader import wrap_dataloader, get_feed_vars
from ..common.config_helper import load_config
from ..analysis import TableLatencyPredictor from ..analysis import TableLatencyPredictor
from .create_compressed_program import build_distill_program, build_quant_program, build_prune_program, remove_unused_var_nodes from .create_compressed_program import build_distill_program, build_quant_program, build_prune_program, remove_unused_var_nodes
from .strategy_config import TrainConfig, ProgramInfo, merge_config from .strategy_config import TrainConfig, ProgramInfo, merge_config
from .auto_strategy import prepare_strategy, get_final_quant_config, create_strategy_config, create_train_config from .auto_strategy import prepare_strategy, get_final_quant_config, create_strategy_config, create_train_config
from .config_helpers import load_config, extract_strategy_config, extract_train_config from .config_helpers import extract_strategy_config, extract_train_config
from .utils.predict import with_variable_shape from .utils.predict import with_variable_shape
from .utils import get_feed_vars, wrap_dataloader, load_inference_model
_logger = get_logger(__name__, level=logging.INFO) _logger = get_logger(__name__, level=logging.INFO)
...@@ -49,10 +51,10 @@ except Exception as e: ...@@ -49,10 +51,10 @@ except Exception as e:
class AutoCompression: class AutoCompression:
def __init__(self, def __init__(self,
model_dir, model_dir,
model_filename,
params_filename,
save_dir,
train_dataloader, train_dataloader,
model_filename=None,
params_filename=None,
save_dir='./output',
config=None, config=None,
input_shapes=None, input_shapes=None,
target_speedup=None, target_speedup=None,
...@@ -66,13 +68,13 @@ class AutoCompression: ...@@ -66,13 +68,13 @@ class AutoCompression:
model_dir(str): The path of inference model that will be compressed, and model_dir(str): The path of inference model that will be compressed, and
the model and params that saved by ``paddle.static.save_inference_model`` the model and params that saved by ``paddle.static.save_inference_model``
are under the path. are under the path.
train_dataloader(Python Generator, Paddle.io.DataLoader): The
Generator or Dataloader provides train data, and it could
return a batch every time.
model_filename(str): The name of model file. model_filename(str): The name of model file.
params_filename(str): The name of params file. params_filename(str): The name of params file.
            save_dir(str): The path to save compressed model. The models in this directory will be overwritten save_dir(str): The path to save compressed model. The models in this directory will be overwritten
after calling 'compress()' function. after calling 'compress()' function.
train_data_loader(Python Generator, Paddle.io.DataLoader): The
Generator or Dataloader provides train data, and it could
return a batch every time.
input_shapes(dict|tuple|list): It is used when the model has implicit dimensions except batch size. input_shapes(dict|tuple|list): It is used when the model has implicit dimensions except batch size.
If it is a dict, the key is the name of input and the value is the shape. If it is a dict, the key is the name of input and the value is the shape.
Given the input shape of input "X" is [-1, 3, -1, -1] which means the batch size, hight Given the input shape of input "X" is [-1, 3, -1, -1] which means the batch size, hight
...@@ -117,18 +119,8 @@ class AutoCompression: ...@@ -117,18 +119,8 @@ class AutoCompression:
deploy_hardware(str, optional): The hardware you want to deploy. Default: 'gpu'. deploy_hardware(str, optional): The hardware you want to deploy. Default: 'gpu'.
""" """
self.model_dir = model_dir.rstrip('/') self.model_dir = model_dir.rstrip('/')
self.updated_model_dir, self.model_filename, self.params_filename = get_model_dir(
if model_filename == 'None': model_dir, model_filename, params_filename)
model_filename = None
self.model_filename = model_filename
if params_filename == 'None':
params_filename = None
self.params_filename = params_filename
if params_filename is None and model_filename is not None:
raise NotImplementedError(
"NOT SUPPORT parameters saved in separate files. Please convert it to single binary file first."
)
self.final_dir = save_dir self.final_dir = save_dir
if not os.path.exists(self.final_dir): if not os.path.exists(self.final_dir):
...@@ -163,8 +155,7 @@ class AutoCompression: ...@@ -163,8 +155,7 @@ class AutoCompression:
paddle.enable_static() paddle.enable_static()
self._exe, self._places = self._prepare_envs() self._exe, self._places = self._prepare_envs()
self.model_type = self._get_model_type(self._exe, self.model_dir, self.model_type = self._get_model_type()
model_filename, params_filename)
if self.train_config is not None and self.train_config.use_fleet: if self.train_config is not None and self.train_config.use_fleet:
fleet.init(is_collective=True) fleet.init(is_collective=True)
...@@ -249,8 +240,8 @@ class AutoCompression: ...@@ -249,8 +240,8 @@ class AutoCompression:
paddle.enable_static() paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace()) exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, [inference_program, feed_target_names,
fetch_targets] = (load_inference_model(model_dir, exe, model_filename, fetch_targets] = load_inference_model(model_dir, exe, model_filename,
params_filename)) params_filename)
if type(input_shapes) in [list, tuple]: if type(input_shapes) in [list, tuple]:
assert len( assert len(
...@@ -310,23 +301,26 @@ class AutoCompression: ...@@ -310,23 +301,26 @@ class AutoCompression:
exe = paddle.static.Executor(places) exe = paddle.static.Executor(places)
return exe, places return exe, places
def _get_model_type(self, exe, model_dir, model_filename, params_filename): def _get_model_type(self):
[inference_program, _, _]= (load_inference_model( \ [inference_program, _, _] = (load_inference_model(
model_dir, \ self.model_dir,
model_filename=model_filename, params_filename=params_filename, model_filename=self.model_filename,
executor=exe)) params_filename=self.params_filename,
executor=self._exe))
_, _, model_type = get_patterns(inference_program) _, _, model_type = get_patterns(inference_program)
if self.model_filename is None: if self.model_filename is None:
new_model_filename = '__new_model__' opt_model_filename = '__opt_model__'
else: else:
new_model_filename = 'new_' + self.model_filename opt_model_filename = 'opt_' + self.model_filename
program_bytes = inference_program._remove_training_info( program_bytes = inference_program._remove_training_info(
clip_extra=False).desc.serialize_to_string() clip_extra=False).desc.serialize_to_string()
with open(os.path.join(self.model_dir, new_model_filename), "wb") as f: with open(
os.path.join(self.updated_model_dir, opt_model_filename),
"wb") as f:
f.write(program_bytes) f.write(program_bytes)
shutil.move( shutil.move(
os.path.join(self.model_dir, new_model_filename), os.path.join(self.updated_model_dir, opt_model_filename),
os.path.join(self.model_dir, self.model_filename)) os.path.join(self.updated_model_dir, self.model_filename))
_logger.info(f"Detect model type: {model_type}") _logger.info(f"Detect model type: {model_type}")
return model_type return model_type
...@@ -603,10 +597,17 @@ class AutoCompression: ...@@ -603,10 +597,17 @@ class AutoCompression:
train_config): train_config):
# start compress, including train/eval model # start compress, including train/eval model
# TODO: add the emd loss of evaluation model. # TODO: add the emd loss of evaluation model.
if self.updated_model_dir != self.model_dir:
# If model is ONNX, convert it to inference model firstly.
load_inference_model(
self.model_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
executor=self._exe)
if strategy == 'quant_post': if strategy == 'quant_post':
quant_post( quant_post(
self._exe, self._exe,
model_dir=self.model_dir, model_dir=self.updated_model_dir,
quantize_model_path=os.path.join( quantize_model_path=os.path.join(
self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))), self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))),
data_loader=self.train_dataloader, data_loader=self.train_dataloader,
...@@ -632,11 +633,17 @@ class AutoCompression: ...@@ -632,11 +633,17 @@ class AutoCompression:
if platform.system().lower() != 'linux': if platform.system().lower() != 'linux':
raise NotImplementedError( raise NotImplementedError(
"post-quant-hpo is not support in system other than linux") "post-quant-hpo is not support in system other than linux")
if self.updated_model_dir != self.model_dir:
# If model is ONNX, convert it to inference model firstly.
load_inference_model(
self.model_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
executor=self._exe)
post_quant_hpo.quant_post_hpo( post_quant_hpo.quant_post_hpo(
self._exe, self._exe,
self._places, self._places,
model_dir=self.model_dir, model_dir=self.updated_model_dir,
quantize_model_path=os.path.join( quantize_model_path=os.path.join(
self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))), self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))),
train_dataloader=self.train_dataloader, train_dataloader=self.train_dataloader,
...@@ -707,7 +714,10 @@ class AutoCompression: ...@@ -707,7 +714,10 @@ class AutoCompression:
best_metric = -1.0 best_metric = -1.0
total_epochs = train_config.epochs if train_config.epochs else 100 total_epochs = train_config.epochs if train_config.epochs else 100
total_train_iter = 0 total_train_iter = 0
stop_training = False
for epoch_id in range(total_epochs): for epoch_id in range(total_epochs):
if stop_training:
break
for batch_id, data in enumerate(self.train_dataloader()): for batch_id, data in enumerate(self.train_dataloader()):
np_probs_float, = self._exe.run(train_program_info.program, \ np_probs_float, = self._exe.run(train_program_info.program, \
feed=data, \ feed=data, \
...@@ -753,6 +763,10 @@ class AutoCompression: ...@@ -753,6 +763,10 @@ class AutoCompression:
abs(best_metric - abs(best_metric -
self.metric_before_compressed) self.metric_before_compressed)
) / self.metric_before_compressed <= 0.005: ) / self.metric_before_compressed <= 0.005:
_logger.info(
"The error rate between the compressed model and original model is less than 5%. The training process ends."
)
stop_training = True
break break
else: else:
_logger.info( _logger.info(
...@@ -760,14 +774,18 @@ class AutoCompression: ...@@ -760,14 +774,18 @@ class AutoCompression:
format(epoch_id, metric, best_metric)) format(epoch_id, metric, best_metric))
if train_config.target_metric is not None: if train_config.target_metric is not None:
if metric > float(train_config.target_metric): if metric > float(train_config.target_metric):
stop_training = True
_logger.info(
"The metric of compressed model has reached the target metric. The training process ends."
)
break break
else: else:
_logger.warning( _logger.warning(
"Not set eval function, so unable to test accuracy performance." "Not set eval function, so unable to test accuracy performance."
) )
if train_config.train_iter and total_train_iter >= train_config.train_iter: if (train_config.train_iter and total_train_iter >=
epoch_id = total_epochs train_config.train_iter) or stop_training:
break break
if 'unstructure' in self._strategy or train_config.sparse_model: if 'unstructure' in self._strategy or train_config.sparse_model:
...@@ -787,15 +805,19 @@ class AutoCompression: ...@@ -787,15 +805,19 @@ class AutoCompression:
os.remove(os.path.join(self.tmp_dir, 'best_model.pdopt')) os.remove(os.path.join(self.tmp_dir, 'best_model.pdopt'))
os.remove(os.path.join(self.tmp_dir, 'best_model.pdparams')) os.remove(os.path.join(self.tmp_dir, 'best_model.pdparams'))
if 'qat' in strategy:
test_program, int8_program = convert(test_program, self._places, self._quant_config, \
scope=paddle.static.global_scope(), \
save_int8=True)
model_dir = os.path.join(self.tmp_dir, model_dir = os.path.join(self.tmp_dir,
'strategy_{}'.format(str(strategy_idx + 1))) 'strategy_{}'.format(str(strategy_idx + 1)))
if not os.path.exists(model_dir): if not os.path.exists(model_dir):
os.makedirs(model_dir) os.makedirs(model_dir)
if 'qat' in strategy:
test_program = convert(
test_program,
self._places,
self._quant_config,
scope=paddle.static.global_scope(),
save_clip_ranges_path=self.final_dir)
feed_vars = [ feed_vars = [
test_program.global_block().var(name) test_program.global_block().var(name)
for name in test_program_info.feed_target_names for name in test_program_info.feed_target_names
...@@ -816,3 +838,17 @@ class AutoCompression: ...@@ -816,3 +838,17 @@ class AutoCompression:
fetch_vars=test_program_info.fetch_targets, fetch_vars=test_program_info.fetch_targets,
executor=self._exe, executor=self._exe,
program=test_program) program=test_program)
def export_onnx(self,
model_name='quant_model.onnx',
deploy_backend='tensorrt'):
infer_model_path = os.path.join(self.final_dir, self.model_filename)
assert os.path.exists(
            infer_model_path), '{} not found, please check it.'.format(
infer_model_path)
export_onnx(
self.final_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
save_file_path=os.path.join(self.final_dir, model_name),
deploy_backend=deploy_backend)
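
With these changes, `model_filename`/`params_filename` become optional, `save_dir` defaults to `./output`, ONNX inputs are converted automatically, and the compressed result can be exported back to ONNX. A minimal usage sketch under those assumptions (the dataloader, input name, shapes and strategy-config keys below are placeholders, not values prescribed by the library):

# Illustrative sketch of the updated AutoCompression entry points.
import numpy as np
from paddleslim.auto_compression import AutoCompression

def train_loader():
    # placeholder generator: yield feed dicts keyed by the model's real input names
    yield {'x2paddle_image_arrays': np.zeros((1, 3, 640, 640), dtype='float32')}

ac = AutoCompression(
    model_dir='./yolov6s.onnx',        # ONNX model; converted to an inference model internally
    train_dataloader=train_loader,
    config={'Quantization': {}, 'Distillation': {}, 'TrainConfig': {'epochs': 1}},  # assumed keys
    save_dir='./output')
ac.compress()
# New in this change: export the compressed model to ONNX for deployment.
ac.export_onnx(model_name='quant_model.onnx', deploy_backend='tensorrt')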
...@@ -14,42 +14,7 @@ ...@@ -14,42 +14,7 @@
import yaml import yaml
import os import os
from paddleslim.auto_compression.strategy_config import * from paddleslim.auto_compression.strategy_config import *
from ..common.config_helper import load_config
__all__ = ['save_config', 'load_config']
def print_arguments(args, level=0):
if level == 0:
print('----------- Running Arguments -----------')
for arg, value in sorted(args.items()):
if isinstance(value, dict):
print('\t' * level, '%s:' % arg)
print_arguments(value, level + 1)
else:
print('\t' * level, '%s: %s' % (arg, value))
if level == 0:
print('------------------------------------------')
def load_config(config):
"""Load configurations from yaml file into dict.
Fields validation is skipped for loading some custom information.
Args:
config(str): The path of configuration file.
Returns:
dict: A dict storing configuration information.
"""
if config is None:
return None
assert isinstance(
config,
str), f"config should be str but got type(config)={type(config)}"
assert os.path.exists(config) and os.path.isfile(
config), f"{config} not found or it is not a file."
with open(config) as f:
cfg = yaml.load(f, Loader=yaml.FullLoader)
print_arguments(cfg)
return cfg
def extract_strategy_config(config): def extract_strategy_config(config):
...@@ -101,12 +66,3 @@ def extract_train_config(config): ...@@ -101,12 +66,3 @@ def extract_train_config(config):
**value) if value is not None else TrainConfig() **value) if value is not None else TrainConfig()
# return default training config when it is not set # return default training config when it is not set
return TrainConfig() return TrainConfig()
def save_config(config, config_path):
"""
convert dict config to yaml.
"""
f = open(config_path, "w")
yaml.dump(config, f)
f.close()
...@@ -23,7 +23,7 @@ from ..dist import * ...@@ -23,7 +23,7 @@ from ..dist import *
from ..common.recover_program import recover_inference_program, _remove_fetch_node from ..common.recover_program import recover_inference_program, _remove_fetch_node
from ..common import get_logger from ..common import get_logger
from .strategy_config import ProgramInfo from .strategy_config import ProgramInfo
from .utils import load_inference_model from ..common.load_model import load_inference_model
_logger = get_logger(__name__, level=logging.INFO) _logger = get_logger(__name__, level=logging.INFO)
__all__ = [ __all__ = [
...@@ -52,7 +52,8 @@ def _create_optimizer(train_config): ...@@ -52,7 +52,8 @@ def _create_optimizer(train_config):
optimizer_builder = train_config['optimizer_builder'] optimizer_builder = train_config['optimizer_builder']
assert isinstance( assert isinstance(
optimizer_builder, dict optimizer_builder, dict
), f"Value of 'optimizer_builder' in train_config should be dict but got {type(optimizer_builder)}" ), "Value of 'optimizer_builder' in train_config should be dict but got {}".format(
type(optimizer_builder))
if 'grad_clip' in optimizer_builder: if 'grad_clip' in optimizer_builder:
g_clip_params = optimizer_builder['grad_clip'] g_clip_params = optimizer_builder['grad_clip']
g_clip_type = g_clip_params.pop('type') g_clip_type = g_clip_params.pop('type')
...@@ -444,9 +445,8 @@ def build_prune_program(executor, ...@@ -444,9 +445,8 @@ def build_prune_program(executor,
"####################channel pruning##########################") "####################channel pruning##########################")
for param in pruned_program.global_block().all_parameters(): for param in pruned_program.global_block().all_parameters():
if param.name in original_shapes: if param.name in original_shapes:
_logger.info( _logger.info("{}, from {} to {}".format(
f"{param.name}, from {original_shapes[param.name]} to {param.shape}" param.name, original_shapes[param.name], param.shape))
)
_logger.info( _logger.info(
"####################channel pruning end##########################") "####################channel pruning end##########################")
train_program_info.program = pruned_program train_program_info.program = pruned_program
......
...@@ -53,7 +53,8 @@ class BaseStrategy: ...@@ -53,7 +53,8 @@ class BaseStrategy:
class Quantization(BaseStrategy): class Quantization(BaseStrategy):
def __init__(self, def __init__(self,
quantize_op_types=[ quantize_op_types=[
'conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2' 'conv2d', 'depthwise_conv2d', 'conv2d_transpose', 'mul',
'matmul', 'matmul_v2'
], ],
weight_bits=8, weight_bits=8,
activation_bits=8, activation_bits=8,
...@@ -65,6 +66,7 @@ class Quantization(BaseStrategy): ...@@ -65,6 +66,7 @@ class Quantization(BaseStrategy):
window_size=10000, window_size=10000,
moving_rate=0.9, moving_rate=0.9,
for_tensorrt=False, for_tensorrt=False,
onnx_format=False,
is_full_quantize=False): is_full_quantize=False):
""" """
Quantization Config. Quantization Config.
...@@ -80,6 +82,7 @@ class Quantization(BaseStrategy): ...@@ -80,6 +82,7 @@ class Quantization(BaseStrategy):
window_size(int): Window size for 'range_abs_max' quantization. Default: 10000. window_size(int): Window size for 'range_abs_max' quantization. Default: 10000.
moving_rate(float): The decay coefficient of moving average. Default: 0.9. moving_rate(float): The decay coefficient of moving average. Default: 0.9.
for_tensorrt(bool): If True, 'quantize_op_types' will be TENSORRT_OP_TYPES. Default: False. for_tensorrt(bool): If True, 'quantize_op_types' will be TENSORRT_OP_TYPES. Default: False.
            onnx_format(bool): Whether to export the quantized model in ONNX format. Default: False.
            is_full_quantize(bool): If True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES. Default: False. is_full_quantize(bool): If True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES. Default: False.
""" """
super(Quantization, self).__init__("Quantization") super(Quantization, self).__init__("Quantization")
...@@ -95,6 +98,7 @@ class Quantization(BaseStrategy): ...@@ -95,6 +98,7 @@ class Quantization(BaseStrategy):
self.window_size = window_size self.window_size = window_size
self.moving_rate = moving_rate self.moving_rate = moving_rate
self.for_tensorrt = for_tensorrt self.for_tensorrt = for_tensorrt
self.onnx_format = onnx_format
self.is_full_quantize = is_full_quantize self.is_full_quantize = is_full_quantize
......
...@@ -14,11 +14,5 @@ ...@@ -14,11 +14,5 @@
from __future__ import absolute_import from __future__ import absolute_import
from .predict import predict_compressed_model from .predict import predict_compressed_model
from .dataloader import *
from . import dataloader
from .load_model import *
from . import load_model
__all__ = ["predict_compressed_model"] __all__ = ["predict_compressed_model"]
__all__ += dataloader.__all__
__all__ += load_model.__all__
...@@ -12,7 +12,7 @@ except: ...@@ -12,7 +12,7 @@ except:
TRANSFORM_PASS_OP_TYPES = QuantizationTransformPass._supported_quantizable_op_type TRANSFORM_PASS_OP_TYPES = QuantizationTransformPass._supported_quantizable_op_type
QUANT_DEQUANT_PASS_OP_TYPES = AddQuantDequantPass._supported_quantizable_op_type QUANT_DEQUANT_PASS_OP_TYPES = AddQuantDequantPass._supported_quantizable_op_type
from .load_model import load_inference_model from ...common.load_model import load_inference_model
def post_quant_fake(executor, def post_quant_fake(executor,
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle
__all__ = ['load_inference_model']
def load_inference_model(path_prefix,
executor,
model_filename=None,
params_filename=None):
if model_filename is not None and params_filename is not None:
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=path_prefix,
executor=executor,
model_filename=model_filename,
params_filename=params_filename))
else:
model_name = '.'.join(model_filename.split('.')
[:-1]) if model_filename is not None else 'model'
if os.path.exists(os.path.join(path_prefix, model_name + '.pdmodel')):
model_path_prefix = os.path.join(path_prefix, model_name)
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=model_path_prefix, executor=executor))
else:
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=path_prefix, executor=executor))
return [inference_program, feed_target_names, fetch_targets]
...@@ -4,7 +4,7 @@ import paddle ...@@ -4,7 +4,7 @@ import paddle
from ...analysis import TableLatencyPredictor from ...analysis import TableLatencyPredictor
from .prune_model import get_sparse_model, get_prune_model from .prune_model import get_sparse_model, get_prune_model
from .fake_ptq import post_quant_fake from .fake_ptq import post_quant_fake
from .load_model import load_inference_model from ...common.load_model import load_inference_model
def with_variable_shape(model_dir, model_filename=None, params_filename=None): def with_variable_shape(model_dir, model_filename=None, params_filename=None):
...@@ -53,7 +53,7 @@ def predict_compressed_model(executor, ...@@ -53,7 +53,7 @@ def predict_compressed_model(executor,
        latency_dict(dict): The latency of the model under various compression strategies. latency_dict(dict): The latency of the model under various compression strategies.
""" """
local_rank = paddle.distributed.get_rank() local_rank = paddle.distributed.get_rank()
quant_model_path = f'quant_model_rank_{local_rank}_tmp' quant_model_path = 'quant_model_rank_{}_tmp'.format(local_rank)
prune_model_path = f'prune_model_rank_{local_rank}_tmp' prune_model_path = f'prune_model_rank_{local_rank}_tmp'
sparse_model_path = f'sparse_model_rank_{local_rank}_tmp' sparse_model_path = f'sparse_model_rank_{local_rank}_tmp'
......
...@@ -5,7 +5,7 @@ import paddle ...@@ -5,7 +5,7 @@ import paddle
import paddle.static as static import paddle.static as static
from ...prune import Pruner from ...prune import Pruner
from ...core import GraphWrapper from ...core import GraphWrapper
from .load_model import load_inference_model from ...common.load_model import load_inference_model
__all__ = ["get_sparse_model", "get_prune_model"] __all__ = ["get_sparse_model", "get_prune_model"]
...@@ -19,9 +19,10 @@ def get_sparse_model(executor, places, model_file, param_file, ratio, ...@@ -19,9 +19,10 @@ def get_sparse_model(executor, places, model_file, param_file, ratio,
ratio(float): The ratio to prune the model. ratio(float): The ratio to prune the model.
save_path(str): The save path of pruned model. save_path(str): The save path of pruned model.
""" """
assert os.path.exists(model_file), f'{model_file} does not exist.' assert os.path.exists(model_file), '{} does not exist.'.format(model_file)
assert os.path.exists( assert os.path.exists(
param_file) or param_file is None, f'{param_file} does not exist.' param_file) or param_file is None, '{} does not exist.'.format(
param_file)
paddle.enable_static() paddle.enable_static()
SKIP = ['image', 'feed', 'pool2d_0.tmp_0'] SKIP = ['image', 'feed', 'pool2d_0.tmp_0']
......
...@@ -25,11 +25,16 @@ from .analyze_helper import VarCollector ...@@ -25,11 +25,16 @@ from .analyze_helper import VarCollector
from . import wrapper_function from . import wrapper_function
from . import recover_program from . import recover_program
from . import patterns from . import patterns
from .load_model import load_inference_model, get_model_dir, load_onnx_model, export_onnx
from .dataloader import wrap_dataloader, get_feed_vars
from .config_helper import load_config, save_config
__all__ = [ __all__ = [
'EvolutionaryController', 'SAController', 'get_logger', 'ControllerServer', 'EvolutionaryController', 'SAController', 'get_logger', 'ControllerServer',
'ControllerClient', 'lock', 'unlock', 'cached_reader', 'AvgrageMeter', 'ControllerClient', 'lock', 'unlock', 'cached_reader', 'AvgrageMeter',
'Server', 'Client', 'RLBaseController', 'VarCollector' 'Server', 'Client', 'RLBaseController', 'VarCollector', 'load_onnx_model',
'load_inference_model', 'get_model_dir', 'wrap_dataloader', 'get_feed_vars',
'load_config', 'save_config'
] ]
__all__ += wrapper_function.__all__ __all__ += wrapper_function.__all__
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os
__all__ = ['load_config', 'save_config']
def print_arguments(args, level=0):
if level == 0:
print('----------- Running Arguments -----------')
for arg, value in sorted(args.items()):
if isinstance(value, dict):
print('\t' * level, '%s:' % arg)
print_arguments(value, level + 1)
else:
print('\t' * level, '%s: %s' % (arg, value))
if level == 0:
print('------------------------------------------')
def load_config(config):
"""Load configurations from yaml file into dict.
    Field validation is skipped so that custom information can also be loaded.
Args:
config(str): The path of configuration file.
Returns:
dict: A dict storing configuration information.
"""
if config is None:
return None
assert isinstance(
config,
str), f"config should be str but got type(config)={type(config)}"
assert os.path.exists(config) and os.path.isfile(
config), f"{config} not found or it is not a file."
with open(config) as f:
cfg = yaml.load(f, Loader=yaml.FullLoader)
print_arguments(cfg)
return cfg
def save_config(config, config_path):
    """
    Save a dict config to a yaml file.
    """
    with open(config_path, "w") as f:
        yaml.dump(config, f)
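
These helpers are now shared from `paddleslim.common`; a small round-trip sketch (the config contents are arbitrary example values):

# Round-trip sketch for the helpers above; the keys/values are arbitrary examples.
from paddleslim.common import load_config, save_config

cfg = {'Global': {'model_dir': './model', 'batch_size': 32}}
save_config(cfg, 'demo_config.yaml')
assert load_config('demo_config.yaml') == cfg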
...@@ -3,6 +3,7 @@ import time ...@@ -3,6 +3,7 @@ import time
import numpy as np import numpy as np
import paddle import paddle
from collections.abc import Iterable from collections.abc import Iterable
from .load_model import load_inference_model
__all__ = ["wrap_dataloader", "get_feed_vars"] __all__ = ["wrap_dataloader", "get_feed_vars"]
...@@ -13,7 +14,7 @@ def get_feed_vars(model_dir, model_filename, params_filename): ...@@ -13,7 +14,7 @@ def get_feed_vars(model_dir, model_filename, params_filename):
paddle.enable_static() paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace()) exe = paddle.static.Executor(paddle.CPUPlace())
[inference_program, feed_target_names, fetch_targets] = ( [inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model( load_inference_model(
model_dir, model_dir,
exe, exe,
model_filename=model_filename, model_filename=model_filename,
......
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import logging
import os
import shutil
import sys
import pkg_resources as pkg
import paddle
from . import get_logger
_logger = get_logger(__name__, level=logging.INFO)
__all__ = [
'load_inference_model', 'get_model_dir', 'load_onnx_model', 'export_onnx'
]
def load_inference_model(path_prefix,
executor,
model_filename=None,
params_filename=None):
# Load onnx model to Inference model.
if path_prefix.endswith('.onnx'):
inference_program, feed_target_names, fetch_targets = load_onnx_model(
path_prefix)
return [inference_program, feed_target_names, fetch_targets]
# Load Inference model.
# TODO: clean code
if model_filename is not None and model_filename.endswith('.pdmodel'):
model_name = '.'.join(model_filename.split('.')[:-1])
assert os.path.exists(
os.path.join(path_prefix, model_name + '.pdmodel')
), 'Please check {}, or fix model_filename parameter.'.format(
os.path.join(path_prefix, model_name + '.pdmodel'))
assert os.path.exists(
os.path.join(path_prefix, model_name + '.pdiparams')
), 'Please check {}, or fix params_filename parameter.'.format(
os.path.join(path_prefix, model_name + '.pdiparams'))
model_path_prefix = os.path.join(path_prefix, model_name)
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=model_path_prefix, executor=executor))
elif model_filename is not None and params_filename is not None:
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=path_prefix,
executor=executor,
model_filename=model_filename,
params_filename=params_filename))
else:
model_name = '.'.join(model_filename.split('.')
[:-1]) if model_filename is not None else 'model'
if os.path.exists(os.path.join(path_prefix, model_name + '.pdmodel')):
model_path_prefix = os.path.join(path_prefix, model_name)
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=model_path_prefix, executor=executor))
else:
[inference_program, feed_target_names, fetch_targets] = (
paddle.static.load_inference_model(
path_prefix=path_prefix, executor=executor))
return [inference_program, feed_target_names, fetch_targets]
def get_model_dir(model_dir, model_filename, params_filename):
if model_dir.endswith('.onnx'):
        # strip the '.onnx' suffix (rstrip would remove a trailing character set, not the suffix)
        updated_model_dir = model_dir[:-len('.onnx')] + '_infer'
else:
updated_model_dir = model_dir.rstrip('/')
    if model_filename is None:
        updated_model_filename = 'model.pdmodel'
    else:
        updated_model_filename = model_filename
    if params_filename is None:
        updated_params_filename = 'model.pdiparams'
    else:
        updated_params_filename = params_filename
if params_filename is None and model_filename is not None:
raise NotImplementedError(
"NOT SUPPORT parameters saved in separate files. Please convert it to single binary file first."
)
return updated_model_dir, updated_model_filename, updated_params_filename
def load_onnx_model(model_path, disable_feedback=False):
assert model_path.endswith(
'.onnx'
), '{} does not end with .onnx suffix and cannot be loaded.'.format(
model_path)
    # strip the '.onnx' suffix (rstrip would remove a trailing character set, not the suffix)
    inference_model_path = model_path[:-len('.onnx')] + '_infer'
exe = paddle.static.Executor(paddle.CPUPlace())
if os.path.exists(os.path.join(
inference_model_path, 'model.pdmodel')) and os.path.exists(
os.path.join(inference_model_path, 'model.pdiparams')):
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
os.path.join(inference_model_path, 'model'), exe)
_logger.info('Loaded model from: {}'.format(inference_model_path))
return val_program, feed_target_names, fetch_targets
else:
# onnx to paddle inference model.
assert os.path.exists(
            model_path), '`{}` not found, please check the model path.'.format(
model_path)
try:
pkg.require('x2paddle')
        except Exception:
from pip._internal import main
main(['install', 'x2paddle'])
# check onnx installation and version
try:
pkg.require('onnx')
import onnx
version = onnx.version.version
v0, v1, v2 = version.split('.')
version_sum = int(v0) * 100 + int(v1) * 10 + int(v2)
if version_sum < 160:
_logger.error(
"onnx>=1.6.0 is required, please use \"pip install onnx\".")
        except Exception:
from pip._internal import main
main(['install', 'onnx==1.12.0'])
from x2paddle.decoder.onnx_decoder import ONNXDecoder
from x2paddle.op_mapper.onnx2paddle.onnx_op_mapper import ONNXOpMapper
from x2paddle.optimizer.optimizer import GraphOptimizer
from x2paddle.utils import ConverterCheck
time_info = int(time.time())
if not disable_feedback:
ConverterCheck(
task="ONNX", time_info=time_info, convert_state="Start").start()
# support distributed convert model
model_idx = paddle.distributed.get_rank(
) if paddle.distributed.get_world_size() > 1 else 0
try:
_logger.info("Now translating model from onnx to paddle.")
model = ONNXDecoder(model_path)
mapper = ONNXOpMapper(model)
mapper.paddle_graph.build()
graph_opt = GraphOptimizer(source_frame="onnx")
graph_opt.optimize(mapper.paddle_graph)
_logger.info("Model optimized.")
onnx2paddle_out_dir = os.path.join(
inference_model_path, 'onnx2paddle_{}'.format(model_idx))
mapper.paddle_graph.gen_model(onnx2paddle_out_dir)
_logger.info("Successfully exported Paddle static graph model!")
if not disable_feedback:
ConverterCheck(
task="ONNX", time_info=time_info,
convert_state="Success").start()
except Exception as e:
_logger.warning(e)
_logger.error(
"x2paddle threw an exception, you can ask for help at: https://github.com/PaddlePaddle/X2Paddle/issues"
)
sys.exit(1)
if paddle.distributed.get_rank() == 0:
shutil.move(
os.path.join(onnx2paddle_out_dir, 'inference_model',
'model.pdmodel'),
os.path.join(inference_model_path, 'model.pdmodel'))
shutil.move(
os.path.join(onnx2paddle_out_dir, 'inference_model',
'model.pdiparams'),
os.path.join(inference_model_path, 'model.pdiparams'))
load_model_path = inference_model_path
else:
load_model_path = os.path.join(onnx2paddle_out_dir,
'inference_model')
paddle.enable_static()
val_program, feed_target_names, fetch_targets = paddle.static.load_inference_model(
os.path.join(load_model_path, 'model'), exe)
_logger.info('Loaded model from: {}'.format(load_model_path))
# Clean up the file storage directory
shutil.rmtree(
os.path.join(inference_model_path, 'onnx2paddle_{}'.format(
model_idx)))
return val_program, feed_target_names, fetch_targets
def export_onnx(model_dir,
model_filename=None,
params_filename=None,
save_file_path='output.onnx',
opset_version=13,
deploy_backend='tensorrt'):
if not model_filename:
model_filename = 'model.pdmodel'
if not params_filename:
params_filename = 'model.pdiparams'
try:
pkg.require('paddle2onnx')
    except Exception:
from pip._internal import main
main(['install', 'paddle2onnx==1.0.0rc3'])
import paddle2onnx
paddle2onnx.command.c_paddle_to_onnx(
model_file=os.path.join(model_dir, model_filename),
params_file=os.path.join(model_dir, params_filename),
save_file=save_file_path,
opset_version=opset_version,
enable_onnx_checker=True,
deploy_backend=deploy_backend)
_logger.info('Convert model to ONNX: {}'.format(save_file_path))
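
Taken together, `load_onnx_model`, `load_inference_model` and `export_onnx` let the rest of the toolkit treat ONNX and Paddle inference models uniformly. A hedged usage sketch (the model paths are assumptions):

# Illustrative use of the helpers above; the paths are placeholders.
import paddle
from paddleslim.common import load_onnx_model, load_inference_model, export_onnx

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())

# ONNX input: converted via X2Paddle and cached next to the file as '<name>_infer'.
program, feed_names, fetch_targets = load_onnx_model('./yolov6s.onnx')

# Paddle inference model input: file names are resolved automatically if omitted.
program, feed_names, fetch_targets = load_inference_model('./yolov6s_infer', exe)

# Export a (possibly quantized) inference model back to ONNX for deployment.
export_onnx('./yolov6s_infer',
            model_filename='model.pdmodel',
            params_filename='model.pdiparams',
            save_file_path='./yolov6s_quant.onnx',
            deploy_backend='tensorrt')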
...@@ -220,7 +220,7 @@ class PruningPlan(): ...@@ -220,7 +220,7 @@ class PruningPlan():
t_value = param.value().get_tensor() t_value = param.value().get_tensor()
value = np.array(t_value).astype("float32") value = np.array(t_value).astype("float32")
groups = _mask._op.attr('groups') groups = _mask._op.attr('groups')
if dims == 1 and groups is not None and groups > 1 and len( if groups is not None and groups > 1 and len(
value.shape) == 4: value.shape) == 4:
filter_size = value.shape[1] filter_size = value.shape[1]
except_num = np.sum(bool_mask) except_num = np.sum(bool_mask)
...@@ -230,7 +230,6 @@ class PruningPlan(): ...@@ -230,7 +230,6 @@ class PruningPlan():
sub_layer._groups = new_groups sub_layer._groups = new_groups
_logger.info("change groups from {} to {} for {}.". _logger.info("change groups from {} to {} for {}.".
format(groups, new_groups, param.name)) format(groups, new_groups, param.name))
continue
# The name of buffer can not contains "." # The name of buffer can not contains "."
backup_name = param.name.replace(".", "_") + "_backup" backup_name = param.name.replace(".", "_") + "_backup"
......
...@@ -522,7 +522,6 @@ class depthwise_conv2d(PruneWorker): ...@@ -522,7 +522,6 @@ class depthwise_conv2d(PruneWorker):
channel_axis = 1 channel_axis = 1
if data_format == "NHWC": if data_format == "NHWC":
channel_axis = 3 channel_axis = 3
if var == _in_var: if var == _in_var:
assert pruned_axis == channel_axis, "The Input of conv2d can only be pruned at channel axis, but got {}".format( assert pruned_axis == channel_axis, "The Input of conv2d can only be pruned at channel axis, but got {}".format(
pruned_axis) pruned_axis)
...@@ -533,7 +532,6 @@ class depthwise_conv2d(PruneWorker): ...@@ -533,7 +532,6 @@ class depthwise_conv2d(PruneWorker):
"repeat": repeat "repeat": repeat
}]) }])
# kernel_number * groups will be pruned by reducing groups # kernel_number * groups will be pruned by reducing groups
self.append_pruned_vars(_filter, 1, transforms)
self._visit_and_search(_filter, 0, transforms + [{ self._visit_and_search(_filter, 0, transforms + [{
"repeat": repeat "repeat": repeat
}]) }])
...@@ -546,14 +544,13 @@ class depthwise_conv2d(PruneWorker): ...@@ -546,14 +544,13 @@ class depthwise_conv2d(PruneWorker):
}]) }])
elif var == _filter: elif var == _filter:
assert pruned_axis == 0, "The filter of depthwise conv2d can only be pruned at axis 0." assert pruned_axis == 0, "The filter of depthwise conv2d can only be pruned at axis 0."
self.append_pruned_vars(_filter, 1, transforms) self.append_pruned_vars(_filter, 0, transforms)
self._visit_and_search(_in_var, channel_axis, transforms) self._visit_and_search(_in_var, channel_axis, transforms)
self._visit_and_search(_out, channel_axis, transforms) self._visit_and_search(_out, channel_axis, transforms)
elif var == _out: elif var == _out:
assert pruned_axis == channel_axis, "The Input of conv2d can only be pruned at channel axis, but got {}".format( assert pruned_axis == channel_axis, "The Input of conv2d can only be pruned at channel axis, but got {}".format(
pruned_axis) pruned_axis)
self.append_pruned_vars(_filter, 0, transforms) self.append_pruned_vars(_filter, 0, transforms)
self.append_pruned_vars(_filter, 1, transforms)
self._visit_and_search(_filter, 0, transforms) self._visit_and_search(_filter, 0, transforms)
# It will not pruning number of kernels in depthwise conv2d, # It will not pruning number of kernels in depthwise conv2d,
# so it is not neccesary to search succeed operators. # so it is not neccesary to search succeed operators.
......
...@@ -117,7 +117,7 @@ class Pruner(): ...@@ -117,7 +117,7 @@ class Pruner():
_groups = 1 _groups = 1
if not lazy: if not lazy:
# update groups of conv2d # update groups of conv2d
if pruned_axis == 1: if pruned_axis == 1 or pruned_axis == 0:
for op in param.outputs(): for op in param.outputs():
if op.type() in [ if op.type() in [
"conv2d", "conv2d_grad", "depthwise_conv2d", "conv2d", "conv2d_grad", "depthwise_conv2d",
...@@ -132,7 +132,7 @@ class Pruner(): ...@@ -132,7 +132,7 @@ class Pruner():
f"change groups of {op.type()}({param.name()}) from {op.attr('groups')} to {new_groups};" f"change groups of {op.type()}({param.name()}) from {op.attr('groups')} to {new_groups};"
) )
op.set_attr("groups", new_groups) op.set_attr("groups", new_groups)
if _groups == 1:
origin_shape = copy.deepcopy(param.shape()) origin_shape = copy.deepcopy(param.shape())
if param_shape_backup is not None: if param_shape_backup is not None:
param_shape_backup[param.name()] = origin_shape param_shape_backup[param.name()] = origin_shape
......
...@@ -35,7 +35,7 @@ try: ...@@ -35,7 +35,7 @@ try:
except Exception as e: except Exception as e:
_logger.warning(e) _logger.warning(e)
_logger.warning( _logger.warning(
f"If you want to use training-aware and post-training quantization, " "If you want to use training-aware and post-training quantization, "
"please use Paddle >= {min_paddle_version} or develop version") "please use Paddle >= {} or develop version".format(min_paddle_version))
from .quant_embedding import quant_embedding from .quant_embedding import quant_embedding
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import pickle
import copy
import logging
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
import paddle
from paddle.fluid import core
from paddle.fluid import framework
from paddle.fluid.framework import IrGraph
from paddle.fluid.executor import global_scope
from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
from paddle.fluid.contrib.slim.quantization.utils import _get_op_input_var_names, load_variable_data
from .quanter import quant_post
from ..core import GraphWrapper
from ..common import get_logger
from ..common import get_feed_vars, wrap_dataloader, load_inference_model, get_model_dir
_logger = get_logger(__name__, level=logging.INFO)
__all__ = ["AnalysisQuant"]
class AnalysisQuant(object):
def __init__(self,
model_dir,
model_filename=None,
params_filename=None,
eval_function=None,
data_loader=None,
save_dir='analysis_results',
checkpoint_name='analysis_checkpoint.pkl',
num_histogram_plots=10,
ptq_config=None):
"""
        AnalysisQuant analyzes the quantization sensitivity of each op in the model.
Args:
            model_dir(str): the path of the fp32 model that will be quantized; it can also be a '.onnx' file
model_filename(str, optional): the model file name of the fp32 model
params_filename(str, optional): the parameter file name of the fp32 model
            eval_function(function): a user-defined eval function that returns the metric of the inference program; it is used to judge the metric of the quantized model. (TODO: optional)
data_loader(Python Generator, Paddle.io.DataLoader, optional): the
Generator or Dataloader provides calibrate data, and it could
return a batch every time
save_dir(str, optional): the output dir that stores the analyzed information
            checkpoint_name(str, optional): the name of the checkpoint file that saves the analyzed information and allows the analysis to resume after an interruption
ptq_config(dict, optional): the args that can initialize PostTrainingQuantization
"""
if model_filename is None:
model_filename = 'model.pdmodel'
if params_filename is None:
params_filename = 'model.pdiparams'
self.model_dir = model_dir
self.model_filename = model_filename
self.params_filename = params_filename
self.histogram_bins = 1000
self.save_dir = save_dir
self.eval_function = eval_function
self.quant_layer_names = []
self.checkpoint_name = os.path.join(save_dir, checkpoint_name)
self.quant_layer_metrics = {}
self.num_histogram_plots = num_histogram_plots
self.ptq_config = ptq_config
self.batch_nums = ptq_config[
'batch_nums'] if 'batch_nums' in ptq_config else 10
if not os.path.exists(self.save_dir):
os.mkdir(self.save_dir)
devices = paddle.device.get_device().split(':')[0]
self.places = paddle.device._convert_to_place(devices)
executor = paddle.static.Executor(self.places)
# load model
[program, self.feed_list, self.fetch_list]= load_inference_model( \
self.model_dir, \
executor=executor, \
model_filename=self.model_filename, \
params_filename=self.params_filename)
# create data_loader
self.data_loader = wrap_dataloader(data_loader, self.feed_list)
# evaluate before quant
# TODO: self.eval_function can be None
if self.eval_function is not None:
self.base_metric = self.eval_function(
executor, program, self.feed_list, self.fetch_list)
            _logger.info('before quantization, the accuracy of the model is: {}'.
format(self.base_metric))
# quant and evaluate after quant (skip_list = None)
post_training_quantization = PostTrainingQuantization(
executor=executor,
data_loader=self.data_loader,
model_dir=self.model_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
skip_tensor_list=None,
algo='avg', #fastest
**self.ptq_config)
program = post_training_quantization.quantize()
self.quant_metric = self.eval_function(executor, program,
self.feed_list, self.fetch_list)
        _logger.info('after quantization, the accuracy of the model is: {}'.format(
self.quant_metric))
# get quantized weight and act var name
self.quantized_weight_var_name = post_training_quantization._quantized_weight_var_name
self.quantized_act_var_name = post_training_quantization._quantized_act_var_name
executor.close()
# load tobe_analyized_layer from checkpoint
self.load_checkpoint()
self.tobe_analyized_layer = self.quantized_weight_var_name - set(
list(self.quant_layer_metrics.keys()))
self.tobe_analyized_layer = sorted(list(self.tobe_analyized_layer))
def analysis(self):
self.compute_quant_sensitivity()
self.sensitivity_ranklist = sorted(
self.quant_layer_metrics,
key=self.quant_layer_metrics.get,
reverse=False)
_logger.info('Finished computing the sensitivity of the model.')
for name in self.sensitivity_ranklist:
_logger.info("quant layer name: {}, eval metric: {}".format(
name, self.quant_layer_metrics[name]))
analysis_file = os.path.join(self.save_dir, "analysis.txt")
with open(analysis_file, "w") as analysis_ret_f:
for name in self.sensitivity_ranklist:
analysis_ret_f.write(
"quant layer name: {}, eval metric: {}\n".format(
name, self.quant_layer_metrics[name]))
_logger.info('Analysis file is saved in {}'.format(analysis_file))
self.calculate_histogram()
def save_checkpoint(self):
if not os.path.exists(self.save_dir):
os.makedirs(self.save_dir)
with open(self.checkpoint_name, 'wb') as f:
pickle.dump(self.quant_layer_metrics, f)
_logger.info('save checkpoint to {}'.format(self.checkpoint_name))
def load_checkpoint(self):
if not os.path.exists(self.checkpoint_name):
return False
with open(self.checkpoint_name, 'rb') as f:
self.quant_layer_metrics = pickle.load(f)
_logger.info('load checkpoint from {}'.format(self.checkpoint_name))
return True
def compute_quant_sensitivity(self):
'''
For each layer, quantize the weight op and evaluate the quantized model.
'''
for i, layer_name in enumerate(self.tobe_analyized_layer):
_logger.info('checking {}/{} quant model: quant layer {}'.format(
i + 1, len(self.tobe_analyized_layer), layer_name))
skip_list = copy.copy(list(self.quantized_weight_var_name))
skip_list.remove(layer_name)
executor = paddle.static.Executor(self.places)
post_training_quantization = PostTrainingQuantization(
executor=executor,
data_loader=self.data_loader,
model_dir=self.model_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
skip_tensor_list=skip_list,
algo='avg', #fastest
**self.ptq_config)
program = post_training_quantization.quantize()
_logger.info('Evaluating...')
quant_metric = self.eval_function(executor, program, self.feed_list,
self.fetch_list)
executor.close()
_logger.info(
"quant layer name: {}, eval metric: {}, the loss caused by this layer: {}".
format(layer_name, quant_metric, self.base_metric -
quant_metric))
self.quant_layer_metrics[layer_name] = quant_metric
self.save_checkpoint()
def get_act_name_by_weight(self, program, weight_names,
persistable_var_names):
act_ops_names = []
for op_name in weight_names:
for block_id in range(len(program.blocks)):
for op in program.blocks[block_id].ops:
var_name_list = _get_op_input_var_names(op)
if op_name in var_name_list:
for var_name in var_name_list:
if var_name not in persistable_var_names:
act_ops_names.append(var_name)
return act_ops_names
def get_hist_ops_name(self, graph, program):
if self.num_histogram_plots <= 0:
return []
best_weight_ops = self.sensitivity_ranklist[::-1][:self.
num_histogram_plots]
worst_weight_ops = self.sensitivity_ranklist[:self.num_histogram_plots]
persistable_var_names = []
for var in program.list_vars():
if var.persistable:
persistable_var_names.append(var.name)
best_act_ops = self.get_act_name_by_weight(program, best_weight_ops,
persistable_var_names)
worst_act_ops = self.get_act_name_by_weight(program, worst_weight_ops,
persistable_var_names)
return [best_weight_ops, best_act_ops, worst_weight_ops, worst_act_ops]
def collect_ops_histogram(self, scope, ops):
hist = {}
for var_name in ops:
var_tensor = load_variable_data(scope, var_name)
var_tensor = np.array(var_tensor)
min_v = float(np.min(var_tensor))
max_v = float(np.max(var_tensor))
var_tensor = var_tensor.flatten()
_, hist_edges = np.histogram(
var_tensor.copy(),
bins=self.histogram_bins,
range=(min_v, max_v))
hist[var_name] = [var_tensor, hist_edges]
return hist
def calculate_histogram(self):
'''
Sample histograms for the weight and corresponding act tensors
'''
devices = paddle.device.get_device().split(':')[0]
places = paddle.device._convert_to_place(devices)
executor = paddle.static.Executor(places)
[program, feed_list, fetch_list]= load_inference_model( \
self.model_dir, \
executor=executor, \
model_filename=self.model_filename, \
params_filename=self.params_filename)
scope = global_scope()
graph = IrGraph(core.Graph(program.desc), for_test=False)
ops_tobe_draw_hist = self.get_hist_ops_name(graph, program)
if not ops_tobe_draw_hist:
return
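# mark activation variables as persistable so their tensors stay in the
# scope after executor.run() and can be read back when collecting histograms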
for var in program.list_vars():
if var.name in self.quantized_act_var_name:
var.persistable = True
# run a few calibration batches so the activation tensors are populated before the histograms are collected
batch_id = 0
for data in self.data_loader():
executor.run(program=program,
feed=data,
fetch_list=fetch_list,
return_numpy=False,
scope=scope)
batch_id += 1
if batch_id >= self.batch_nums:
break
pdf_names = [
'best_weight_hist_result.pdf',
'best_act_hist_result.pdf',
'worst_weight_hist_result.pdf',
'worst_act_hist_result.pdf',
]
for ops, save_pdf_name in zip(ops_tobe_draw_hist, pdf_names):
hist_data = self.collect_ops_histogram(scope, ops)
self.draw_pdf(hist_data, save_pdf_name)
def draw_pdf(self, hist_data, save_pdf_name):
pdf_path = os.path.join(self.save_dir, save_pdf_name)
with PdfPages(pdf_path) as pdf:
for name in hist_data:
plt.hist(hist_data[name][0], bins=hist_data[name][1])
plt.xlabel(name)
plt.ylabel("frequency")
plt.title("Hist of variable {}".format(name))
plt.show()
pdf.savefig()
plt.close()
_logger.info('Histogram plots are saved in {}'.format(pdf_path))
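# A minimal usage sketch of this analysis tool (the class name `AnalysisQuant`
# and the exact constructor arguments are assumptions based on the methods
# above, not confirmed by this diff):
#
#   analyzer = AnalysisQuant(
#       model_dir='./inference_model',
#       model_filename='model.pdmodel',
#       params_filename='model.pdiparams',
#       eval_function=eval_fn,        # returns a scalar accuracy metric
#       data_loader=calib_loader,
#       save_dir='./analysis_results')
#   analyzer.analysis()               # writes analysis.txt and the histogram PDFs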
...@@ -29,13 +29,7 @@ import shutil ...@@ -29,13 +29,7 @@ import shutil
import glob import glob
from scipy.stats import wasserstein_distance from scipy.stats import wasserstein_distance
# smac import pkg_resources as pkg
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformFloatHyperparameter, UniformIntegerHyperparameter
from smac.configspace import ConfigurationSpace
from smac.facade.smac_hpo_facade import SMAC4HPO
from smac.scenario.scenario import Scenario
from paddleslim.common import get_logger from paddleslim.common import get_logger
from paddleslim.quant import quant_post from paddleslim.quant import quant_post
...@@ -417,6 +411,18 @@ def quant_post_hpo( ...@@ -417,6 +411,18 @@ def quant_post_hpo(
None None
""" """
try:
pkg.require('smac')
except:
from pip._internal import main
main(['install', 'smac'])
# smac
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformFloatHyperparameter, UniformIntegerHyperparameter
from smac.configspace import ConfigurationSpace
from smac.facade.smac_hpo_facade import SMAC4HPO
from smac.scenario.scenario import Scenario
global g_quant_config global g_quant_config
g_quant_config = QuantConfig( g_quant_config = QuantConfig(
executor, place, model_dir, quantize_model_path, algo, hist_percent, executor, place, model_dir, quantize_model_path, algo, hist_percent,
......
...@@ -16,9 +16,14 @@ import os ...@@ -16,9 +16,14 @@ import os
import copy import copy
import json import json
import logging import logging
import collections
import numpy as np
import paddle import paddle
from paddle.fluid import core
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.framework import IrGraph from paddle.fluid.framework import IrGraph
from paddle.fluid.contrib.slim.quantization import WeightQuantization
from paddle.fluid.contrib.slim.quantization import QuantizationTransformPass from paddle.fluid.contrib.slim.quantization import QuantizationTransformPass
from paddle.fluid.contrib.slim.quantization import QuantizationFreezePass from paddle.fluid.contrib.slim.quantization import QuantizationFreezePass
from paddle.fluid.contrib.slim.quantization import ConvertToInt8Pass from paddle.fluid.contrib.slim.quantization import ConvertToInt8Pass
...@@ -27,18 +32,17 @@ from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization ...@@ -27,18 +32,17 @@ from paddle.fluid.contrib.slim.quantization import PostTrainingQuantization
from paddle.fluid.contrib.slim.quantization import AddQuantDequantPass from paddle.fluid.contrib.slim.quantization import AddQuantDequantPass
from paddle.fluid.contrib.slim.quantization import OutScaleForTrainingPass from paddle.fluid.contrib.slim.quantization import OutScaleForTrainingPass
from paddle.fluid.contrib.slim.quantization import OutScaleForInferencePass from paddle.fluid.contrib.slim.quantization import OutScaleForInferencePass
from ..common import get_logger
_logger = get_logger(__name__, level=logging.INFO)
try: try:
from paddle.fluid.contrib.slim.quantization import QuantizationTransformPassV2 from paddle.fluid.contrib.slim.quantization import QuantizationTransformPassV2
from paddle.fluid.contrib.slim.quantization import QuantWeightPass from paddle.fluid.contrib.slim.quantization import QuantWeightPass
from paddle.fluid.contrib.slim.quantization import AddQuantDequantPassV2 from paddle.fluid.contrib.slim.quantization import AddQuantDequantPassV2
except: except:
pass _logger.warning(
from paddle.fluid import core "Some functions fail to import, please update PaddlePaddle version to 2.3+"
from paddle.fluid.contrib.slim.quantization import WeightQuantization )
from paddle.fluid.layer_helper import LayerHelper
from ..common import get_logger
_logger = get_logger(__name__, level=logging.INFO)
WEIGHT_QUANTIZATION_TYPES = [ WEIGHT_QUANTIZATION_TYPES = [
'abs_max', 'channel_wise_abs_max', 'range_abs_max', 'moving_average_abs_max' 'abs_max', 'channel_wise_abs_max', 'range_abs_max', 'moving_average_abs_max'
...@@ -91,27 +95,54 @@ _quant_config_default = { ...@@ -91,27 +95,54 @@ _quant_config_default = {
# if True, 'quantize_op_types' will be TENSORRT_OP_TYPES # if True, 'quantize_op_types' will be TENSORRT_OP_TYPES
'for_tensorrt': False, 'for_tensorrt': False,
# if True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES # if True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES
'is_full_quantize': False 'is_full_quantize': False,
# if True, use onnx format to quant.
'onnx_format': False,
} }
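# Example (a sketch, not taken from this diff): with this change, `onnx_format`
# is read from the config dict rather than passed as a separate argument, so a
# caller might write something like:
#
#   config = {
#       'weight_quantize_type': 'channel_wise_abs_max',
#       'activation_quantize_type': 'moving_average_abs_max',
#       'onnx_format': True,   # emit ONNX-style quantize/dequantize ops
#   }
#   quant_program = quant_aware(train_program, place, config, for_test=False)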
# TODO: Hard-code, remove it when Paddle 2.3.1 class OutScaleForInferencePassV2(object):
class OutScaleForTrainingPassV2(OutScaleForTrainingPass): def __init__(self, scope=None):
def __init__(self, scope=None, place=None, moving_rate=0.9):
OutScaleForTrainingPass.__init__(
self, scope=scope, place=place, moving_rate=moving_rate)
def _scale_name(self, var_name):
""" """
Return the scale name for the var named `var_name`. This pass is used for setting output scales of some operators.
These output scales may be used by tensorRT or some other inference engines.
Args:
scope(fluid.Scope): The scope is used to initialize these new parameters.
""" """
return "%s@scale" % (var_name) self._scope = scope
self._teller_set = utils._out_scale_op_list
def apply(self, graph):
"""
Get output scales from the scope and set these scales in op_descs
of operators in the teller_set.
# TODO: Hard-code, remove it when Paddle 2.3.1 Args:
class OutScaleForInferencePassV2(OutScaleForInferencePass): graph(IrGraph): the target graph.
def __init__(self, scope=None): """
OutScaleForInferencePass.__init__(self, scope=scope) assert isinstance(graph,
IrGraph), 'graph must be the instance of IrGraph.'
collect_dict = collections.OrderedDict()
op_nodes = graph.all_op_nodes()
for op_node in op_nodes:
if op_node.name() in self._teller_set:
var_names = utils._get_op_output_var_names(op_node)
for var_name in var_names:
in_node = graph._find_node_by_name(op_node.outputs,
var_name)
if in_node.dtype() not in \
[core.VarDesc.VarType.FP64, core.VarDesc.VarType.FP32]:
continue
collect_dict[var_name] = {}
scale_name = self._scale_name(var_name)
scale_var = self._scope.find_var(scale_name)
assert scale_var is not None, \
"Can not find {} variable in the scope".format(scale_name)
scale_value = np.array(scale_var.get_tensor())[0]
collect_dict[var_name]['scale'] = float(scale_value)
return graph, collect_dict
def _scale_name(self, var_name): def _scale_name(self, var_name):
""" """
...@@ -222,7 +253,6 @@ def quant_aware(program, ...@@ -222,7 +253,6 @@ def quant_aware(program,
act_preprocess_func=None, act_preprocess_func=None,
optimizer_func=None, optimizer_func=None,
executor=None, executor=None,
onnx_format=False,
return_program=False, return_program=False,
draw_graph=False): draw_graph=False):
"""Add quantization and dequantization operators to "program" """Add quantization and dequantization operators to "program"
...@@ -236,7 +266,9 @@ def quant_aware(program, ...@@ -236,7 +266,9 @@ def quant_aware(program,
Default: None. Default: None.
scope(paddle.static.Scope): Scope records the mapping between variable names and variables, scope(paddle.static.Scope): Scope records the mapping between variable names and variables,
similar to brackets in programming languages. Usually users can use similar to brackets in programming languages. Usually users can use
`paddle.static.global_scope <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_. When ``None`` will use `paddle.static.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_ . Default: ``None``. `paddle.static.global_scope <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_.
When ``None`` will use `paddle.static.global_scope() <https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/executor_cn/global_scope_cn.html>`_ .
Default: ``None``.
for_test(bool): If the 'program' parameter is a test program, this parameter should be set to ``True``. for_test(bool): If the 'program' parameter is a test program, this parameter should be set to ``True``.
Otherwise, set to ``False``.Default: False Otherwise, set to ``False``.Default: False
weight_quantize_func(function): Function that defines how to quantize weight. Using this weight_quantize_func(function): Function that defines how to quantize weight. Using this
...@@ -291,7 +323,8 @@ def quant_aware(program, ...@@ -291,7 +323,8 @@ def quant_aware(program,
elif op_type in QUANT_DEQUANT_PASS_OP_TYPES: elif op_type in QUANT_DEQUANT_PASS_OP_TYPES:
quant_dequant_ops.append(op_type) quant_dequant_ops.append(op_type)
if len(transform_pass_ops) > 0: if len(transform_pass_ops) > 0:
trannsform_func = 'QuantizationTransformPassV2' if onnx_format else 'QuantizationTransformPass' trannsform_func = 'QuantizationTransformPassV2' if config[
'onnx_format'] else 'QuantizationTransformPass'
transform_pass = eval(trannsform_func)( transform_pass = eval(trannsform_func)(
scope=scope, scope=scope,
place=place, place=place,
...@@ -313,7 +346,8 @@ def quant_aware(program, ...@@ -313,7 +346,8 @@ def quant_aware(program,
transform_pass.apply(main_graph) transform_pass.apply(main_graph)
if len(quant_dequant_ops) > 0: if len(quant_dequant_ops) > 0:
qdq_func = 'AddQuantDequantPassV2' if onnx_format else 'AddQuantDequantPass' qdq_func = 'AddQuantDequantPassV2' if config[
'onnx_format'] else 'AddQuantDequantPass'
quant_dequant_pass = eval(qdq_func)( quant_dequant_pass = eval(qdq_func)(
scope=scope, scope=scope,
place=place, place=place,
...@@ -323,7 +357,7 @@ def quant_aware(program, ...@@ -323,7 +357,7 @@ def quant_aware(program,
quantizable_op_type=quant_dequant_ops) quantizable_op_type=quant_dequant_ops)
quant_dequant_pass.apply(main_graph) quant_dequant_pass.apply(main_graph)
out_scale_training_pass = OutScaleForTrainingPassV2( out_scale_training_pass = OutScaleForTrainingPass(
scope=scope, place=place, moving_rate=config['moving_rate']) scope=scope, place=place, moving_rate=config['moving_rate'])
out_scale_training_pass.apply(main_graph) out_scale_training_pass.apply(main_graph)
...@@ -356,8 +390,8 @@ def quant_post_static( ...@@ -356,8 +390,8 @@ def quant_post_static(
data_loader=None, data_loader=None,
model_filename=None, model_filename=None,
params_filename=None, params_filename=None,
save_model_filename='__model__', save_model_filename='model.pdmodel',
save_params_filename='__params__', save_params_filename='model.pdiparams',
batch_size=1, batch_size=1,
batch_nums=None, batch_nums=None,
scope=None, scope=None,
...@@ -405,9 +439,9 @@ def quant_post_static( ...@@ -405,9 +439,9 @@ def quant_post_static(
When all parameters are saved in a single file, set it When all parameters are saved in a single file, set it
as filename. If parameters are saved in separate files, as filename. If parameters are saved in separate files,
set it as 'None'. Default : 'None'. set it as 'None'. Default : 'None'.
save_model_filename(str): The name of model file to save the quantized inference program. Default: '__model__'. save_model_filename(str): The name of model file to save the quantized inference program. Default: 'model.pdmodel'.
save_params_filename(str): The name of file to save all related parameters. save_params_filename(str): The name of file to save all related parameters.
If it is set None, parameters will be saved in separate files. Default: '__params__'. If it is set None, parameters will be saved in separate files. Default: 'model.pdiparams'.
batch_size(int, optional): The batch size of DataLoader, default is 1. batch_size(int, optional): The batch size of DataLoader, default is 1.
batch_nums(int, optional): If batch_nums is not None, the number of calibrate batch_nums(int, optional): If batch_nums is not None, the number of calibrate
data is 'batch_size*batch_nums'. If batch_nums is None, use all data data is 'batch_size*batch_nums'. If batch_nums is None, use all data
...@@ -508,6 +542,22 @@ def quant_post_static( ...@@ -508,6 +542,22 @@ def quant_post_static(
quantize_model_path, quantize_model_path,
model_filename=save_model_filename, model_filename=save_model_filename,
params_filename=save_params_filename) params_filename=save_params_filename)
if onnx_format:
try:
collect_dict = post_training_quantization._calibration_scales
save_quant_table_path = os.path.join(quantize_model_path,
'calibration_table.txt')
with open(save_quant_table_path, 'w') as txt_file:
for tensor_name in collect_dict.keys():
write_line = '{} {}'.format(
tensor_name, collect_dict[tensor_name]['scale']) + '\n'
txt_file.write(write_line)
_logger.info("Quantization clip ranges of tensors is save in: {}".
format(save_quant_table_path))
except:
_logger.warning(
"Unable to generate `calibration_table.txt`, please update PaddlePaddle >= 2.3.3"
)
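# The resulting calibration_table.txt holds one "<tensor_name> <scale>" pair
# per line; the values below are purely illustrative:
#
#   conv2d_1.tmp_0 0.023510
#   relu_2.tmp_0 0.087145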
# We have changed the quant_post to quant_post_static. # We have changed the quant_post to quant_post_static.
...@@ -521,7 +571,7 @@ def convert(program, ...@@ -521,7 +571,7 @@ def convert(program,
config=None, config=None,
scope=None, scope=None,
save_int8=False, save_int8=False,
onnx_format=False): save_clip_ranges_path='./'):
""" """
convert quantized and well-trained ``program`` to final quantized convert quantized and well-trained ``program`` to final quantized
``program``that can be used to save ``inference model``. ``program``that can be used to save ``inference model``.
...@@ -543,6 +593,7 @@ def convert(program, ...@@ -543,6 +593,7 @@ def convert(program,
save_int8: Whether to return ``program`` which model parameters' save_int8: Whether to return ``program`` which model parameters'
dtype is ``int8``. This parameter can only be used to dtype is ``int8``. This parameter can only be used to
get model size. Default: ``False``. get model size. Default: ``False``.
save_clip_ranges_path: If config.onnx_format=True, quantization clip ranges will be saved locally.
Returns: Returns:
Tuple : freezed program which can be used for inference. Tuple : freezed program which can be used for inference.
...@@ -560,11 +611,22 @@ def convert(program, ...@@ -560,11 +611,22 @@ def convert(program,
_logger.info("convert config {}".format(config)) _logger.info("convert config {}".format(config))
test_graph = IrGraph(core.Graph(program.desc), for_test=True) test_graph = IrGraph(core.Graph(program.desc), for_test=True)
if onnx_format: if config['onnx_format']:
quant_weight_pass = QuantWeightPass(scope, place) quant_weight_pass = QuantWeightPass(scope, place)
quant_weight_pass.apply(test_graph) quant_weight_pass.apply(test_graph)
else:
out_scale_infer_pass = OutScaleForInferencePassV2(scope=scope) out_scale_infer_pass = OutScaleForInferencePassV2(scope=scope)
_, collect_dict = out_scale_infer_pass.apply(test_graph)
save_quant_table_path = os.path.join(save_clip_ranges_path,
'calibration_table.txt')
with open(save_quant_table_path, 'w') as txt_file:
for tensor_name in collect_dict.keys():
write_line = '{} {}'.format(
tensor_name, collect_dict[tensor_name]['scale']) + '\n'
txt_file.write(write_line)
_logger.info("Quantization clip ranges of tensors is save in: {}".
format(save_quant_table_path))
else:
out_scale_infer_pass = OutScaleForInferencePass(scope=scope)
out_scale_infer_pass.apply(test_graph) out_scale_infer_pass.apply(test_graph)
# Freeze the graph after training by adjusting the quantize # Freeze the graph after training by adjusting the quantize
# operators' order for the inference. # operators' order for the inference.
......
...@@ -8,7 +8,8 @@ import unittest ...@@ -8,7 +8,8 @@ import unittest
import numpy as np import numpy as np
from paddle.io import Dataset from paddle.io import Dataset
from paddleslim.auto_compression import AutoCompression from paddleslim.auto_compression import AutoCompression
from paddleslim.auto_compression.config_helpers import load_config from paddleslim.common import load_config
from paddleslim.common import load_inference_model, export_onnx
class RandomEvalDataset(Dataset): class RandomEvalDataset(Dataset):
...@@ -120,5 +121,36 @@ class TestDictQATDist(ACTBase): ...@@ -120,5 +121,36 @@ class TestDictQATDist(ACTBase):
ac.compress() ac.compress()
class TestLoadONNXModel(ACTBase):
def __init__(self, *args, **kwargs):
super(TestLoadONNXModel, self).__init__(*args, **kwargs)
os.system(
'wget https://paddle-slim-models.bj.bcebos.com/act/yolov5s.onnx')
self.model_dir = 'yolov5s.onnx'
def test_compress(self):
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
_, _, _ = load_inference_model(
self.model_dir,
executor=exe,
model_filename='model.pdmodel',
params_filename='model.pdiparams')
# reload model
_, _, _ = load_inference_model(
self.model_dir,
executor=exe,
model_filename='model.pdmodel',
params_filename='model.pdiparams')
# export the loaded model back to ONNX (opset 13, TensorRT deploy backend)
export_onnx(
self.model_dir,
model_filename='model.pdmodel',
params_filename='model.pdiparams',
save_file_path='output.onnx',
opset_version=13,
deploy_backend='tensorrt')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -473,9 +473,9 @@ class TestDepthwiseConv2d(TestPruneWorker): ...@@ -473,9 +473,9 @@ class TestDepthwiseConv2d(TestPruneWorker):
def set_cases(self): def set_cases(self):
weight_var = self.graph.var('conv1.w_0') weight_var = self.graph.var('conv1.w_0')
self.cases.append((self.in_var, 1, {'conv1.w_0': [0, 1]})) self.cases.append((self.in_var, 1, {'conv1.w_0': [0]}))
self.cases.append((self.out_var, 1, {'conv1.w_0': [0, 1]})) self.cases.append((self.out_var, 1, {'conv1.w_0': [0]}))
self.cases.append((weight_var, 0, {'conv1.w_0': [1]})) self.cases.append((weight_var, 0, {'conv1.w_0': [0]}))
def test_prune(self): def test_prune(self):
self.check_in_out() self.check_in_out()
......
...@@ -132,8 +132,8 @@ class TestQuantAwareCase1(StaticCase): ...@@ -132,8 +132,8 @@ class TestQuantAwareCase1(StaticCase):
quant_post_prog, feed_target_names, fetch_targets = paddle.fluid.io.load_inference_model( quant_post_prog, feed_target_names, fetch_targets = paddle.fluid.io.load_inference_model(
dirname='./test_quant_post_inference', dirname='./test_quant_post_inference',
executor=exe, executor=exe,
model_filename='__model__', model_filename='model.pdmodel',
params_filename='__params__') params_filename='model.pdiparams')
top1_2, top5_2 = test(quant_post_prog, fetch_targets) top1_2, top5_2 = test(quant_post_prog, fetch_targets)
print("before quantization: top1: {}, top5: {}".format(top1_1, top5_1)) print("before quantization: top1: {}, top5: {}".format(top1_1, top5_1))
print("after quantization: top1: {}, top5: {}".format(top1_2, top5_2)) print("after quantization: top1: {}, top5: {}".format(top1_2, top5_2))
......