Unverified commit e894fa86, authored by cuicheng01, committed by GitHub

Merge branch 'develop' into add_person_demo

...@@ -7,6 +7,8 @@
PaddleClas is an image recognition toolkit that PaddlePaddle provides for industry and academia, helping users train better vision models and bring applications to production.

**Recent updates**

- 2022.5.23 Added the [person entry/exit management demo](https://aistudio.baidu.com/aistudio/projectdetail/4037898); it can be tried out on AI Studio.
- 2022.5.20 Released [PP-HGNet](./docs/zh_CN/models/PP-HGNet.md) and [PP-LCNet v2](./docs/zh_CN/models/PP-LCNetV2.md).
- 2022.4.21 Added the [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) for the CVPR 2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
- 2022.1.27 Fully upgraded the documentation; added the [PaddleServing C++ pipeline deployment](./deploy/paddleserving) and an [18M image recognition Android deployment demo](./deploy/lite_shitu).
- 2021.11.1 Released the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf); added a beverage recognition demo.
......
...@@ -49,10 +49,15 @@ class ClsPredictor(Predictor):
            pid = os.getpid()
            size = config["PreProcess"]["transform_ops"][1]["CropImage"][
                "size"]
if config["Global"].get("use_int8", False):
precision = "int8"
elif config["Global"].get("use_fp16", False):
precision = "fp16"
else:
precision = "fp32"
            self.auto_logger = auto_log.AutoLogger(
                model_name=config["Global"].get("model_name", "cls"),
                model_precision=precision,
                batch_size=config["Global"].get("batch_size", 1),
                data_shape=[3, size, size],
                save_path=config["Global"].get("save_log_path",
......
...@@ -42,8 +42,22 @@ class Predictor(object):
    def create_paddle_predictor(self, args, inference_model_dir=None):
        if inference_model_dir is None:
            inference_model_dir = args.inference_model_dir
        if "inference_int8.pdiparams" in os.listdir(inference_model_dir):
            params_file = os.path.join(inference_model_dir,
                                       "inference_int8.pdiparams")
            model_file = os.path.join(inference_model_dir,
                                      "inference_int8.pdmodel")
assert args.get(
"use_fp16", False
) is False, "fp16 mode is not supported for int8 model inference, please set use_fp16 as False during inference."
else:
params_file = os.path.join(inference_model_dir,
"inference.pdiparams")
model_file = os.path.join(inference_model_dir, "inference.pdmodel")
assert args.get(
"use_int8", False
) is False, "int8 mode is not supported for fp32 model inference, please set use_int8 as False during inference."
        config = Config(model_file, params_file)

        if args.use_gpu:
...@@ -63,12 +77,18 @@ class Predictor(object):
        config.disable_glog_info()
        config.switch_ir_optim(args.ir_optim)  # default true
        if args.use_tensorrt:
precision = Config.Precision.Float32
if args.get("use_int8", False):
precision = Config.Precision.Int8
elif args.get("use_fp16", False):
precision = Config.Precision.Half
            config.enable_tensorrt_engine(
                precision_mode=precision,
                max_batch_size=args.batch_size,
                workspace_size=1 << 30,
                min_subgraph_size=30,
                use_calib_mode=False)

        config.enable_memory_optim()
        # use zero copy
......
...@@ -3,22 +3,49 @@

## Contents

* [1. Overview](#1)
* [2. Architecture](#2)
* [3. Results](#3)

<a name='1'></a>

## 1. Overview

PP-HGNet (High Performance GPU Net) is a high-performance backbone network developed by Baidu's PaddlePaddle vision team specifically for GPU platforms. Built on VOVNet, it uses a learnable downsampling layer (LDS layer) and incorporates the strengths of models such as ResNet_vd and PPLCNet. On GPU it achieves higher accuracy than other SOTA models at the same speed: it is 3.8 percentage points above ResNet34-D and 2.4 percentage points above ResNet50-D, and with Baidu's SSLD distillation strategy it surpasses ResNet50-D by 4.7 percentage points. At the same accuracy, its inference speed also far exceeds that of mainstream Vision Transformers.
<a name='2'></a>

## 2. Architecture

Targeting GPU devices, the PP-HGNet authors analyzed and summarized today's GPU-friendly networks and used standard 3x3 convolutions (the kernel with the highest compute density) as much as possible. Taking VOVNet as the baseline, they fused in the main improvements that benefit GPU inference, obtaining a backbone that, at the same speed, substantially outperforms other CNN and Vision Transformer models in accuracy.
The overall architecture of the PP-HGNet backbone is as follows:
![](../../images/PP-HGNet/PP-HGNet.png)
PP-HGNet is composed of multiple HG-Blocks; the details of an HG-Block are as follows:
![](../../images/PP-HGNet/PP-HGNet-block.png)
<a name='3'></a>
## 3. Results
The comparison between PP-HGNet and other models is shown below. The test machine is an NVIDIA® Tesla® V100 with the TensorRT engine enabled and FP32 precision. At the same speed, PP-HGNet outperforms other SOTA CNN models in accuracy, and compared with SwinTransformer it is more than twice as fast while also being more accurate.
| Model | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|---------------|---------------|-------------|
| ResNet34 | 74.57 | 92.14 | 1.97 |
| ResNet34_vd | 75.98 | 92.98 | 2.00 |
| EfficientNetB0 | 77.38 | 93.31 | 1.96 |
| <b>PPHGNet_tiny</b> | <b>79.83</b> | <b>95.04</b> | <b>1.77</b> |
| <b>PPHGNet_tiny_ssld</b> | <b>81.95</b> | <b>96.12</b> | <b>1.77</b> |
| ResNet50 | 76.50 | 93.00 | 2.54 |
| ResNet50_vd | 79.12 | 94.44 | 2.60 |
| ResNet50_rsb | 80.40 | | 2.54 |
| EfficientNetB1 | 79.15 | 94.41 | 2.88 |
| SwinTransformer_tiny | 81.2 | 95.5 | 6.59 |
| <b>PPHGNet_small</b> | <b>81.51</b> | <b>95.82</b> | <b>2.52</b> |
| <b>PPHGNet_small_ssld</b> | <b>83.82</b> | <b>96.81</b> | <b>2.52</b> |
More details about PP-HGNet and its performance on downstream tasks are coming soon.
# PP-LCNetV2
---
## 1. Overview

The backbone network strongly affects downstream computer-vision tasks: it largely determines not only the accuracy of downstream models but also their efficiency. Most existing backbones, however, are not efficient in real applications, and backbones optimized for Intel CPU platforms are particularly scarce. We benchmarked mainstream lightweight models and found their efficiency on Intel CPUs unsatisfactory, even though Intel CPU platforms still serve a large number of industrial use cases. We therefore proposed the PP-LCNet series; PP-LCNetV2 is an improvement on [PP-LCNetV1](./PP-LCNet.md).
## 2. Design details
![](../../images/PP-LCNetV2/net.png)
The overall architecture of PP-LCNetV2 is shown above. PP-LCNetV2 is optimized from PP-LCNetV1: it mainly uses a re-parameterization strategy to combine depthwise convolutions with different kernel sizes, and it optimizes the pointwise convolutions, shortcuts, and other components.
### 2.1 Rep strategy

The kernel size determines a convolution layer's receptive field, and combining kernels of different sizes captures features at different scales. In Stage 3 and Stage 4, PP-LCNetV2 therefore combines depthwise (DW) convolutions with kernel sizes 5, 3, and 1 in the same layer. To avoid hurting inference efficiency, it applies the re-parameterization (Rep) strategy to fuse the parallel DW convolutions, as shown in the figure below; a fusion sketch follows the figure.
![](../../images/PP-LCNetV2/rep.png)
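A minimal sketch of the fusion arithmetic (a hypothetical standalone helper, not the PaddleClas API; it mirrors what `_get_equivalent_kernel_bias` and `_pad_tensor` do in the PPLCNetV2 code later in this commit): once each branch's conv and BN are folded into a single kernel and bias, smaller kernels are zero-padded up to the largest kernel size and all branches are summed into one DW convolution. Because convolution is linear in its kernel, summing the padded kernels is equivalent to summing the branch outputs.

```python
import paddle
import paddle.nn.functional as F

def fuse_parallel_dw(kernels, biases, to_size=5):
    """Fuse parallel depthwise branches (conv+BN already folded) into one kernel/bias.

    kernels: list of [C, 1, k, k] tensors with odd k <= to_size; biases: list of [C].
    """
    kernel_sum, bias_sum = 0, 0
    for kernel, bias in zip(kernels, biases):
        pad = (to_size - kernel.shape[-1]) // 2
        # zero-pad e.g. a 1x1 or 3x3 kernel up to 5x5, then sum the branches
        kernel_sum += F.pad(kernel, [pad, pad, pad, pad])
        bias_sum += bias
    return kernel_sum, bias_sum

# a 5x5 + 3x3 + 1x1 depthwise trio over 16 channels collapses into one 5x5 DW conv
kernels = [paddle.rand([16, 1, s, s]) for s in (5, 3, 1)]
biases = [paddle.rand([16]) for _ in kernels]
kernel, bias = fuse_parallel_dw(kernels, biases)
print(kernel.shape)  # [16, 1, 5, 5]
```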
### 2.2 PW convolution

A depthwise separable convolution normally consists of one DW convolution followed by one pointwise (PW) convolution and serves as a replacement for standard convolution. To give it more fitting capacity, we tried using two PW convolution layers; to keep the efficiency cost under control, the first layer compresses the feature map along the channel dimension and the second expands it back, as shown in the figure below (a sketch follows the figure). Experiments show this strategy clearly improves accuracy; to balance the efficiency impact, PP-LCNetV2 uses it only in Stage 4 and Stage 5.
![](../../images/PP-LCNetV2/split_pw.png)
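A minimal sketch of the split PW convolution with the ratio used in the PPLCNetV2 code later in this commit (pw_ratio = 0.5); plain Conv2D layers stand in for the ConvBNLayer blocks the real model uses:

```python
import paddle.nn as nn

in_ch, out_ch, pw_ratio = 256, 512, 0.5
split_pw = nn.Sequential(
    # first 1x1 conv squeezes the channel dimension to half of out_ch
    nn.Conv2D(in_ch, int(out_ch * pw_ratio), kernel_size=1),
    # second 1x1 conv expands back to the full out_ch
    nn.Conv2D(int(out_ch * pw_ratio), out_ch, kernel_size=1))
```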
### 2.3 Shortcut

Residual connections have been used widely since they were proposed, but in lightweight CNNs the element-wise addition they introduce costs speed. In PP-LCNetV2 we experimented with residual connections stage by stage and found that they do not always improve accuracy, so PP-LCNetV2 uses a residual connection only in the last stage, adding a shortcut inside the block as shown below.
![](../../images/PP-LCNetV2/shortcut.png)
### 2.4 Activation function

ReLU and Hard-Swish are the most common activation functions in today's lightweight CNNs. Although Hard-Swish usually gives better accuracy, we found that some inference platforms optimize it poorly, so for generality PP-LCNetV2 uses ReLU by default; our tests show that ReLU has little accuracy impact on larger models.
### 2.5 SE module

Although the SE module clearly improves accuracy, its cost in speed cannot be ignored. In PP-LCNetV1 we found that placing SE modules in the middle-to-rear part of the network maximizes the benefit. While optimizing PP-LCNetV2 we ran further stage-by-stage experiments on SE placement and found that using it in Stage 3 achieves the best trade-off; this matches NET_CONFIG in the PPLCNetV2 code later in this commit, where only "stage3" sets use_se=True.
## 3. Results

Without extra training data, PPLCNetV2_base achieves over 77% Top-1 accuracy on the ImageNet classification dataset while keeping inference time under 4.4 ms on Intel CPUs, as shown in the table below. Latency was measured on an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz with the OpenVINO inference engine.
| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|-----------|----------|---------------|---------------|-------------|
| MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 |
| PPLCNet_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 |
| <b>PPLCNetV2_base</b> | <b>6.6</b> | <b>604</b> | <b>77.04</b> | <b>93.27</b> | <b>4.32</b> |
More information about PP-LCNetV2 is coming soon.
## Person entry/exit management

In recent years, AI vision technology has played a pivotal role in the intelligent upgrading of industries such as security and manufacturing. Entry/exit control is a key scenario across industries, with urgent application demand: home burglary prevention, server-room access control, and danger alerts in scenic areas all require timely detection of unauthorized targets (people, vehicles, or other objects) entering restricted areas. Deep-learning vision technology can recognize intrusions promptly and accurately and raise alerts, effectively protecting lives and property. Compared with traditional manual monitoring, it not only provides uninterrupted, all-around 24/7 protection but also greatly reduces management costs and frees up labor.

In real industry settings, however, accurate entry/exit recognition is not easy; practical scenarios present a variety of challenges:

**Images captured by cameras are affected by occlusion from buildings, machinery, vehicles, and so on.**

**Weather varies widely; the system must cope with daytime, nighttime, fog, rain, and more.**

To address these scenarios, this release of the PaddlePaddle industrial practice demo library provides a key-area entry/exit control example with a fully reusable pipeline covering data preparation, the technical solution, model training and optimization, and model deployment. It effectively solves image classification in complex outdoor environments with varying illumination and weather, greatly reduces annotation and compute costs, and suits industrial applications such as factory inspection, home security, and scenic-area management.
![result](./imgs/someone.gif)
**Note**: To run the code online on AI Studio, see [Person entry/exit management](https://aistudio.baidu.com/aistudio/projectdetail/4037898).
...@@ -32,7 +32,7 @@ from ppcls.arch.distill.afd_attention import LinearTransformStudent, LinearTrans

__all__ = ["build_model", "RecModel", "DistillationModel", "AttentionModel"]


def build_model(config, mode="train"):
    arch_config = copy.deepcopy(config["Arch"])
    model_type = arch_config.pop("name")
    use_sync_bn = arch_config.pop("use_sync_bn", False)
...@@ -43,7 +43,8 @@ def build_model(config):
    if isinstance(arch, TheseusLayer):
        prune_model(config, arch)
        quantize_model(config, arch, mode)

    return arch
...@@ -54,6 +55,7 @@ def apply_to_static(config, model):
    specs = None
    if 'image_shape' in config['Global']:
        specs = [InputSpec([None] + config['Global']['image_shape'])]
        specs[0].stop_gradient = True
    model = to_static(model, input_spec=specs)
    logger.info("Successfully to apply @to_static with specs: {}".format(
        specs))
......
...@@ -22,6 +22,7 @@ from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19
from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3
from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5
from ppcls.arch.backbone.legendary_models.pp_lcnet_v2 import PPLCNetV2_base
from ppcls.arch.backbone.legendary_models.esnet import ESNet_x0_25, ESNet_x0_5, ESNet_x0_75, ESNet_x1_0
from ppcls.arch.backbone.legendary_models.pp_hgnet import PPHGNet_tiny, PPHGNet_small, PPHGNet_base
...@@ -51,7 +52,7 @@ from ppcls.arch.backbone.model_zoo.darknet import DarkNet53
from ppcls.arch.backbone.model_zoo.regnet import RegNetX_200MF, RegNetX_4GF, RegNetX_32GF, RegNetY_200MF, RegNetY_4GF, RegNetY_32GF
from ppcls.arch.backbone.model_zoo.vision_transformer import ViT_small_patch16_224, ViT_base_patch16_224, ViT_base_patch16_384, ViT_base_patch32_384, ViT_large_patch16_224, ViT_large_patch16_384, ViT_large_patch32_384
from ppcls.arch.backbone.model_zoo.distilled_vision_transformer import DeiT_tiny_patch16_224, DeiT_small_patch16_224, DeiT_base_patch16_224, DeiT_tiny_distilled_patch16_224, DeiT_small_distilled_patch16_224, DeiT_base_distilled_patch16_224, DeiT_base_patch16_384, DeiT_base_distilled_patch16_384
from ppcls.arch.backbone.legendary_models.swin_transformer import SwinTransformer_tiny_patch4_window7_224, SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224, SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224, SwinTransformer_large_patch4_window12_384
from ppcls.arch.backbone.model_zoo.cswin_transformer import CSWinTransformer_tiny_224, CSWinTransformer_small_224, CSWinTransformer_base_224, CSWinTransformer_large_224, CSWinTransformer_base_384, CSWinTransformer_large_384
from ppcls.arch.backbone.model_zoo.mixnet import MixNet_S, MixNet_M, MixNet_L
from ppcls.arch.backbone.model_zoo.rexnet import ReXNet_1_0, ReXNet_1_3, ReXNet_1_5, ReXNet_2_0, ReXNet_3_0
......
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import, division, print_function
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm2D, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"PPLCNetV2_base":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_pretrained.pdparams",
}
__all__ = list(MODEL_URLS.keys())
NET_CONFIG = {
# in_channels, kernel_size, split_pw, use_rep, use_se, use_shortcut
"stage1": [64, 3, False, False, False, False],
"stage2": [128, 3, False, False, False, False],
"stage3": [256, 5, True, True, True, False],
"stage4": [512, 5, False, True, False, True],
}
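# make_divisible rounds a channel count to the nearest multiple of `divisor`
# (default 8) after width scaling, and bumps the result up by one divisor if
# rounding would drop below roughly 90% of the original value.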
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self,
in_channels,
out_channels,
kernel_size,
stride,
groups=1,
use_act=True):
super().__init__()
self.use_act = use_act
self.conv = Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm2D(
out_channels,
weight_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
if self.use_act:
self.act = nn.ReLU()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
if self.use_act:
x = self.act(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
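        # despite the attribute name, a standard Sigmoid (not Hard-Sigmoid) gates the channels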
self.hardsigmoid = nn.Sigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class RepDepthwiseSeparable(TheseusLayer):
def __init__(self,
in_channels,
out_channels,
stride,
dw_size=3,
split_pw=False,
use_rep=False,
use_se=False,
use_shortcut=False):
super().__init__()
self.is_repped = False
self.dw_size = dw_size
self.split_pw = split_pw
self.use_rep = use_rep
self.use_se = use_se
self.use_shortcut = True if use_shortcut and stride == 1 and in_channels == out_channels else False
if self.use_rep:
self.dw_conv_list = nn.LayerList()
for kernel_size in range(self.dw_size, 0, -2):
if kernel_size == 1 and stride != 1:
continue
dw_conv = ConvBNLayer(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=kernel_size,
stride=stride,
groups=in_channels,
use_act=False)
self.dw_conv_list.append(dw_conv)
self.dw_conv = nn.Conv2D(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=dw_size,
stride=stride,
padding=(dw_size - 1) // 2,
groups=in_channels)
else:
self.dw_conv = ConvBNLayer(
in_channels=in_channels,
out_channels=in_channels,
kernel_size=dw_size,
stride=stride,
groups=in_channels)
self.act = nn.ReLU()
if use_se:
self.se = SEModule(in_channels)
if self.split_pw:
pw_ratio = 0.5
self.pw_conv_1 = ConvBNLayer(
in_channels=in_channels,
kernel_size=1,
out_channels=int(out_channels * pw_ratio),
stride=1)
self.pw_conv_2 = ConvBNLayer(
in_channels=int(out_channels * pw_ratio),
kernel_size=1,
out_channels=out_channels,
stride=1)
else:
self.pw_conv = ConvBNLayer(
in_channels=in_channels,
kernel_size=1,
out_channels=out_channels,
stride=1)
def forward(self, x):
if self.use_rep:
input_x = x
if self.is_repped:
x = self.act(self.dw_conv(x))
else:
y = self.dw_conv_list[0](x)
for dw_conv in self.dw_conv_list[1:]:
y += dw_conv(x)
x = self.act(y)
else:
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
if self.split_pw:
x = self.pw_conv_1(x)
x = self.pw_conv_2(x)
else:
x = self.pw_conv(x)
if self.use_shortcut:
x = x + input_x
return x
def rep(self):
if self.use_rep:
self.is_repped = True
kernel, bias = self._get_equivalent_kernel_bias()
self.dw_conv.weight.set_value(kernel)
self.dw_conv.bias.set_value(bias)
def _get_equivalent_kernel_bias(self):
kernel_sum = 0
bias_sum = 0
for dw_conv in self.dw_conv_list:
kernel, bias = self._fuse_bn_tensor(dw_conv)
kernel = self._pad_tensor(kernel, to_size=self.dw_size)
kernel_sum += kernel
bias_sum += bias
return kernel_sum, bias_sum
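    # Conv+BN folding used below: BN(W*x) = gamma * (W*x - mean) / std + beta
    #                                     = (gamma / std) * W * x + (beta - mean * gamma / std),
    # so each branch's fused kernel is W * (gamma / std) and its fused bias is
    # beta - mean * gamma / std, computed per output channel.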
def _fuse_bn_tensor(self, branch):
kernel = branch.conv.weight
running_mean = branch.bn._mean
running_var = branch.bn._variance
gamma = branch.bn.weight
beta = branch.bn.bias
eps = branch.bn._epsilon
std = (running_var + eps).sqrt()
t = (gamma / std).reshape((-1, 1, 1, 1))
return kernel * t, beta - running_mean * gamma / std
def _pad_tensor(self, tensor, to_size):
from_size = tensor.shape[-1]
if from_size == to_size:
return tensor
pad = (to_size - from_size) // 2
return F.pad(tensor, [pad, pad, pad, pad])
class PPLCNetV2(TheseusLayer):
def __init__(self,
scale,
depths,
class_num=1000,
dropout_prob=0,
use_last_conv=True,
class_expand=1280):
super().__init__()
self.scale = scale
self.use_last_conv = use_last_conv
self.class_expand = class_expand
self.stem = nn.Sequential(* [
ConvBNLayer(
in_channels=3,
kernel_size=3,
out_channels=make_divisible(32 * scale),
stride=2), RepDepthwiseSeparable(
in_channels=make_divisible(32 * scale),
out_channels=make_divisible(64 * scale),
stride=1,
dw_size=3)
])
# stages
self.stages = nn.LayerList()
for depth_idx, k in enumerate(NET_CONFIG):
in_channels, kernel_size, split_pw, use_rep, use_se, use_shortcut = NET_CONFIG[
k]
self.stages.append(
nn.Sequential(* [
RepDepthwiseSeparable(
in_channels=make_divisible((in_channels if i == 0 else
in_channels * 2) * scale),
out_channels=make_divisible(in_channels * 2 * scale),
stride=2 if i == 0 else 1,
dw_size=kernel_size,
split_pw=split_pw,
use_rep=use_rep,
use_se=use_se,
use_shortcut=use_shortcut)
for i in range(depths[depth_idx])
]))
self.avg_pool = AdaptiveAvgPool2D(1)
if self.use_last_conv:
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["stage4"][0] * 2 *
scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.act = nn.ReLU()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
in_features = self.class_expand if self.use_last_conv else NET_CONFIG[
"stage4"][0] * 2 * scale
self.fc = Linear(in_features, class_num)
def forward(self, x):
x = self.stem(x)
for stage in self.stages:
x = stage(x)
x = self.avg_pool(x)
if self.use_last_conv:
x = self.last_conv(x)
x = self.act(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def PPLCNetV2_base(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNetV2_base
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNetV2_base` model depends on args.
"""
model = PPLCNetV2(
scale=1.0, depths=[2, 2, 6, 2], dropout_prob=0.2, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNetV2_base"], use_ssld)
return model
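A usage sketch (shapes and arguments illustrative): build the model, then fuse the re-parameterizable branches before inference by calling rep() on sublayers, the same way the export code later in this commit does.

```python
import paddle

model = PPLCNetV2_base(pretrained=False, class_num=1000)
model.eval()

# fuse the multi-branch DW convolutions into single kernels for inference
for layer in model.sublayers():
    if hasattr(layer, "rep"):
        layer.rep()

x = paddle.rand([1, 3, 224, 224])
print(model(x).shape)  # [1, 1000]
```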
...@@ -20,9 +20,10 @@ import numpy as np
import paddle
from paddle import ParamAttr
import paddle.nn as nn
from paddle.nn import Conv2D, BatchNorm, Linear, BatchNorm2D
from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
from paddle.nn.initializer import Uniform
from paddle.regularizer import L2Decay
import math

from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
...@@ -132,11 +133,12 @@ class ConvBNLayer(TheseusLayer):
            weight_attr=ParamAttr(learning_rate=lr_mult),
            bias_attr=False,
            data_format=data_format)

        weight_attr = ParamAttr(learning_rate=lr_mult, trainable=True)
        bias_attr = ParamAttr(learning_rate=lr_mult, trainable=True)

        self.bn = BatchNorm2D(
            num_filters, weight_attr=weight_attr, bias_attr=bias_attr)
        self.relu = nn.ReLU()

    def forward(self, x):
...@@ -192,6 +194,7 @@ class BottleneckBlock(TheseusLayer):
                is_vd_mode=False if if_first else True,
                lr_mult=lr_mult,
                data_format=data_format)

        self.relu = nn.ReLU()
        self.shortcut = shortcut
...@@ -312,7 +315,7 @@ class ResNet(TheseusLayer):
            [[input_image_channel, 32, 3, 2], [32, 32, 3, 1], [32, 64, 3, 1]]
        }

        self.stem = nn.Sequential(* [
            ConvBNLayer(
                num_channels=in_c,
                num_filters=out_c,
......
...@@ -21,8 +21,8 @@ import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import TruncatedNormal, Constant

from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.arch.backbone.model_zoo.vision_transformer import trunc_normal_, zeros_, ones_, to_2tuple, DropPath, Identity
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url

MODEL_URLS = {
...@@ -589,7 +589,7 @@ class PatchEmbed(nn.Layer):
        return flops


class SwinTransformer(TheseusLayer):
    """ Swin Transformer
    A PaddlePaddle impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` -
        https://arxiv.org/pdf/2103.14030
......
...@@ -124,13 +124,7 @@ class RepVGGBlock(nn.Layer):
            groups=groups)

    def forward(self, inputs):
        if self.is_repped:
            return self.nonlinearity(self.rbr_reparam(inputs))

        if self.rbr_identity is None:
...@@ -154,6 +148,7 @@ class RepVGGBlock(nn.Layer):
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam.weight.set_value(kernel)
        self.rbr_reparam.bias.set_value(bias)
        self.is_repped = True

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
......
...@@ -40,12 +40,14 @@ QUANT_CONFIG = {
}


def quantize_model(config, model, mode="train"):
    if config.get("Slim", False) and config["Slim"].get("quant", False):
        from paddleslim.dygraph.quant import QAT
        assert config["Slim"]["quant"]["name"].lower(
        ) == 'pact', 'Only PACT quantization method is supported now'
        QUANT_CONFIG["activation_preprocess_type"] = "PACT"
        if mode in ["infer", "export"]:
            QUANT_CONFIG['activation_preprocess_type'] = None

        model.quanter = QAT(config=QUANT_CONFIG)
        model.quanter.quantize(model)
        logger.info("QAT model summary:")
......
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
device: "gpu"
save_interval: 5
eval_during_train: True
eval_interval: 1
epochs: 30
print_batch_step: 20
use_visualdl: False
# used for static mode and model export
image_shape: [3, 256, 192]
save_inference_dir: "./inference"
use_multilabel: True
# model architecture
Arch:
name: "ResNet50"
pretrained: True
class_num: 26
# loss function config for training/eval process
Loss:
Train:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Eval:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Optimizer:
name: Adam
lr:
name: Piecewise
decay_epochs: [12, 18, 24, 28]
values: [0.0001, 0.00001, 0.000001, 0.0000001]
regularizer:
name: 'L2'
coeff: 0.0005
clip_norm: 10
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/attribute/data/"
cls_label_path: "dataset/attribute/trainval.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [192, 256]
- Padv2:
size: [212, 276]
pad_mode: 1
fill_value: 0
- RandomCropImage:
size: [192, 256]
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: True
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/attribute/data/"
cls_label_path: "dataset/attribute/test.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [192, 256]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- ATTRMetric:
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 480
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNetV2_base
class_num: 1000
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiScaleDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
      # support to specify width and height respectively:
      # scales: [(160,160), (192,192), (224,224), (288,288), (320,320)]
sampler:
name: MultiScaleSampler
scales: [160, 192, 224, 288, 320]
    # first_bs: batch size for the first image resolution in the scales list
    # divided_factor: to ensure the width and height dimensions can be divided by the downsampling multiple
first_bs: 500
divided_factor: 32
is_training: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
...@@ -105,7 +105,6 @@ DataLoader:
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
            channel_num: *image_channel
    sampler:
      name: DistributedBatchSampler
...@@ -132,7 +131,6 @@ Infer:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
        channel_num: *image_channel
    - ToCHWImage:
  PostProcess:
......
...@@ -15,6 +15,13 @@ Global:
  image_shape: [*image_channel, 224, 224]
  save_inference_dir: ./inference

# mixed precision training
AMP:
  scale_loss: 128.0
  use_dynamic_loss_scaling: True
  # O2: pure fp16
  level: O2

# model architecture
Arch:
  name: SE_ResNeXt101_32x4d
...@@ -32,13 +39,6 @@ Loss:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
...@@ -99,10 +99,9 @@ DataLoader:
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
            channel_num: *image_channel
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: False
...@@ -126,7 +125,6 @@ Infer:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
        channel_num: *image_channel
    - ToCHWImage:
  PostProcess:
......
...@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "backbone" # 'backbone' or 'neck'
  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
......
...@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "features" # 'backbone' or 'features'
  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
......
...@@ -12,6 +12,7 @@ Global:
  use_visualdl: False
  eval_mode: "retrieval"
  retrieval_feature_from: "features" # 'backbone' or 'features'
  re_ranking: False
  # used for static mode and model export
  image_shape: [3, 256, 128]
  save_inference_dir: "./inference"
......
...@@ -44,11 +44,11 @@ def create_operators(params):


class CommonDataset(Dataset):
    def __init__(self,
                 image_root,
                 cls_label_path,
                 transform_ops=None,
                 label_ratio=False):
        self._img_root = image_root
        self._cls_path = cls_label_path
        if transform_ops:
...@@ -56,7 +56,10 @@ class CommonDataset(Dataset):
        self.images = []
        self.labels = []

        if label_ratio:
            self.label_ratio = self._load_anno(label_ratio=label_ratio)
        else:
            self._load_anno()

    def _load_anno(self):
        pass
......
...@@ -25,7 +25,7 @@ from .common_dataset import CommonDataset


class MultiLabelDataset(CommonDataset):
    def _load_anno(self, label_ratio=False):
        assert os.path.exists(self._cls_path)
        assert os.path.exists(self._img_root)
        self.images = []
...@@ -41,6 +41,8 @@ class MultiLabelDataset(CommonDataset):
                self.labels.append(labels)
                assert os.path.exists(self.images[-1])
if label_ratio:
return np.array(self.labels).mean(0).astype("float32")
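A quick illustration with toy labels (not real data): the returned value is the per-attribute positive-sample ratio over the whole dataset, which `__getitem__` below bundles with each label so the loss can re-weight attributes.

```python
import numpy as np

labels = np.array([[1, 0, 1],
                   [1, 1, 0],
                   [0, 0, 0]])
# mean over samples = fraction of positives per attribute
print(labels.mean(0).astype("float32"))  # approx. [0.667 0.333 0.333]
```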
    def __getitem__(self, idx):
        try:
...@@ -50,7 +52,10 @@ class MultiLabelDataset(CommonDataset):
            img = transform(img, self._transform_ops)
            img = img.transpose((2, 0, 1))
            label = np.array(self.labels[idx]).astype("float32")
            if self.label_ratio is not None:
                return (img, np.array([label, self.label_ratio]))
            else:
                return (img, label)
        except Exception as ex:
            logger.error("Exception occured when parse line: {} with msg: {}".
......
...@@ -33,6 +33,8 @@ from ppcls.data.preprocess.ops.operators import AugMix
from ppcls.data.preprocess.ops.operators import Pad
from ppcls.data.preprocess.ops.operators import ToTensor
from ppcls.data.preprocess.ops.operators import Normalize
from ppcls.data.preprocess.ops.operators import RandomCropImage
from ppcls.data.preprocess.ops.operators import Padv2

from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator
...@@ -40,6 +42,7 @@ import numpy as np
from PIL import Image
import random


def transform(data, ops=[]):
    """ transform """
    for op in ops:
......
...@@ -190,6 +190,105 @@ class CropImage(object):
        return img[h_start:h_end, w_start:w_end, :]
class Padv2(object):
def __init__(self,
size=None,
size_divisor=32,
pad_mode=0,
offsets=None,
fill_value=(127.5, 127.5, 127.5)):
"""
Pad image to a specified size or multiple of size_divisor.
Args:
size (int, list): image target size, if None, pad to multiple of size_divisor, default None
size_divisor (int): size divisor, default 32
pad_mode (int): pad mode, currently only supports four modes [-1, 0, 1, 2]. if -1, use specified offsets
if 0, only pad to right and bottom. if 1, pad according to center. if 2, only pad left and top
offsets (list): [offset_x, offset_y], specify offset while padding, only supported pad_mode=-1
fill_value (bool): rgb value of pad area, default (127.5, 127.5, 127.5)
"""
if not isinstance(size, (int, list)):
raise TypeError(
"Type of target_size is invalid when random_size is True. \
Must be List, now is {}".format(type(size)))
if isinstance(size, int):
size = [size, size]
assert pad_mode in [
-1, 0, 1, 2
], 'currently only supports four modes [-1, 0, 1, 2]'
if pad_mode == -1:
assert offsets, 'if pad_mode is -1, offsets should not be None'
self.size = size
self.size_divisor = size_divisor
self.pad_mode = pad_mode
self.fill_value = fill_value
self.offsets = offsets
def apply_image(self, image, offsets, im_size, size):
x, y = offsets
im_h, im_w = im_size
h, w = size
canvas = np.ones((h, w, 3), dtype=np.float32)
canvas *= np.array(self.fill_value, dtype=np.float32)
canvas[y:y + im_h, x:x + im_w, :] = image.astype(np.float32)
return canvas
def __call__(self, img):
im_h, im_w = img.shape[:2]
if self.size:
w, h = self.size
assert (
im_h <= h and im_w <= w
), '(h, w) of target size should be greater than (im_h, im_w)'
else:
h = int(np.ceil(im_h / self.size_divisor) * self.size_divisor)
w = int(np.ceil(im_w / self.size_divisor) * self.size_divisor)
if h == im_h and w == im_w:
return img.astype(np.float32)
if self.pad_mode == -1:
offset_x, offset_y = self.offsets
elif self.pad_mode == 0:
offset_y, offset_x = 0, 0
elif self.pad_mode == 1:
offset_y, offset_x = (h - im_h) // 2, (w - im_w) // 2
else:
offset_y, offset_x = h - im_h, w - im_w
offsets, im_size, size = [offset_x, offset_y], [im_h, im_w], [h, w]
return self.apply_image(img, offsets, im_size, size)
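A minimal usage sketch, mirroring the Padv2 settings in the attribute training config earlier in this commit (size [212, 276], pad_mode 1, fill_value 0): a 256x192 image is centered on a zero-filled 276x212 canvas.

```python
import numpy as np

img = np.random.randint(0, 256, (256, 192, 3)).astype(np.uint8)  # (h, w, c)
padded = Padv2(size=[212, 276], pad_mode=1, fill_value=0)(img)
print(padded.shape)  # (276, 212, 3); note that size is given as (w, h)
```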
class RandomCropImage(object):
"""Random crop image only
"""
def __init__(self, size):
super(RandomCropImage, self).__init__()
if isinstance(size, int):
size = [size, size]
self.size = size
def __call__(self, img):
h, w = img.shape[:2]
tw, th = self.size
i = random.randint(0, h - th)
j = random.randint(0, w - tw)
img = img[i:i + th, j:j + tw, :]
        if img.shape[0] != th or img.shape[1] != tw:
            raise ValueError('sample: ', h, w, i, j, th, tw, img.shape)
return img
class RandCropImage(object):
    """ random crop image """
...@@ -463,8 +562,8 @@ class Pad(object):
        # Process fill color for affine transforms
        major_found, minor_found = (int(v)
                                    for v in PILLOW_VERSION.split('.')[:2])
        major_required, minor_required = (int(v) for v in
                                          min_pil_version.split('.')[:2])
        if major_found < major_required or (major_found == major_required and
                                            minor_found < minor_required):
            if fill is None:
......
...@@ -189,7 +189,7 @@ class Engine(object):
            self.eval_metric_func = None

        # build model
        self.model = build_model(self.config, self.mode)
        # set @to_static for benchmark, skip this by default.
        apply_to_static(self.config, self.model)
...@@ -239,7 +239,7 @@ class Engine(object):
            self.amp_eval = self.config["AMP"].get("use_fp16_test", False)

            # TODO(gaotingquan): Paddle not yet support FP32 evaluation when training with AMPO2
            if self.mode == "train" and self.config["Global"].get(
                    "eval_during_train",
                    True) and self.amp_level == "O2" and self.amp_eval == False:
                msg = "PaddlePaddle only support FP16 evaluation when training with AMP O2 now. "
...@@ -269,10 +269,11 @@ class Engine(object):
                    save_dtype='float32')
            # paddle version >= 2.3.0 or develop
            else:
                if self.mode == "train" or self.amp_eval:
                    self.model = paddle.amp.decorate(
                        models=self.model,
                        level=self.amp_level,
                        save_dtype='float32')

        if self.mode == "train" and len(self.train_loss_func.parameters(
        )) > 0:
...@@ -432,7 +433,17 @@ class Engine(object):
                image_file_list.append(image_file)
                if len(batch_data) >= batch_size or idx == len(image_list) - 1:
                    batch_tensor = paddle.to_tensor(batch_data)

                    if self.amp and self.amp_eval:
                        with paddle.amp.auto_cast(
                                custom_black_list={
                                    "flatten_contiguous_range", "greater_than"
                                },
                                level=self.amp_level):
                            out = self.model(batch_tensor)
                    else:
                        out = self.model(batch_tensor)

                    if isinstance(out, list):
                        out = out[0]
                    if isinstance(out, dict) and "logits" in out:
...@@ -453,26 +464,31 @@ class Engine(object):
                self.config["Global"]["pretrained_model"])
        model.eval()

        # for rep nets
        for layer in self.model.sublayers():
            if hasattr(layer, "rep"):
                layer.rep()

        save_path = os.path.join(self.config["Global"]["save_inference_dir"],
                                 "inference")

        model = paddle.jit.to_static(
            model,
            input_spec=[
                paddle.static.InputSpec(
                    shape=[None] + self.config["Global"]["image_shape"],
                    dtype='float32')
            ])
        if hasattr(model.base_model,
                   "quanter") and model.base_model.quanter is not None:
            model.base_model.quanter.save_quantized_model(model,
                                                          save_path + "_int8")
        else:
            paddle.jit.save(model, save_path)
        logger.info(
            f"Export succeeded! The inference model exported has been saved in \"{self.config['Global']['save_inference_dir']}\"."
        )


class ExportModel(TheseusLayer):
......
...@@ -82,6 +82,7 @@ def classification_eval(engine, epoch_id=0):
        # gather Tensor when distributed
        if paddle.distributed.get_world_size() > 1:
            label_list = []

            paddle.distributed.all_gather(label_list, batch[1])
            labels = paddle.concat(label_list, 0)
...@@ -123,6 +124,7 @@ def classification_eval(engine, epoch_id=0):
                    output_info[key] = AverageMeter(key, '7.5f')
                output_info[key].update(loss_dict[key].numpy()[0],
                                        current_samples)

        # calc metric
        if engine.eval_metric_func is not None:
            engine.eval_metric_func(preds, labels)
...@@ -137,11 +139,14 @@ def classification_eval(engine, epoch_id=0):
            ips_msg = "ips: {:.5f} images/sec".format(
                batch_size / time_info["batch_cost"].avg)

            if "ATTRMetric" in engine.config["Metric"]["Eval"][0]:
                metric_msg = ""
            else:
                metric_msg = ", ".join([
                    "{}: {:.5f}".format(key, output_info[key].val)
                    for key in output_info
                ])
                metric_msg += ", {}".format(engine.eval_metric_func.avg_info)
            logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
                epoch_id, iter_id,
                len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))
...@@ -149,14 +154,29 @@ def classification_eval(engine, epoch_id=0):
            tic = time.time()
    if engine.use_dali:
        engine.eval_dataloader.reset()

    if "ATTRMetric" in engine.config["Metric"]["Eval"][0]:
        metric_msg = ", ".join([
            "evalres: ma: {:.5f} label_f1: {:.5f} label_pos_recall: {:.5f} label_neg_recall: {:.5f} instance_f1: {:.5f} instance_acc: {:.5f} instance_prec: {:.5f} instance_recall: {:.5f}".
            format(*engine.eval_metric_func.attr_res())
        ])
        logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))

        # do not try to save best eval.model
        if engine.eval_metric_func is None:
            return -1
        # return 1st metric in the dict
        return engine.eval_metric_func.attr_res()[0]
    else:
        metric_msg = ", ".join([
            "{}: {:.5f}".format(key, output_info[key].avg)
            for key in output_info
        ])
        metric_msg += ", {}".format(engine.eval_metric_func.avg_info)
        logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))

        # do not try to save best eval.model
        if engine.eval_metric_func is None:
            return -1
        # return 1st metric in the dict
        return engine.eval_metric_func.avg
...@@ -16,6 +16,9 @@ from __future__ import division
from __future__ import print_function

import platform
from typing import Optional

import numpy as np
import paddle

from ppcls.utils import logger
...@@ -48,34 +51,67 @@ def retrieval_eval(engine, epoch_id=0):
    if engine.eval_loss_func is None:
        metric_dict = {metric_key: 0.}
    else:
        reranking_flag = engine.config['Global'].get('re_ranking', False)
        logger.info(f"re_ranking={reranking_flag}")
        metric_dict = dict()
        if reranking_flag:
            # set the order from small to large
            for i in range(len(engine.eval_metric_func.metric_func_list)):
                if hasattr(engine.eval_metric_func.metric_func_list[i], 'descending') \
                        and engine.eval_metric_func.metric_func_list[i].descending is True:
                    engine.eval_metric_func.metric_func_list[
                        i].descending = False
                    logger.warning(
                        f"re_ranking=True,{engine.eval_metric_func.metric_func_list[i].__class__.__name__}.descending has been set to False"
                    )

            # compute distance matrix (the smaller the value, the more similar)
            distmat = re_ranking(
                query_feas, gallery_feas, k1=20, k2=6, lambda_value=0.3)

            # compute keep mask
            query_id_mask = (query_query_id != gallery_unique_id.t())
            image_id_mask = (query_img_id != gallery_img_id.t())
            keep_mask = paddle.logical_or(query_id_mask, image_id_mask)

            # set an inf (1e20) distance for pairs excluded by keep_mask
            distmat = distmat * keep_mask.astype("float32")
            inf_mat = (paddle.logical_not(keep_mask).astype("float32")) * 1e20
            distmat = distmat + inf_mat

            # compute metric
            metric_tmp = engine.eval_metric_func(distmat, query_img_id,
                                                 gallery_img_id, keep_mask)
            for key in metric_tmp:
                metric_dict[key] = metric_tmp[key]
        else:
            for block_idx, block_fea in enumerate(fea_blocks):
                similarity_matrix = paddle.matmul(
                    block_fea, gallery_feas, transpose_y=True)  # [n,m]
                if query_query_id is not None:
                    query_id_block = query_id_blocks[block_idx]
                    query_id_mask = (query_id_block != gallery_unique_id.t())

                    image_id_block = image_id_blocks[block_idx]
                    image_id_mask = (image_id_block != gallery_img_id.t())

                    keep_mask = paddle.logical_or(query_id_mask, image_id_mask)
                    similarity_matrix = similarity_matrix * keep_mask.astype(
                        "float32")
                else:
                    keep_mask = None

                metric_tmp = engine.eval_metric_func(
                    similarity_matrix, image_id_blocks[block_idx],
                    gallery_img_id, keep_mask)

                for key in metric_tmp:
                    if key not in metric_dict:
                        metric_dict[key] = metric_tmp[key] * block_fea.shape[
                            0] / len(query_feas)
                    else:
                        metric_dict[key] += metric_tmp[key] * block_fea.shape[
                            0] / len(query_feas)
    metric_info_list = []
    for key in metric_dict:
...@@ -185,3 +221,109 @@ def cal_feature(engine, name='gallery'):
    logger.info("Build {} done, all feat shape: {}, begin to eval..".format(
        name, all_feas.shape))
    return all_feas, all_img_id, all_unique_id
def re_ranking(query_feas: paddle.Tensor,
gallery_feas: paddle.Tensor,
k1: int=20,
k2: int=6,
               lambda_value: float=0.5,
local_distmat: Optional[np.ndarray]=None,
only_local: bool=False) -> paddle.Tensor:
"""re-ranking, most computed with numpy
code heavily based on
https://github.com/michuanhaohao/reid-strong-baseline/blob/3da7e6f03164a92e696cb6da059b1cd771b0346d/utils/reid_metric.py
Args:
query_feas (paddle.Tensor): query features, [num_query, num_features]
gallery_feas (paddle.Tensor): gallery features, [num_gallery, num_features]
k1 (int, optional): k1. Defaults to 20.
k2 (int, optional): k2. Defaults to 6.
lambda_value (int, optional): lambda. Defaults to 0.5.
local_distmat (Optional[np.ndarray], optional): local_distmat. Defaults to None.
only_local (bool, optional): only_local. Defaults to False.
Returns:
paddle.Tensor: final_dist matrix after re-ranking, [num_query, num_gallery]
"""
query_num = query_feas.shape[0]
all_num = query_num + gallery_feas.shape[0]
if only_local:
original_dist = local_distmat
else:
feat = paddle.concat([query_feas, gallery_feas])
logger.info('using GPU to compute original distance')
# L2 distance
distmat = paddle.pow(feat, 2).sum(axis=1, keepdim=True).expand([all_num, all_num]) + \
paddle.pow(feat, 2).sum(axis=1, keepdim=True).expand([all_num, all_num]).t()
distmat = distmat.addmm(x=feat, y=feat.t(), alpha=-2.0, beta=1.0)
original_dist = distmat.cpu().numpy()
del feat
if local_distmat is not None:
original_dist = original_dist + local_distmat
gallery_num = original_dist.shape[0]
original_dist = np.transpose(original_dist / np.max(original_dist, axis=0))
V = np.zeros_like(original_dist).astype(np.float16)
initial_rank = np.argsort(original_dist).astype(np.int32)
logger.info('starting re_ranking')
for i in range(all_num):
# k-reciprocal neighbors
forward_k_neigh_index = initial_rank[i, :k1 + 1]
backward_k_neigh_index = initial_rank[forward_k_neigh_index, :k1 + 1]
fi = np.where(backward_k_neigh_index == i)[0]
k_reciprocal_index = forward_k_neigh_index[fi]
k_reciprocal_expansion_index = k_reciprocal_index
for j in range(len(k_reciprocal_index)):
candidate = k_reciprocal_index[j]
candidate_forward_k_neigh_index = initial_rank[candidate, :int(
np.around(k1 / 2)) + 1]
candidate_backward_k_neigh_index = initial_rank[
candidate_forward_k_neigh_index, :int(np.around(k1 / 2)) + 1]
fi_candidate = np.where(
candidate_backward_k_neigh_index == candidate)[0]
candidate_k_reciprocal_index = candidate_forward_k_neigh_index[
fi_candidate]
if len(
np.intersect1d(candidate_k_reciprocal_index,
k_reciprocal_index)) > 2 / 3 * len(
candidate_k_reciprocal_index):
k_reciprocal_expansion_index = np.append(
k_reciprocal_expansion_index, candidate_k_reciprocal_index)
k_reciprocal_expansion_index = np.unique(k_reciprocal_expansion_index)
weight = np.exp(-original_dist[i, k_reciprocal_expansion_index])
V[i, k_reciprocal_expansion_index] = weight / np.sum(weight)
original_dist = original_dist[:query_num, ]
if k2 != 1:
V_qe = np.zeros_like(V, dtype=np.float16)
for i in range(all_num):
V_qe[i, :] = np.mean(V[initial_rank[i, :k2], :], axis=0)
V = V_qe
del V_qe
del initial_rank
invIndex = []
for i in range(gallery_num):
invIndex.append(np.where(V[:, i] != 0)[0])
jaccard_dist = np.zeros_like(original_dist, dtype=np.float16)
for i in range(query_num):
temp_min = np.zeros(shape=[1, gallery_num], dtype=np.float16)
indNonZero = np.where(V[i, :] != 0)[0]
indImages = [invIndex[ind] for ind in indNonZero]
for j in range(len(indNonZero)):
temp_min[0, indImages[j]] = temp_min[0, indImages[j]] + np.minimum(
V[i, indNonZero[j]], V[indImages[j], indNonZero[j]])
jaccard_dist[i] = 1 - temp_min / (2 - temp_min)
final_dist = jaccard_dist * (1 - lambda_value
) + original_dist * lambda_value
del original_dist
del V
del jaccard_dist
final_dist = final_dist[:query_num, query_num:]
final_dist = paddle.to_tensor(final_dist)
return final_dist
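
A minimal smoke test for re_ranking (an editor's sketch, not part of the diff; the shapes, the seed and the L2 normalization step are illustrative assumptions):

import paddle
import paddle.nn.functional as F

paddle.seed(42)
# retrieval features are typically L2-normalized before matching
query_feas = F.normalize(paddle.rand([8, 128]), axis=1)      # [num_query, d]
gallery_feas = F.normalize(paddle.rand([32, 128]), axis=1)   # [num_gallery, d]

distmat = re_ranking(query_feas, gallery_feas, k1=20, k2=6, lambda_value=0.3)
print(distmat.shape)  # [8, 32]; the smaller the value, the more similar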
@@ -3,16 +3,29 @@ import paddle.nn as nn
import paddle.nn.functional as F
def ratio2weight(targets, ratio):
pos_weights = targets * (1. - ratio)
neg_weights = (1. - targets) * ratio
weights = paddle.exp(neg_weights + pos_weights)
    # for the RAP dataloader, a target element may be 2; with or without label
    # smoothing some elements can be greater than 1, so zero out their weights
weights = weights - weights * (targets > 1)
return weights
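
A quick sanity check of ratio2weight (an editor's sketch with assumed values, not part of the diff). For a positive target the weight is exp(1 - ratio) and for a negative target exp(ratio), so positives of rare attributes and negatives of frequent attributes are up-weighted:

import paddle

targets = paddle.to_tensor([[1., 0.],
                            [0., 1.]])    # [batch=2, attrs=2]
ratio = paddle.to_tensor([0.1, 0.9])      # attr 0 is rare, attr 1 is common

print(ratio2weight(targets, ratio))
# row 0: [exp(0.9), exp(0.9)] -> rare positive / common negative, up-weighted
# row 1: [exp(0.1), exp(0.1)] -> common positive / rare negative, down-weighted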
class MultiLabelLoss(nn.Layer):
    """
    Multi-label loss
    """

    def __init__(self, epsilon=None, size_sum=False, weight_ratio=False):
        super().__init__()
        if epsilon is not None and (epsilon <= 0 or epsilon >= 1):
            epsilon = None
        self.epsilon = epsilon
        self.weight_ratio = weight_ratio
        self.size_sum = size_sum

    def _labelsmoothing(self, target, class_num):
        if target.ndim == 1 or target.shape[-1] != class_num:

@@ -24,13 +37,21 @@ class MultiLabelLoss(nn.Layer):
        return soft_target

    def _binary_crossentropy(self, input, target, class_num):
        if self.weight_ratio:
            target, label_ratio = target[:, 0, :], target[:, 1, :]
        if self.epsilon is not None:
            target = self._labelsmoothing(target, class_num)
        cost = F.binary_cross_entropy_with_logits(
            logit=input, label=target, reduction='none')

        if self.weight_ratio:
            targets_mask = paddle.cast(target > 0.5, 'float32')
            weight = ratio2weight(targets_mask, paddle.to_tensor(label_ratio))
            weight = weight * (target > -1)
            cost = cost * weight

        if self.size_sum:
            cost = cost.sum(1).mean()

        return cost
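
A hypothetical usage sketch of the weighted path (an editor's example, not part of the diff; shapes and values are assumptions). With weight_ratio=True the target packs the labels and the per-attribute positive ratios along axis 1, i.e. target[:, 0, :] are labels and target[:, 1, :] are ratios:

import paddle

batch, num_attrs = 4, 26
logits = paddle.randn([batch, num_attrs])
labels = paddle.randint(0, 2, [batch, num_attrs]).astype('float32')
ratios = paddle.full([batch, num_attrs], 0.2)    # assumed positive frequency
target = paddle.stack([labels, ratios], axis=1)  # [batch, 2, num_attrs]

loss_fn = MultiLabelLoss(epsilon=None, size_sum=True, weight_ratio=True)
cost = loss_fn._binary_crossentropy(logits, target, num_attrs)
print(float(cost))  # per-sample sum over attributes, then mean over the batch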
...
@@ -20,6 +20,7 @@ from .metrics import TopkAcc, mAP, mINP, Recallk, Precisionk
from .metrics import DistillationTopkAcc
from .metrics import GoogLeNetTopkAcc
from .metrics import HammingDistance, AccuracyScore
from .metrics import ATTRMetric
from .metrics import TprAtFpr

@@ -55,12 +56,15 @@ class CombinedMetrics(AvgMetrics):
    def avg(self):
        return self.metric_func_list[0].avg

    def attr_res(self):
        return self.metric_func_list[0].attrmeter.res()

    def reset(self):
        for metric in self.metric_func_list:
            if hasattr(metric, "reset"):
                metric.reset()


def build_metrics(config):
    metrics_list = CombinedMetrics(copy.deepcopy(config))
    return metrics_list
@@ -22,8 +22,10 @@ from sklearn.metrics import accuracy_score as accuracy_metric
from sklearn.metrics import multilabel_confusion_matrix
from sklearn.preprocessing import binarize

from easydict import EasyDict

from ppcls.metric.avg_metrics import AvgMetrics
from ppcls.utils.misc import AverageMeter, AttrMeter


class TopkAcc(AvgMetrics):

@@ -36,7 +38,10 @@ class TopkAcc(AvgMetrics):
        self.reset()

    def reset(self):
        self.avg_meters = {
            "top{}".format(k): AverageMeter("top{}".format(k))
            for k in self.topk
        }

    def forward(self, x, label):
        if isinstance(x, dict):

@@ -51,15 +56,16 @@ class TopkAcc(AvgMetrics):
class mAP(nn.Layer):
    def __init__(self, descending=True):
        super().__init__()
        self.descending = descending

    def forward(self, similarities_matrix, query_img_id, gallery_img_id,
                keep_mask):
        metric_dict = dict()

        choosen_indices = paddle.argsort(
            similarities_matrix, axis=1, descending=self.descending)
        gallery_labels_transpose = paddle.transpose(gallery_img_id, [1, 0])
        gallery_labels_transpose = paddle.broadcast_to(
            gallery_labels_transpose,

@@ -95,15 +101,16 @@ class mAP(nn.Layer):
class mINP(nn.Layer):
    def __init__(self, descending=True):
        super().__init__()
        self.descending = descending

    def forward(self, similarities_matrix, query_img_id, gallery_img_id,
                keep_mask):
        metric_dict = dict()

        choosen_indices = paddle.argsort(
            similarities_matrix, axis=1, descending=self.descending)
        gallery_labels_transpose = paddle.transpose(gallery_img_id, [1, 0])
        gallery_labels_transpose = paddle.broadcast_to(
            gallery_labels_transpose,

@@ -114,7 +121,7 @@ class mINP(nn.Layer):
            choosen_indices)
        equal_flag = paddle.equal(choosen_label, query_img_id)
        if keep_mask is not None:
            keep_mask = paddle.index_sample(
                keep_mask.astype('float32'), choosen_indices)
            equal_flag = paddle.logical_and(equal_flag,
                                            keep_mask.astype('bool'))

@@ -138,7 +145,7 @@ class mINP(nn.Layer):
class TprAtFpr(nn.Layer):
    def __init__(self, max_fpr=1 / 1000.):
        super().__init__()
        self.gt_pos_score_list = []
        self.gt_neg_score_list = []

@@ -176,25 +183,30 @@ class TprAtFpr(nn.Layer):
            threshold = i / 10000.
            if len(gt_pos_score_list) == 0:
                continue
            tpr = np.sum(
                gt_pos_score_list > threshold) / len(gt_pos_score_list)
            if len(gt_neg_score_list) == 0 and tpr > max_tpr:
                max_tpr = tpr
                result = "threshold: {}, fpr: {}, tpr: {:.5f}".format(
                    threshold, fpr, tpr)
            fpr = np.sum(
                gt_neg_score_list > threshold) / len(gt_neg_score_list)
            if fpr <= self.max_fpr and tpr > max_tpr:
                max_tpr = tpr
                result = "threshold: {}, fpr: {}, tpr: {:.5f}".format(
                    threshold, fpr, tpr)

        self.max_tpr = max_tpr
        return result
class Recallk(nn.Layer):
    def __init__(self, topk=(1, 5), descending=True):
        super().__init__()
        assert isinstance(topk, (int, list, tuple))
        if isinstance(topk, int):
            topk = [topk]
        self.topk = topk
        self.descending = descending

    def forward(self, similarities_matrix, query_img_id, gallery_img_id,
                keep_mask):

@@ -202,7 +214,7 @@ class Recallk(nn.Layer):
        # get cmc
        choosen_indices = paddle.argsort(
            similarities_matrix, axis=1, descending=self.descending)
        gallery_labels_transpose = paddle.transpose(gallery_img_id, [1, 0])
        gallery_labels_transpose = paddle.broadcast_to(
            gallery_labels_transpose,

@@ -234,12 +246,13 @@ class Recallk(nn.Layer):
class Precisionk(nn.Layer):
    def __init__(self, topk=(1, 5), descending=True):
        super().__init__()
        assert isinstance(topk, (int, list, tuple))
        if isinstance(topk, int):
            topk = [topk]
        self.topk = topk
        self.descending = descending

    def forward(self, similarities_matrix, query_img_id, gallery_img_id,
                keep_mask):

@@ -247,7 +260,7 @@ class Precisionk(nn.Layer):
        # get cmc
        choosen_indices = paddle.argsort(
            similarities_matrix, axis=1, descending=self.descending)
        gallery_labels_transpose = paddle.transpose(gallery_img_id, [1, 0])
        gallery_labels_transpose = paddle.broadcast_to(
            gallery_labels_transpose,

@@ -329,7 +342,8 @@ class HammingDistance(MultiLabelMetric):
        metric_dict = dict()
        metric_dict["HammingDistance"] = paddle.to_tensor(
            hamming_loss(target, preds))
        self.avg_meters["HammingDistance"].update(
            metric_dict["HammingDistance"].numpy()[0], output.shape[0])
        return metric_dict

@@ -368,5 +382,66 @@ class AccuracyScore(MultiLabelMetric):
        accuracy = (sum(tps) + sum(tns)) / (
            sum(tps) + sum(tns) + sum(fns) + sum(fps))
        metric_dict["AccuracyScore"] = paddle.to_tensor(accuracy)
        self.avg_meters["AccuracyScore"].update(
            metric_dict["AccuracyScore"].numpy()[0], output.shape[0])
        return metric_dict
def get_attr_metrics(gt_label, preds_probs, threshold):
    """
    Accumulate label-level and instance-level TP/TN/FP/FN counts
    for attribute recognition; positions labeled -1 are ignored.
    """
    pred_label = (preds_probs > threshold).astype(int)

    eps = 1e-20
    result = EasyDict()

    # propagate the ignore flag (-1) from the ground truth to the predictions
    has_fuyi = gt_label == -1
    pred_label[has_fuyi] = -1
###############################
# label metrics
# TP + FN
result.gt_pos = np.sum((gt_label == 1), axis=0).astype(float)
# TN + FP
result.gt_neg = np.sum((gt_label == 0), axis=0).astype(float)
# TP
result.true_pos = np.sum((gt_label == 1) * (pred_label == 1),
axis=0).astype(float)
# TN
result.true_neg = np.sum((gt_label == 0) * (pred_label == 0),
axis=0).astype(float)
# FP
result.false_pos = np.sum(((gt_label == 0) * (pred_label == 1)),
axis=0).astype(float)
# FN
result.false_neg = np.sum(((gt_label == 1) * (pred_label == 0)),
axis=0).astype(float)
################
# instance metrics
result.gt_pos_ins = np.sum((gt_label == 1), axis=1).astype(float)
result.true_pos_ins = np.sum((pred_label == 1), axis=1).astype(float)
# true positive
result.intersect_pos = np.sum((gt_label == 1) * (pred_label == 1),
axis=1).astype(float)
    # union of predicted and ground-truth positives (denominator of IoU)
    result.union_pos = np.sum(((gt_label == 1) + (pred_label == 1)),
                              axis=1).astype(float)
return result
class ATTRMetric(nn.Layer):
def __init__(self, threshold=0.5):
super().__init__()
self.threshold = threshold
    def reset(self):
        self.attrmeter = AttrMeter(threshold=self.threshold)
def forward(self, output, target):
metric_dict = get_attr_metrics(target[:, 0, :].numpy(),
output.numpy(), self.threshold)
self.attrmeter.update(metric_dict)
        return metric_dict
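
A hypothetical end-to-end check of ATTRMetric (an editor's sketch, not part of the diff; shapes are assumptions). Note that reset() must be called once before the first forward(), because the underlying AttrMeter is created there:

import paddle

metric = ATTRMetric(threshold=0.5)
metric.reset()

batch, num_attrs = 4, 26
output = paddle.rand([batch, num_attrs])    # predicted probabilities
labels = paddle.randint(0, 2, [batch, num_attrs]).astype('float32')
# pack labels as target[:, 0, :]; the second channel is unused by the metric
target = paddle.stack([labels, paddle.zeros_like(labels)], axis=1)

metric(output, target)
print(metric.attrmeter.res())  # [ma, label_f1, ..., instance_recall]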
@@ -439,8 +439,7 @@ def run(dataloader,
        logger.info("END {:s} {:s} {:s}".format(mode, end_str, ips_info))
    else:
        end_epoch_str = "END epoch:{:<3d}".format(epoch)
        logger.info("{:s} {:s} {:s}".format(end_epoch_str, mode, end_str))

    if use_dali:
        dataloader.reset()
...
@@ -69,3 +69,87 @@ class AverageMeter(object):
    def value(self):
        return '{self.name}: {self.val:{self.fmt}}{self.postfix}'.format(
            self=self)
class AttrMeter(object):
    """
    Accumulates label-level and instance-level confusion statistics for
    attribute recognition and computes the summary metrics in res().
    Code was based on https://github.com/pytorch/examples/blob/master/imagenet/main.py
    """
def __init__(self, threshold=0.5):
self.threshold = threshold
self.reset()
def reset(self):
self.gt_pos = 0
self.gt_neg = 0
self.true_pos = 0
self.true_neg = 0
self.false_pos = 0
self.false_neg = 0
self.gt_pos_ins = []
self.true_pos_ins = []
self.intersect_pos = []
self.union_pos = []
def update(self, metric_dict):
self.gt_pos += metric_dict['gt_pos']
self.gt_neg += metric_dict['gt_neg']
self.true_pos += metric_dict['true_pos']
self.true_neg += metric_dict['true_neg']
self.false_pos += metric_dict['false_pos']
self.false_neg += metric_dict['false_neg']
self.gt_pos_ins += metric_dict['gt_pos_ins'].tolist()
self.true_pos_ins += metric_dict['true_pos_ins'].tolist()
self.intersect_pos += metric_dict['intersect_pos'].tolist()
self.union_pos += metric_dict['union_pos'].tolist()
def res(self):
import numpy as np
eps = 1e-20
        label_pos_recall = 1.0 * self.true_pos / (
            self.gt_pos + eps)  # recall of positive labels (TPR)
        label_neg_recall = 1.0 * self.true_neg / (
            self.gt_neg + eps)  # recall of negative labels (TNR)
# mean accuracy
label_ma = (label_pos_recall + label_neg_recall) / 2
label_pos_recall = np.mean(label_pos_recall)
label_neg_recall = np.mean(label_neg_recall)
label_prec = (self.true_pos / (self.true_pos + self.false_pos + eps))
label_acc = (self.true_pos /
(self.true_pos + self.false_pos + self.false_neg + eps))
label_f1 = np.mean(2 * label_prec * label_pos_recall /
(label_prec + label_pos_recall + eps))
ma = (np.mean(label_ma))
self.gt_pos_ins = np.array(self.gt_pos_ins)
self.true_pos_ins = np.array(self.true_pos_ins)
self.intersect_pos = np.array(self.intersect_pos)
self.union_pos = np.array(self.union_pos)
        instance_acc = np.mean(self.intersect_pos / (self.union_pos + eps))
        instance_prec = np.mean(self.intersect_pos / (self.true_pos_ins + eps))
        instance_recall = np.mean(self.intersect_pos / (self.gt_pos_ins + eps))
        instance_f1 = 2 * instance_prec * instance_recall / (
            instance_prec + instance_recall + eps)
res = [
ma, label_f1, label_pos_recall, label_neg_recall, instance_f1,
instance_acc, instance_prec, instance_recall
]
return res
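
Since res() returns a plain list, call sites depend on the positional order; a small helper pairing the values with readable names (the field names here are the editor's choice, not part of the diff) keeps that explicit:

ATTR_RES_FIELDS = [
    "ma", "label_f1", "label_pos_recall", "label_neg_recall",
    "instance_f1", "instance_acc", "instance_prec", "instance_recall"
]

def attr_res_as_dict(meter):
    """Pair AttrMeter.res() values with readable names."""
    return dict(zip(ATTR_RES_FIELDS, meter.res()))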
@@ -113,7 +113,8 @@ def init_model(config, net, optimizer=None, loss: paddle.nn.Layer=None):
        net.set_state_dict(para_dict)
        loss.set_state_dict(para_dict)
        for i in range(len(optimizer)):
            # support both a per-optimizer list of state dicts and a single shared one
            optimizer[i].set_state_dict(opti_dict[i] if isinstance(
                opti_dict, list) else opti_dict)
        logger.info("Finish load checkpoints from {}".format(checkpoints))
        return metric_dict
...
@@ -9,3 +9,4 @@ scipy
scikit-learn==0.23.2
gast==0.3.3
faiss-cpu==1.7.1.post2
easydict
===========================train_params===========================
model_name:PPLCNetV2_base
python:python3.7
gpu_list:0|0,1
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.first_bs:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:norm_train
norm_train:tools/train.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:tools/export_model.py -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml
quant_export:null
fpgm_export:null
distill_export:null
kl_quant:null
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_pretrained.pdparams
infer_model:../inference/
infer_export:True
infer_quant:False
inference:python/predict_cls.py -c configs/inference_cls.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:True|False
-o Global.cpu_num_threads:1|6
-o Global.batch_size:1|16
-o Global.use_tensorrt:True|False
-o Global.use_fp16:True|False
-o Global.inference_model_dir:../inference
-o Global.infer_imgs:../dataset/ILSVRC2012/val
-o Global.save_log_path:null
-o Global.benchmark:True
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,224,224]}]