diff --git a/README_cn.md b/README_cn.md index e3291c263873339d80959a845c25e444b482633f..34bdc13df7ebbc05ad6809d8d630c21500cf5d00 100644 --- a/README_cn.md +++ b/README_cn.md @@ -8,32 +8,22 @@ **近期更新** -- 2021.05.14 添加`SwinTransformer` 系列模型,在ImageNet-1k上,Top1 Acc可达87.19% -- 2021.04.15 添加`MixNet`和`ReXNet`系列模型,在ImageNet-1k上`MixNet_L`模型Top1 Acc可达78.6%,`ReXNet_3_0`模型可达82.09% -- 2021.03.02 添加分类模型量化方法与使用教程。 -- 2021.02.01 添加`RepVGG`系列模型,在ImageNet-1k上Top-1 Acc可达79.65%。 -- 2021.01.27 添加`ViT`与`DeiT`模型,在ImageNet-1k上,`ViT`模型Top-1 Acc可达85.13%,`DeiT`模型可达85.1%。 -- 2021.01.08 添加whl包及其使用说明,直接安装paddleclas whl包,即可快速完成模型预测。 -- 2020.12.16 添加对cpp预测的tensorRT支持,预测加速更明显。 +- 2021.06.16 PaddleClas v2.2版本升级,集成Metric learning,向量检索等组件,新增4个图像识别应用。 - [more](./docs/zh_CN/update_history.md) ## 特性 -- 丰富的模型库:基于ImageNet1k分类数据集,PaddleClas提供了29个系列的分类网络结构和训练配置,134个预训练模型和性能评估。 +- 完整的图像识别解决方案:集成了检测、特征学习、检索等模块,广泛适用于各类图像识别任务。 +提供商品识别、车辆识别、logo识别和动漫人物识别等4个示例解决方案。 -- SSLD知识蒸馏:基于该方案蒸馏模型的识别准确率普遍提升3%以上。 - -- 数据增广:支持AutoAugment、Cutout、Cutmix等8种数据增广算法详细介绍、代码复现和在统一实验环境下的效果评估。 +- 丰富的预训练模型库:提供了29个系列共134个ImageNet预训练模型,其中6个精选系列模型支持结构快速修改。 -- 10万类图像分类预训练模型:百度自研并开源了基于10万类数据集训练的 `ResNet50_vd `模型,在一些实际场景中,使用该预训练模型的识别准确率最多可以提升30%。 +- 全面易用的特征学习组件:集成大量度量学习方法,通过配置文件即可随意组合切换。 -- 多种训练方案,包括多机训练、混合精度训练等。 - -- 多种预测推理、部署方案,包括TensorRT预测、Paddle-Lite预测、模型服务化部署、模型量化、Paddle Hub等。 - -- 可运行于Linux、Windows、MacOS等多种系统。 +- SSLD知识蒸馏:基于该方案蒸馏模型的识别准确率普遍提升3%以上。 +- 数据增广:支持AutoAugment、Cutout、Cutmix等8种数据增广算法详细介绍、代码复现和在统一实验环境下的效果评估。 ## 欢迎加入技术交流群 @@ -53,31 +43,23 @@ ## 文档教程 - [快速安装](./docs/zh_CN/tutorials/install.md) -- [30分钟玩转PaddleClas(尝鲜版)](./docs/zh_CN/tutorials/quick_start_new_user.md) -- [30分钟玩转PaddleClas(进阶版)](./docs/zh_CN/tutorials/quick_start_professional.md) -- [模型库介绍和预训练模型](./docs/zh_CN/models/models_intro.md) - - [模型库概览图](#模型库概览图) - - [SSLD知识蒸馏系列](#SSLD知识蒸馏系列) - - [ResNet及其Vd系列](#ResNet及其Vd系列) - - [移动端系列](#移动端系列) - - [SEResNeXt与Res2Net系列](#SEResNeXt与Res2Net系列) - - [DPN与DenseNet系列](#DPN与DenseNet系列) - - [HRNet](HRNet系列) - - [Inception系列](#Inception系列) - - [EfficientNet与ResNeXt101_wsl系列](#EfficientNet与ResNeXt101_wsl系列) - - [ResNeSt与RegNet系列](#ResNeSt与RegNet系列) - - [ViT与DeiT系列](#ViT_and_DeiT系列) - - [RepVGG系列](#RepVGG系列) - - [MixNet系列](#MixNet系列) - - [ReXNet系列](#ReXNet系列) - - [SwinTransformer系列](#SwinTransformer系列) - - [其他模型](#其他模型) - - HS-ResNet: arxiv文章链接: [https://arxiv.org/pdf/2010.07621.pdf](https://arxiv.org/pdf/2010.07621.pdf)。 代码和预训练模型即将开源,敬请期待。 +- 图像识别快速体验(若愚) +- 图像分类快速体验(崔程,基于30分钟入门版修改) +- 算法介绍 + - 图像识别系统] (胜禹) + - [模型库介绍和预训练模型](./docs/zh_CN/models/models_intro.md) + - [图像分类] + - ImageNet分类任务(崔程,基于30分钟进阶版修改) + - [多标签分类任务]() + - [特征学习] + - [商品识别]() + - [车辆识别]() + - [logo识别]() + - [动漫人物识别]() + - [向量检索]() - 模型训练/评估 - - [数据准备](./docs/zh_CN/tutorials/data.md) - - [模型训练与微调](./docs/zh_CN/tutorials/getting_started.md) - - [模型评估](./docs/zh_CN/tutorials/getting_started.md) - - [配置文件详解](./docs/zh_CN/tutorials/config.md) + - 图像分类任务(崔程,基于原有训练文档整理) + - 特征学习任务(陆彬) - 模型预测 - [基于训练引擎预测推理](./docs/zh_CN/tutorials/getting_started.md) - [基于Python预测引擎预测推理](./docs/zh_CN/tutorials/getting_started.md) @@ -88,391 +70,15 @@ - [模型量化压缩](deploy/slim/quant/README.md) - 高阶使用 - [知识蒸馏](./docs/zh_CN/advanced_tutorials/distillation/distillation.md) + - [模型量化](./docs/zh_CN/extension/paddle_quantization.md) - [数据增广](./docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md) - - [多标签分类](./docs/zh_CN/advanced_tutorials/multilabel/multilabel.md) - [代码解析与社区贡献指南](./docs/zh_CN/tutorials/quick_start_community.md) -- 特色拓展应用 - - [迁移学习](./docs/zh_CN/application/transfer_learning.md) - - 
[10万类图像分类预训练模型](./docs/zh_CN/application/transfer_learning.md) - - [通用目标检测](./docs/zh_CN/application/object_detection.md) -- FAQ - - [图像分类2021第一季精选问题(近期更新2021.02.03)](./docs/zh_CN/faq_series/faq_2021_s1.md) - - [图像分类通用30个问题](./docs/zh_CN/faq.md) - - [PaddleClas实战15个问题](./docs/zh_CN/faq.md) -- [赛事支持](./docs/zh_CN/competition_support.md) +- FAQ(暂停更新) + - [图像分类任务FAQ] - [许可证书](#许可证书) - [贡献代码](#贡献代码) -## 模型库 - - -### 模型库概览图 - -基于ImageNet1k分类数据集,PaddleClas支持24种系列分类网络结构以及对应的122个图像分类预训练模型,训练技巧、每个系列网络结构的简单介绍和性能评估将在相应章节展现,下面所有的速度指标评估环境如下: -* CPU的评估环境基于骁龙855(SD855)。 -* GPU评估环境基于T4机器,在FP32+TensorRT配置下运行500次测得(去除前10次的warmup时间)。 - -常见服务器端模型的精度指标与其预测耗时的变化曲线如下图所示。 - -![](./docs/images/models/T4_benchmark/t4.fp32.bs1.main_fps_top1.png) - - -常见移动端模型的精度指标与其预测耗时、模型存储大小的变化曲线如下图所示。 - -![](./docs/images/models/mobile_arm_storage.png) - -![](./docs/images/models/mobile_arm_top1.png) - - - -### SSLD知识蒸馏预训练模型 -基于SSLD知识蒸馏的预训练模型列表如下所示,更多关于SSLD知识蒸馏方案的介绍可以参考:[SSLD知识蒸馏文档](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)。 - -* 服务器端知识蒸馏模型 - -| 模型 | Top-1 Acc | Reference
Top-1 Acc | Acc gain | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------| -| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) | -| ResNet50_vd_
ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) | -| ResNet50_vd_
ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) | -| ResNet101_vd_
ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) | -| Res2Net50_vd_
26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) | -| Res2Net101_vd_
26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) | -| Res2Net200_vd_
26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | -| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) | -| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) | -| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) | - - -* 端侧知识蒸馏模型 - -| 模型 | Top-1 Acc | Reference
Top-1 Acc | Acc gain | SD855 time(ms)
bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 | -|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------| -| MobileNetV1_
ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) | -| MobileNetV2_
ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | -| MobileNetV3_
small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | -| MobileNetV3_
large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | -| MobileNetV3_small_
x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | -| GhostNet_
x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | - - -* 注: `Reference Top-1 Acc`表示PaddleClas基于ImageNet1k数据集训练得到的预训练模型精度。 - - -### ResNet及其Vd系列 - -ResNet及其Vd系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNet及其Vd系列模型文档](./docs/zh_CN/models/ResNet_and_vd.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------| -| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) | -| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) | -| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) | -| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) | -| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) | -| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) | -| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) | -| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) | -| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) | -| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) | -| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) | -| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) | -| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) | -| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) | -| ResNet50_vd_
ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) | -| ResNet50_vd_
ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) | -| ResNet101_vd_
ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) | - - - -### 移动端系列 - -移动端系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[移动端系列模型文档](./docs/zh_CN/models/Mobile.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | SD855 time(ms)
bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 | -|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------| -| MobileNetV1_
x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) | -| MobileNetV1_
x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) | -| MobileNetV1_
x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) | -| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) | -| MobileNetV1_
ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) | -| MobileNetV2_
x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) | -| MobileNetV2_
x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) | -| MobileNetV2_
x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) | -| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) | -| MobileNetV2_
x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) | -| MobileNetV2_
x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) | -| MobileNetV2_
ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | -| MobileNetV3_
large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) | -| MobileNetV3_
large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) | -| MobileNetV3_
large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) | -| MobileNetV3_
large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) | -| MobileNetV3_
large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) | -| MobileNetV3_
small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) | -| MobileNetV3_
small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) | -| MobileNetV3_
small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) | -| MobileNetV3_
small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) | -| MobileNetV3_
small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) | -| MobileNetV3_
small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | -| MobileNetV3_
large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | -| MobileNetV3_small_
x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | -| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) | -| ShuffleNetV2_
x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) | -| ShuffleNetV2_
x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) | -| ShuffleNetV2_
x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) | -| ShuffleNetV2_
x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) | -| ShuffleNetV2_
x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) | -| ShuffleNetV2_
swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) | -| GhostNet_
x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) | -| GhostNet_
x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) | -| GhostNet_
x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) | -| GhostNet_
x1_3_ssld | 0.7938 | 0.9449 | 19.9825 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | - - - -### SEResNeXt与Res2Net系列 - -SEResNeXt与Res2Net系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[SEResNeXt与Res2Net系列模型文档](./docs/zh_CN/models/SEResNext_and_Res2Net.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------| -| Res2Net50_
26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) | -| Res2Net50_vd_
26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) | -| Res2Net50_
14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) | -| Res2Net101_vd_
26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) | -| Res2Net200_vd_
26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) | -| Res2Net200_vd_
26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | -| ResNeXt50_
32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) | -| ResNeXt50_vd_
32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) | -| ResNeXt50_
64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) | -| ResNeXt50_vd_
64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) | -| ResNeXt101_
32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) | -| ResNeXt101_vd_
32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) | -| ResNeXt101_
64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) | -| ResNeXt101_vd_
64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) | -| ResNeXt152_
32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) | -| ResNeXt152_vd_
32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) | -| ResNeXt152_
64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) | -| ResNeXt152_vd_
64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) | -| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) | -| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) | -| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) | -| SE_ResNeXt50_
32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) | -| SE_ResNeXt50_vd_
32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) | -| SE_ResNeXt101_
32x4d | 0.7939 | 0.9443 | 18.82604 | 25.31814 | 15.02 | 46.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) | -| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) | - - - -### DPN与DenseNet系列 - -DPN与DenseNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[DPN与DenseNet系列模型文档](./docs/zh_CN/models/DPN_DenseNet.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------| -| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) | -| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) | -| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) | -| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) | -| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) | -| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) | -| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) | -| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) | -| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) | -| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) | - - - - -### HRNet系列 - -HRNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[HRNet系列模型文档](./docs/zh_CN/models/HRNet.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------| -| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) | -| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) | -| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) | -| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) | -| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) | -| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) | -| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) | -| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) | -| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) | -| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) | - - - -### Inception系列 - -Inception系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[Inception系列模型文档](./docs/zh_CN/models/Inception.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------| -| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) | -| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) | -| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) | -| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) | -| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) | -| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) | -| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) | -| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) | - - - -### EfficientNet与ResNeXt101_wsl系列 - -EfficientNet与ResNeXt101_wsl系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[EfficientNet与ResNeXt101_wsl系列模型文档](./docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------| -| ResNeXt101_
32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) | -| ResNeXt101_
32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) | -| ResNeXt101_
32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) | -| ResNeXt101_
32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) | -| Fix_ResNeXt101_
32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) | -| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) | -| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) | -| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) | -| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) | -| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) | -| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) | -| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) | -| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) | -| EfficientNetB0_
small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) | - - - -### ResNeSt与RegNet系列 - -ResNeSt与RegNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNeSt与RegNet系列模型文档](./docs/zh_CN/models/ResNeSt_RegNet.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| -| ResNeSt50_
fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) | -| ResNeSt50 | 0.8083 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) | -| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) | - - - -### ViT_and_DeiT系列 - -ViT(Vision Transformer)与DeiT(Data-efficient Image Transformers)系列模型的精度、速度指标如下表所示. 更多关于该系列模型的介绍可以参考: [ViT_and_DeiT系列模型文档](./docs/zh_CN/models/ViT_and_DeiT.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------| -| ViT_small_
patch16_224 | 0.7769 | 0.9342 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | -| ViT_base_
patch16_224 | 0.8195 | 0.9617 | - | - | | 86 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | -| ViT_base_
patch16_384 | 0.8414 | 0.9717 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | -| ViT_base_
patch32_384 | 0.8176 | 0.9613 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | -| ViT_large_
patch16_224 | 0.8323 | 0.9650 | - | - | | 307 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | -| ViT_large_
patch16_384 | 0.8513 | 0.9736 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | -| ViT_large_
patch32_384 | 0.8153 | 0.9608 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | -| | | | | | | | | - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------| -| DeiT_tiny_
patch16_224 | 0.718 | 0.910 | - | - | | 5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | -| DeiT_small_
patch16_224 | 0.796 | 0.949 | - | - | | 22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | -| DeiT_base_
patch16_224 | 0.817 | 0.957 | - | - | | 86 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | -| DeiT_base_
patch16_384 | 0.830 | 0.962 | - | - | | 87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | -| DeiT_tiny_
distilled_patch16_224 | 0.741 | 0.918 | - | - | | 6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | -| DeiT_small_
distilled_patch16_224 | 0.809 | 0.953 | - | - | | 22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | -| DeiT_base_
distilled_patch16_224 | 0.831 | 0.964 | - | - | | 87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | -| DeiT_base_
distilled_patch16_384 | 0.851 | 0.973 | - | - | | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | -| | | | | | | | | - - - -### RepVGG系列 - -关于RepVGG系列模型的精度、速度指标如下表所示,更多介绍可以参考:[RepVGG系列模型文档](./docs/zh_CN/models/RepVGG.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| -| RepVGG_A0 | 0.7131 | 0.9016 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) | -| RepVGG_A1 | 0.7380 | 0.9146 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) | -| RepVGG_A2 | 0.7571 | 0.9264 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) | -| RepVGG_B0 | 0.7450 | 0.9213 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) | -| RepVGG_B1 | 0.7773 | 0.9385 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) | -| RepVGG_B2 | 0.7813 | 0.9410 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) | -| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) | -| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) | -| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) | -| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) | - - - -### MixNet系列 - -关于MixNet系列模型的精度、速度指标如下表所示,更多介绍可以参考:[MixNet系列模型文档](./docs/zh_CN/models/MixNet.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(M) | Params(M) | 下载地址 | -| -------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | -| MixNet_S | 0.7628 | 0.9299 | | | 252.977 | 4.167 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) | -| MixNet_M | 0.7767 | 0.9364 | | | 357.119 | 5.065 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) | -| MixNet_L | 0.7860 | 0.9437 | | | 579.017 | 7.384 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) | - - - -### ReXNet系列 - -关于ReXNet系列模型的精度、速度指标如下表所示,更多介绍可以参考:[ReXNet系列模型文档](./docs/zh_CN/models/ReXNet.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | -| ReXNet_1_0 | 0.7746 | 0.9370 | | | 0.415 | 4.838 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) | -| ReXNet_1_3 | 0.7913 | 0.9464 | | | 0.683 | 7.611 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) | -| ReXNet_1_5 | 0.8006 | 0.9512 | | | 0.900 | 9.791 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) | -| ReXNet_2_0 | 0.8122 | 0.9536 | | | 1.561 | 16.449 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) | -| ReXNet_3_0 | 0.8209 | 0.9612 | | | 3.445 | 34.833 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) | - - - -### SwinTransformer系列 - -关于SwinTransformer系列模型的精度、速度指标如下表所示,更多介绍可以参考:[SwinTransformer系列模型文档](./docs/zh_CN/models/SwinTransformer.md)。 - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | -| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | -| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | -| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | -| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | -| SwinTransformer_base_patch4_window7_224[1] | 0.8487 | 0.9746 | | | 15.4 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | -| SwinTransformer_base_patch4_window12_384[1] | 0.8642 | 0.9807 | | | 47.1 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | -| SwinTransformer_large_patch4_window7_224[1] | 0.8596 | 0.9783 | | | 34.5 | 197 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | -| SwinTransformer_large_patch4_window12_384[1] | 0.8719 | 0.9823 | | | 103.9 | 197 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | - -[1]:基于ImageNet22k数据集预训练,然后在ImageNet1k数据集迁移学习得到。 - - - -### 其他模型 - -关于AlexNet、SqueezeNet系列、VGG系列、DarkNet53等模型的精度、速度指标如下表所示,更多介绍可以参考:[其他模型文档](./docs/zh_CN/models/Others.md)。 - - -| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | -|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| -| AlexNet | 0.567 | 0.792 | 1.44993 | 2.46696 | 1.370 | 61.090 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) | -| SqueezeNet1_0 | 0.596 | 0.817 | 0.96736 | 2.53221 | 1.550 | 1.240 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) | -| SqueezeNet1_1 | 0.601 | 0.819 | 0.76032 | 1.877 | 0.690 | 1.230 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) | -| VGG11 | 0.693 | 0.891 | 3.90412 | 9.51147 | 15.090 | 132.850 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams) | -| VGG13 | 0.700 | 0.894 | 4.64684 | 12.61558 | 22.480 | 133.030 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams) | -| VGG16 | 0.720 | 0.907 | 5.61769 | 16.40064 | 30.810 | 138.340 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams) | -| VGG19 | 0.726 | 0.909 | 6.65221 | 20.4334 | 39.130 | 143.650 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams) | -| DarkNet53 | 0.780 | 0.941 | 4.10829 | 12.1714 | 18.580 | 41.600 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) | - ## 许可证书 diff --git a/deploy/configs/build_cartoon.yaml b/deploy/configs/build_cartoon.yaml new file mode 100644 index 0000000000000000000000000000000000000000..99aa816442cc69fde0642db9e5971c1e887a5385 --- /dev/null +++ b/deploy/configs/build_cartoon.yaml @@ -0,0 +1,37 @@ +Global: + rec_inference_model_dir: "./models/cartoon_rec_ResNet50_iCartoon_v1.0_infer/" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 100 + enable_benchmark: True + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +RecPreProcess: + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + +RecPostProcess: null + +# indexing engine config +IndexProcess: + index_path: "./dataset/cartoon_demo_data_v1.0/index/" + image_root: "./dataset/cartoon_demo_data_v1.0/" + data_file: "./dataset/cartoon_demo_data_v1.0/data_file.txt" + delimiter: "\t" + dist_type: "IP" + pq_size: 100 + embedding_size: 2048 diff --git a/deploy/configs/build_inshop.yaml b/deploy/configs/build_inshop.yaml index ab5392d3b3aa2fc74777573d267ed38f4eb1f44d..c2638e979ec5cac2e2b40fddcdcf6782f8922a36 100644 --- a/deploy/configs/build_inshop.yaml +++ b/deploy/configs/build_inshop.yaml @@ -1,5 +1,5 @@ Global: - rec_inference_model_dir: "./inshop/rec/" + rec_inference_model_dir: "./models/product_ResNet50_vd_Inshop_v1.0_infer" batch_size: 1 use_gpu: True enable_mkldnn: True @@ -26,9 +26,9 @@ RecPostProcess: null # indexing engine config IndexProcess: - index_path: "./inshop/inshop_index/" - image_root: "./inshop/dataset/" - data_file: "./inshop/inshop_gallery_demo.txt" + index_path: "./dataset/product_demo_data_v1.0/index" + image_root: "./dataset/product_demo_data_v1.0" + data_file: "./dataset/product_demo_data_v1.0/data_file.txt" delimiter: " " dist_type: "IP" pq_size: 100 diff --git 
a/deploy/configs/build_logo.yaml b/deploy/configs/build_logo.yaml
index ccd3f347f48b1a210accfc95a3cc067a144d4bab..2ff383b37c48a51566e80c06cc8653dfd38a982c 100644
--- a/deploy/configs/build_logo.yaml
+++ b/deploy/configs/build_logo.yaml
@@ -1,5 +1,5 @@
 Global:
-  rec_inference_model_dir: "./logo/model/"
+  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"
   batch_size: 1
   use_gpu: True
   enable_mkldnn: True
@@ -26,9 +26,9 @@ RecPostProcess: null
 
 # indexing engine config
 IndexProcess:
-  index_path: "./logo/logo_index/"
-  image_root: "./logo/dataset/"
-  data_file: "./logo/logo_gallery_demo.txt"
+  index_path: "./dataset/logo_demo_data_v1.0/index/"
+  image_root: "./dataset/logo_demo_data_v1.0/"
+  data_file: "./dataset/logo_demo_data_v1.0/data_file.txt"
   delimiter: "\t"
   dist_type: "IP"
   pq_size: 100
diff --git a/deploy/configs/build_vehicle.yaml b/deploy/configs/build_vehicle.yaml
index ba45ae9743d2719d2fa6e8c800d3379dee308ae7..c7335dd20c8a0cd4a08b1b01d5eaecbfdaafe54f 100644
--- a/deploy/configs/build_vehicle.yaml
+++ b/deploy/configs/build_vehicle.yaml
@@ -1,5 +1,5 @@
 Global:
-  rec_inference_model_dir: "./vehicle/model/"
+  rec_inference_model_dir: "./models/vehicle_cls_ResNet50_CompCars_v1.0_infer/"
   batch_size: 1
   use_gpu: True
   enable_mkldnn: True
@@ -26,9 +26,9 @@ RecPostProcess: null
 
 # indexing engine config
 IndexProcess:
-  index_path: "./vehilce/vehicle_index/"
-  image_root: "./vehicle/dataset/"
-  data_file: "./vehilce/demo_gallery.txt"
+  index_path: "./dataset/vehicle_demo_data_v1.0/index/"
+  image_root: "./dataset/vehicle_demo_data_v1.0/"
+  data_file: "./dataset/vehicle_demo_data_v1.0/data_file.txt"
   delimiter: " "
   dist_type: "IP"
   pq_size: 100
diff --git a/deploy/configs/inference_icartoon.py b/deploy/configs/inference_cartoon.yaml
similarity index 63%
rename from deploy/configs/inference_icartoon.py
rename to deploy/configs/inference_cartoon.yaml
index 8bd5aeeec2123088cce3aea249e8072e075cfc2a..e47aca31d4bf7d46e225997880e00a77e7c2b640 100644
--- a/deploy/configs/inference_icartoon.py
+++ b/deploy/configs/inference_cartoon.yaml
@@ -1,7 +1,7 @@
 Global:
-  infer_imgs: "./dataset/iCartoonFace/val2/0000000.jpg"
-  det_inference_model_dir: "./output/det"
-  rec_inference_model_dir: "./output/"
+  infer_imgs: "./dataset/cartoon_demo_data_v1.0/query/"
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/"
+  rec_inference_model_dir: "./models/cartoon_rec_ResNet50_iCartoon_v1.0_infer/"
   batch_size: 1
   image_shape: [3, 640, 640]
   threshold: 0.5
@@ -9,7 +9,6 @@ Global:
   labe_list:
   - foreground
 
-  # inference engine config
   use_gpu: True
   enable_mkldnn: True
   cpu_num_threads: 100
@@ -34,7 +33,6 @@ DetPreProcess:
 
 DetPostProcess: {}
 
-
 RecPreProcess:
   transform_ops:
     - ResizeImage:
@@ -50,18 +48,8 @@ RecPreProcess:
 
 RecPostProcess: null
 
-# indexing engine config
 IndexProcess:
-  build:
-    enable: False
-    index_path: "./icartoon_index/"
-    image_root: "./dataset/iCartoonFace"
-    data_file: "./dataset/iCartoonFace/gallery_pesudo.txt"
-    spacer: "\t"
-    dist_type: "IP"
-    pq_size: 100
-    embedding_size: 2048
-  infer:
-    index_path: "./icartoon_index/"
-    search_budget: 100
-    return_k: 10
+  index_path: "./dataset/cartoon_demo_data_v1.0/index/"
+  search_budget: 100
+  return_k: 5
+  dist_type: "IP"
diff --git a/deploy/configs/inference_cls.yaml b/deploy/configs/inference_cls.yaml
index 4b15f3f4e48086a1798e17fa293fd24e2f228303..cd8ac8bdecf16eb66410add20ff763b117954bef 100644
--- a/deploy/configs/inference_cls.yaml
+++ b/deploy/configs/inference_cls.yaml
@@ -1,7 +1,7 @@
 Global:
   infer_imgs: "../docs/images/whl/demo.jpg"
-  inference_model_dir: "./MobileNetV1_infer/"
+  inference_model_dir: "../inference/"
   batch_size: 1
   use_gpu: True
   enable_mkldnn: True
@@ -27,4 +27,4 @@ PreProcess:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
\ No newline at end of file
+  class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
\ No newline at end of file
diff --git a/deploy/configs/inference_inshop.yaml b/deploy/configs/inference_inshop.yaml
index 2714afc89be3ed9ac2b065f19c53c024585c9b2c..1e7db144d1223f0c5b969d0d27f9597665f505ad 100644
--- a/deploy/configs/inference_inshop.yaml
+++ b/deploy/configs/inference_inshop.yaml
@@ -1,11 +1,11 @@
 Global:
-  infer_imgs: "./inshop/demo/01_3_back.jpg"
-  det_inference_model_dir: "./inshop/det/"
-  rec_inference_model_dir: "./inshop/rec/"
+  infer_imgs: "./dataset/product_demo_data_v1.0/query"
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer"
+  rec_inference_model_dir: "./models/product_ResNet50_vd_Inshop_v1.0_infer"
   batch_size: 1
   image_shape: [3, 640, 640]
   threshold: 0.0
-  max_det_results: 3
+  max_det_results: 1
   labe_list:
   - foreground
 
@@ -48,7 +48,7 @@ RecPostProcess: null
 
 # indexing engine config
 IndexProcess:
-  index_path: "./inshop/inshop_index"
+  index_path: "./dataset/product_demo_data_v1.0/index"
   search_budget: 100
-  return_k: 10
+  return_k: 5
   dist_type: "IP"
diff --git a/deploy/configs/inference_logo.yaml b/deploy/configs/inference_logo.yaml
index 717cb7bacdbdf0910382a86b202650369eeacfce..719f036d391f929a9f2e48a4af84cc55781e0857 100644
--- a/deploy/configs/inference_logo.yaml
+++ b/deploy/configs/inference_logo.yaml
@@ -1,7 +1,7 @@
 Global:
-  infer_imgs: "./logo/demo/logo_APK.jpg"
-  det_inference_model_dir: "./logo/det/"
-  rec_inference_model_dir: "./logo/rec/"
+  infer_imgs: "./dataset/logo_demo_data_v1.0/query/logo_AKG.jpg"
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/"
+  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"
   batch_size: 1
   image_shape: [3, 640, 640]
   threshold: 0.5
@@ -48,7 +48,7 @@ RecPostProcess: null
 
 # indexing engine config
 IndexProcess:
-  index_path: "./logo_index/"
+  index_path: "./dataset/logo_demo_data_v1.0/index/"
   search_budget: 100
-  return_k: 10
+  return_k: 5
   dist_type: "IP"
diff --git a/deploy/configs/inference_vehicle.yaml b/deploy/configs/inference_vehicle.yaml
index 7bc3b889d5576fdaa862f03a739e1be5be45d30a..357b722d57508c04dcefa1e287846b13a01bf958 100644
--- a/deploy/configs/inference_vehicle.yaml
+++ b/deploy/configs/inference_vehicle.yaml
@@ -1,7 +1,7 @@
 Global:
-  infer_imgs: "./vehicle/demo/2e3521935c280c.jpg"
-  det_inference_model_dir: "./det/"
-  rec_inference_model_dir: "./vehicle/rec/"
+  infer_imgs: "./dataset/vehicle_demo_data_v1.0/query/"
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/"
+  rec_inference_model_dir: "./models/vehicle_cls_ResNet50_CompCars_v1.0_infer/"
   batch_size: 1
   image_shape: [3, 640, 640]
   threshold: 0.5
@@ -50,7 +50,7 @@ RecPostProcess: null
 
 # indexing engine config
 IndexProcess:
-  index_path: "./vehicle_index/"
+  index_path: "./dataset/vehicle_demo_data_v1.0/index/"
   search_budget: 100
-  return_k: 10
+  return_k: 5
   dist_type: "IP"
diff --git a/tools/test_hubserving.py b/deploy/hubserving/test_hubserving.py
similarity index 100%
rename from tools/test_hubserving.py
rename to deploy/hubserving/test_hubserving.py
diff --git a/tools/serving/image_http_client.py b/deploy/paddleserving/image_http_client.py
similarity index 100%
rename from tools/serving/image_http_client.py
rename to deploy/paddleserving/image_http_client.py
diff --git a/tools/serving/image_service_cpu.py b/deploy/paddleserving/image_service_cpu.py
similarity index 100%
rename from tools/serving/image_service_cpu.py
rename to deploy/paddleserving/image_service_cpu.py
diff --git a/tools/serving/image_service_gpu.py b/deploy/paddleserving/image_service_gpu.py
similarity index 100%
rename from tools/serving/image_service_gpu.py
rename to deploy/paddleserving/image_service_gpu.py
diff --git a/tools/serving/utils.py b/deploy/paddleserving/utils.py
similarity index 100%
rename from tools/serving/utils.py
rename to deploy/paddleserving/utils.py
diff --git a/docs/en/models/DLA.md b/docs/en/models/DLA.md
new file mode 100644
index 0000000000000000000000000000000000000000..176d6d1af77631ffb455ab0ad8bd3d4fbe47555c
--- /dev/null
+++ b/docs/en/models/DLA.md
@@ -0,0 +1,21 @@
+# DLA series
+
+## Overview
+
+DLA (Deep Layer Aggregation) is built on the observation that visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have been "shallow" themselves, and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)
+
+
+## Accuracy, FLOPS and Parameters
+
+| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
+|:-----------------:|:----------:|:---------:|:---------:|:---------:|
+| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 |
+| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 |
+| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 |
+| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 |
+| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 |
+| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 |
+| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 |
+| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 |
+| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 |
+| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 |
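The Params (M) and FLOPs (G) columns in model documents such as DLA.md above can be estimated for any Paddle network with `paddle.flops`. A minimal sketch follows; `resnet50` from `paddle.vision` stands in here only because the DLA backbones ship with PaddleClas rather than with Paddle itself:

```python
import paddle
from paddle.vision.models import resnet50

# Stand-in network: any paddle.nn.Layer can be measured the same way.
model = resnet50()

# Count FLOPs for one 3x224x224 input; the call also reports the parameter total.
flops = paddle.flops(model, [1, 3, 224, 224], print_detail=False)
print(f"FLOPs: {flops}")
```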
+
+## Accuracy, FLOPS and Parameters
+
+| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
+|:---------------------:|:----------:|:---------:|:---------:|:---------:|
+| HarDNet68 | 17.6 | 4.3 | 75.46 | 92.65 |
+| HarDNet85 | 36.7 | 9.1 | 77.44 | 93.55 |
+| HarDNet39_ds | 3.5 | 0.4 | 71.33 | 89.98 |
+| HarDNet68_ds | 4.2 | 0.8 | 73.62 | 91.52 |
\ No newline at end of file
diff --git a/docs/en/models/RedNet.md b/docs/en/models/RedNet.md
new file mode 100644
index 0000000000000000000000000000000000000000..b93607f2724c375674b8ffa7d579cce6a4dc11a4
--- /dev/null
+++ b/docs/en/models/RedNet.md
@@ -0,0 +1,16 @@
+# RedNet series
+
+## Overview
+
+RedNet is obtained by replacing the convolutions at all bottleneck positions in the ResNet backbone with Involution, while retaining all the 1x1 convolutions for channel mapping and fusion. These carefully redesigned entities combine to form a new, efficient backbone network called RedNet. [paper](https://arxiv.org/abs/2103.06255).
+
+
+## Accuracy, FLOPS and Parameters
+
+| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
+|:---------------------:|:----------:|:---------:|:---------:|:---------:|
+| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 |
+| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 |
+| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 |
+| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 |
+| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 |
\ No newline at end of file
diff --git a/docs/en/models/TNT.md b/docs/en/models/TNT.md
new file mode 100644
index 0000000000000000000000000000000000000000..7e20edab4d5309653e15c7fbd84004e49bb83d81
--- /dev/null
+++ b/docs/en/models/TNT.md
@@ -0,0 +1,13 @@
+# TNT series
+
+## Overview
+
+TNT (Transformer-iN-Transformer) series models were proposed by Huawei-Noah in 2021 for modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes the patch embeddings, while an inner transformer block extracts local features from the pixel embeddings. The pixel-level features are projected to the space of the patch embeddings by a linear layer and then added to the patches. Stacking TNT blocks yields the TNT model for image recognition. Experiments on the ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the TNT architecture; for example, TNT achieves 81.3% top-1 accuracy on ImageNet, which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112).
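To make the inner/outer data flow concrete, here is a heavily simplified TNT block written with PaddlePaddle layers. It is a sketch for illustration only: the dimensions, the single-layer attention, and the omission of the MLP/DropPath sub-blocks are simplifying assumptions, not the PaddleClas implementation.

```python
import paddle
import paddle.nn as nn

class SimpleTNTBlock(nn.Layer):
    def __init__(self, inner_dim=24, outer_dim=384, num_pixels=16, num_heads=4):
        super().__init__()
        self.inner_norm = nn.LayerNorm(inner_dim)
        self.inner_attn = nn.MultiHeadAttention(inner_dim, num_heads)
        # linear layer projecting the flattened pixel embeddings of one patch
        # into the patch-embedding space
        self.proj = nn.Linear(num_pixels * inner_dim, outer_dim)
        self.outer_norm = nn.LayerNorm(outer_dim)
        self.outer_attn = nn.MultiHeadAttention(outer_dim, num_heads)

    def forward(self, pixel_embed, patch_embed):
        # pixel_embed: [batch * num_patches, num_pixels, inner_dim]
        # patch_embed: [batch, num_patches, outer_dim]
        # 1. inner transformer: interaction among the pixels of each patch
        pixel_embed = pixel_embed + self.inner_attn(self.inner_norm(pixel_embed))
        # 2. project pixel-level features and add them to the patch embeddings
        b, n = patch_embed.shape[0], patch_embed.shape[1]
        patch_embed = patch_embed + self.proj(pixel_embed.reshape([b, n, -1]))
        # 3. outer transformer: interaction among the patch embeddings
        patch_embed = patch_embed + self.outer_attn(self.outer_norm(patch_embed))
        return pixel_embed, patch_embed

# one 224x224 image -> 14x14 patches, each patch represented by 16 "pixels"
pixel_embed = paddle.rand([1 * 196, 16, 24])
patch_embed = paddle.rand([1, 196, 384])
pixel_embed, patch_embed = SimpleTNTBlock()(pixel_embed, patch_embed)
```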
+ + + +## Accuracy, FLOPS and Parameters + +| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | +|:---------------------:|:----------:|:---------:|:---------:|:---------:| +| TNT_small | 23.8 | 5.2 | 81.12 | 95.56 | \ No newline at end of file diff --git a/docs/images/logo/logodet3k.jpg b/docs/images/logo/logodet3k.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4c3e719d2702c24c574ddb975a91e081549e3e09 Binary files /dev/null and b/docs/images/logo/logodet3k.jpg differ diff --git a/docs/zh_CN/ImageNet_models_cn.md b/docs/zh_CN/ImageNet_models_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..d135c9572179ef844e923294d872d64ba45ce1da --- /dev/null +++ b/docs/zh_CN/ImageNet_models_cn.md @@ -0,0 +1,388 @@ +简体中文 | [English](README.md) + + +## ImageNet预训练模型库 + + +### 模型库概览图 + +基于ImageNet1k分类数据集,PaddleClas支持24种系列分类网络结构以及对应的122个图像分类预训练模型,训练技巧、每个系列网络结构的简单介绍和性能评估将在相应章节展现,下面所有的速度指标评估环境如下: +* CPU的评估环境基于骁龙855(SD855)。 +* GPU评估环境基于T4机器,在FP32+TensorRT配置下运行500次测得(去除前10次的warmup时间)。 + +常见服务器端模型的精度指标与其预测耗时的变化曲线如下图所示。 + +![](./docs/images/models/T4_benchmark/t4.fp32.bs1.main_fps_top1.png) + + +常见移动端模型的精度指标与其预测耗时、模型存储大小的变化曲线如下图所示。 + +![](./docs/images/models/mobile_arm_storage.png) + +![](./docs/images/models/mobile_arm_top1.png) + + + +### SSLD知识蒸馏预训练模型 +基于SSLD知识蒸馏的预训练模型列表如下所示,更多关于SSLD知识蒸馏方案的介绍可以参考:[SSLD知识蒸馏文档](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)。 + +* 服务器端知识蒸馏模型 + +| 模型 | Top-1 Acc | Reference
Top-1 Acc | Acc gain | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------| +| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) | +| ResNet50_vd_
ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) | +| ResNet50_vd_
ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) | +| ResNet101_vd_
ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) | +| Res2Net50_vd_
26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) | +| Res2Net101_vd_
26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) | +| Res2Net200_vd_
26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | +| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) | +| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) | +| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) | + + +* 端侧知识蒸馏模型 + +| 模型 | Top-1 Acc | Reference
Top-1 Acc | Acc gain | SD855 time(ms)
bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 | +|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------| +| MobileNetV1_
ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) | +| MobileNetV2_
ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | +| MobileNetV3_
small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | +| MobileNetV3_
large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | +| MobileNetV3_small_
x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | +| GhostNet_
x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | + + +* 注: `Reference Top-1 Acc`表示PaddleClas基于ImageNet1k数据集训练得到的预训练模型精度。 + + +### ResNet及其Vd系列 + +ResNet及其Vd系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNet及其Vd系列模型文档](./docs/zh_CN/models/ResNet_and_vd.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------| +| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) | +| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) | +| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) | +| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) | +| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) | +| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) | +| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) | +| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) | +| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) | +| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) | +| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) | +| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) | +| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) | +| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) | +| ResNet50_vd_
ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) | +| ResNet50_vd_
ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) | +| ResNet101_vd_
ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) | + + + +### 移动端系列 + +移动端系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[移动端系列模型文档](./docs/zh_CN/models/Mobile.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | SD855 time(ms)
bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 | +|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------| +| MobileNetV1_
x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) | +| MobileNetV1_
x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) | +| MobileNetV1_
x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) | +| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) | +| MobileNetV1_
ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) | +| MobileNetV2_
x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) | +| MobileNetV2_
x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) | +| MobileNetV2_
x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) | +| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) | +| MobileNetV2_
x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) | +| MobileNetV2_
x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) | +| MobileNetV2_
ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | +| MobileNetV3_
large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) | +| MobileNetV3_
large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) | +| MobileNetV3_
large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) | +| MobileNetV3_
large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) | +| MobileNetV3_
large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) | +| MobileNetV3_
small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) | +| MobileNetV3_
small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) | +| MobileNetV3_
small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) | +| MobileNetV3_
small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) | +| MobileNetV3_
small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) | +| MobileNetV3_
small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | +| MobileNetV3_
large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | +| MobileNetV3_small_
x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | +| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) | +| ShuffleNetV2_
x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) | +| ShuffleNetV2_
x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) | +| ShuffleNetV2_
x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) | +| ShuffleNetV2_
x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) | +| ShuffleNetV2_
x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) | +| ShuffleNetV2_
swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) | +| GhostNet_
x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) | +| GhostNet_
x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) | +| GhostNet_
x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) | +| GhostNet_
x1_3_ssld | 0.7938 | 0.9449 | 19.9825 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | + + + +### SEResNeXt与Res2Net系列 + +SEResNeXt与Res2Net系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[SEResNeXt与Res2Net系列模型文档](./docs/zh_CN/models/SEResNext_and_Res2Net.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------| +| Res2Net50_
26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) | +| Res2Net50_vd_
26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) | +| Res2Net50_
14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) | +| Res2Net101_vd_
26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) | +| Res2Net200_vd_
26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) | +| Res2Net200_vd_
26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | +| ResNeXt50_
32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) | +| ResNeXt50_vd_
32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) | +| ResNeXt50_
64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) | +| ResNeXt50_vd_
64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) | +| ResNeXt101_
32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) | +| ResNeXt101_vd_
32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) | +| ResNeXt101_
64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) | +| ResNeXt101_vd_
64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) | +| ResNeXt152_
32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) | +| ResNeXt152_vd_
32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) | +| ResNeXt152_
64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) | +| ResNeXt152_vd_
64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) | +| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) | +| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) | +| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) | +| SE_ResNeXt50_
32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) | +| SE_ResNeXt50_vd_
32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) | +| SE_ResNeXt101_
32x4d | 0.7939 | 0.9443 | 18.82604 | 25.31814 | 15.02 | 46.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) | +| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) | + + + +### DPN与DenseNet系列 + +DPN与DenseNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[DPN与DenseNet系列模型文档](./docs/zh_CN/models/DPN_DenseNet.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------| +| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) | +| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) | +| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) | +| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) | +| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) | +| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) | +| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) | +| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) | +| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) | +| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) | + + + + +### HRNet系列 + +HRNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[HRNet系列模型文档](./docs/zh_CN/models/HRNet.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------| +| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) | +| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) | +| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) | +| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) | +| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) | +| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) | +| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) | +| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) | +| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) | +| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) | + + + +### Inception系列 + +Inception系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[Inception系列模型文档](./docs/zh_CN/models/Inception.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------| +| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) | +| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) | +| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) | +| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) | +| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) | +| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) | +| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) | +| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) | + + + +### EfficientNet与ResNeXt101_wsl系列 + +EfficientNet与ResNeXt101_wsl系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[EfficientNet与ResNeXt101_wsl系列模型文档](./docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------| +| ResNeXt101_
32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) | +| ResNeXt101_
32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) | +| ResNeXt101_
32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) | +| ResNeXt101_
32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) | +| Fix_ResNeXt101_
32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) | +| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) | +| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) | +| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) | +| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) | +| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) | +| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) | +| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) | +| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) | +| EfficientNetB0_
small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) | + + + +### ResNeSt与RegNet系列 + +ResNeSt与RegNet系列模型的精度、速度指标如下表所示,更多关于该系列的模型介绍可以参考:[ResNeSt与RegNet系列模型文档](./docs/zh_CN/models/ResNeSt_RegNet.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| +| ResNeSt50_
fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) | +| ResNeSt50 | 0.8083 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) | +| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) | + + + +### ViT_and_DeiT系列 + +ViT(Vision Transformer)与DeiT(Data-efficient Image Transformers)系列模型的精度、速度指标如下表所示. 更多关于该系列模型的介绍可以参考: [ViT_and_DeiT系列模型文档](./docs/zh_CN/models/ViT_and_DeiT.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------| +| ViT_small_
patch16_224 | 0.7769 | 0.9342 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | +| ViT_base_
patch16_224 | 0.8195 | 0.9617 | - | - | | 86 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | +| ViT_base_
patch16_384 | 0.8414 | 0.9717 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | +| ViT_base_
patch32_384 | 0.8176 | 0.9613 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | +| ViT_large_
patch16_224 | 0.8323 | 0.9650 | - | - | | 307 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | +| ViT_large_
patch16_384 | 0.8513 | 0.9736 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | +| ViT_large_
patch32_384 | 0.8153 | 0.9608 | - | - | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | +| | | | | | | | | + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------| +| DeiT_tiny_
patch16_224 | 0.718 | 0.910 | - | - | | 5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | +| DeiT_small_
patch16_224 | 0.796 | 0.949 | - | - | | 22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | +| DeiT_base_
patch16_224 | 0.817 | 0.957 | - | - | | 86 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | +| DeiT_base_
patch16_384 | 0.830 | 0.962 | - | - | | 87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | +| DeiT_tiny_
distilled_patch16_224 | 0.741 | 0.918 | - | - | | 6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | +| DeiT_small_
distilled_patch16_224 | 0.809 | 0.953 | - | - | | 22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | +| DeiT_base_
distilled_patch16_224 | 0.831 | 0.964 | - | - | | 87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | +| DeiT_base_
distilled_patch16_384 | 0.851 | 0.973 | - | - | | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | +| | | | | | | | | + + + +### RepVGG系列 + +关于RepVGG系列模型的精度、速度指标如下表所示,更多介绍可以参考:[RepVGG系列模型文档](./docs/zh_CN/models/RepVGG.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| +| RepVGG_A0 | 0.7131 | 0.9016 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) | +| RepVGG_A1 | 0.7380 | 0.9146 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) | +| RepVGG_A2 | 0.7571 | 0.9264 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) | +| RepVGG_B0 | 0.7450 | 0.9213 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) | +| RepVGG_B1 | 0.7773 | 0.9385 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) | +| RepVGG_B2 | 0.7813 | 0.9410 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) | +| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) | +| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) | +| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) | +| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) | + + + +### MixNet系列 + +关于MixNet系列模型的精度、速度指标如下表所示,更多介绍可以参考:[MixNet系列模型文档](./docs/zh_CN/models/MixNet.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(M) | Params(M) | 下载地址 | +| -------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | +| MixNet_S | 0.7628 | 0.9299 | | | 252.977 | 4.167 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) | +| MixNet_M | 0.7767 | 0.9364 | | | 357.119 | 5.065 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) | +| MixNet_L | 0.7860 | 0.9437 | | | 579.017 | 7.384 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) | + + + +### ReXNet系列 + +关于ReXNet系列模型的精度、速度指标如下表所示,更多介绍可以参考:[ReXNet系列模型文档](./docs/zh_CN/models/ReXNet.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | +| ReXNet_1_0 | 0.7746 | 0.9370 | | | 0.415 | 4.838 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) | +| ReXNet_1_3 | 0.7913 | 0.9464 | | | 0.683 | 7.611 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) | +| ReXNet_1_5 | 0.8006 | 0.9512 | | | 0.900 | 9.791 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) | +| ReXNet_2_0 | 0.8122 | 0.9536 | | | 1.561 | 16.449 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) | +| ReXNet_3_0 | 0.8209 | 0.9612 | | | 3.445 | 34.833 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) | + + + +### SwinTransformer系列 + +关于SwinTransformer系列模型的精度、速度指标如下表所示,更多介绍可以参考:[SwinTransformer系列模型文档](./docs/zh_CN/models/SwinTransformer.md)。 + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 | +| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | +| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | +| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | +| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | +| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | +| SwinTransformer_base_patch4_window7_224[1] | 0.8487 | 0.9746 | | | 15.4 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | +| SwinTransformer_base_patch4_window12_384[1] | 0.8642 | 0.9807 | | | 47.1 | 88 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | +| SwinTransformer_large_patch4_window7_224[1] | 0.8596 | 0.9783 | | | 34.5 | 197 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | +| SwinTransformer_large_patch4_window12_384[1] | 0.8719 | 0.9823 | | | 103.9 | 197 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | + +[1]:基于ImageNet22k数据集预训练,然后在ImageNet1k数据集迁移学习得到。 + + + +### 其他模型 + +关于AlexNet、SqueezeNet系列、VGG系列、DarkNet53等模型的精度、速度指标如下表所示,更多介绍可以参考:[其他模型文档](./docs/zh_CN/models/Others.md)。 + + +| 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | Flops(G) | Params(M) | 下载地址 |
+|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
+| AlexNet | 0.567 | 0.792 | 1.44993 | 2.46696 | 1.370 | 61.090 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) |
+| SqueezeNet1_0 | 0.596 | 0.817 | 0.96736 | 2.53221 | 1.550 | 1.240 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) |
+| SqueezeNet1_1 | 0.601 | 0.819 | 0.76032 | 1.877 | 0.690 | 1.230 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) |
+| VGG11 | 0.693 | 0.891 | 3.90412 | 9.51147 | 15.090 | 132.850 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams) |
+| VGG13 | 0.700 | 0.894 | 4.64684 | 12.61558 | 22.480 | 133.030 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams) |
+| VGG16 | 0.720 | 0.907 | 5.61769 | 16.40064 | 30.810 | 138.340 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams) |
+| VGG19 | 0.726 | 0.909 | 6.65221 | 20.4334 | 39.130 | 143.650 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams) |
+| DarkNet53 | 0.780 | 0.941 | 4.10829 | 12.1714 | 18.580 | 41.600 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) |
+
+
+## 许可证书
+本项目的发布受Apache 2.0 license许可认证。
+
+
+## 贡献代码
+我们非常欢迎你为PaddleClas贡献代码,也十分感谢你的反馈。
+
+- 非常感谢[nblib](https://github.com/nblib)修正了PaddleClas中RandErasing的数据增广配置文件。
+- 非常感谢[chenpy228](https://github.com/chenpy228)修正了PaddleClas文档中的部分错别字。
+- 非常感谢[jm12138](https://github.com/jm12138)为PaddleClas添加ViT,DeiT系列模型和RepVGG系列模型。
+- 非常感谢[FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563)对PaddleClas代码的解析与总结。
diff --git a/docs/zh_CN/application/logo_recognition.md b/docs/zh_CN/application/logo_recognition.md
new file mode 100644
index 0000000000000000000000000000000000000000..813c0627050970aef76478b50b801792ee1d3b44
--- /dev/null
+++ b/docs/zh_CN/application/logo_recognition.md
@@ -0,0 +1,184 @@
+# Logo识别
+
+Logo识别是现实生活中应用很广的一个领域,比如判断一张照片中是否出现了Adidas或者Nike的商标Logo,或者一个杯子上是否出现了星巴克或者可口可乐的商标Logo。通常在Logo类别数量较多时,往往采用检测+识别的两阶段方式:检测模块负责检测出潜在的Logo区域,根据检测区域抠图后输入识别模块进行识别;识别模块多采用检索的方式,根据查询图片与底库图片的相似度排序获得预测类别。此文档主要介绍Logo图片的特征提取部分,内容包括:
+
+- 数据集及预处理方式
+- Backbone的具体设置
+- Loss函数的相关设置
+
+全部的超参数及具体配置:[ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)
+
+## 数据集及预处理
+
+### LogoDet-3K数据集
+
+![](../../images/logo/logodet3k.jpg)
+
+LogoDet-3K数据集是具有完整标注的Logo数据集,包含3000个标识类别、约20万个高质量的人工标注的标识对象和158652张图片。相关数据介绍可参考[原论文](https://arxiv.org/abs/2008.05359)。
+
+### 数据预处理
+
+由于原始数据集中的图像包含标注的检测框,而识别阶段只考虑检测器抠图后的Logo区域,因此采用原始的标注框抠出Logo区域图像构成训练集,以排除背景对识别阶段的影响。对数据集进行划分,得到155427张训练集图像,覆盖3000个Logo类别(同时作为测试时的gallery图库),以及3225张测试集图像,用作查询集。抠图后的训练集可[在此下载](https://arxiv.org/abs/2008.05359)。训练时的预处理方式如下:
+- 图像`Resize`到224
+- 随机水平翻转
+- [AugMix](https://arxiv.org/abs/1912.02781v1)
+- Normalize:归一化到0~1
+- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
+
+在配置文件中设置如下,详见`transform_ops`部分:
+
+```yaml
+DataLoader:
+  Train:
+    dataset:
+      # 具体使用的Dataset的名称
+      name: "LogoDataset"
+      # 使用此数据集的具体参数
+      image_root: "dataset/LogoDet-3K-crop/train/"
+      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
+      # 图像增广策略:ResizeImage、RandFlipImage等
+      transform_ops:
+        - ResizeImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - AugMix:
+            prob: 0.5
+        - NormalizeImage:
+            scale: 0.00392157
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - RandomErasing:
+            EPSILON: 0.5
+    sampler:
+      name: DistributedRandomIdentitySampler
+      batch_size: 128
+      num_instances: 2
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 6
+      use_shared_memory: False
+```
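上面的`DistributedRandomIdentitySampler`保证每个batch内的同一类别恰好出现`num_instances`次(此处为128/2=64个类别、每类2张),这是后文PairwiseCosface这类度量学习损失在batch内构造正样本对的前提。下面用一段示意代码说明这种按类别成组的采样逻辑(仅用于说明原理,并非PaddleClas源码):

```python
import random
from collections import defaultdict

def identity_batches(labels, batch_size=128, num_instances=2):
    """按"每个类别固定采样num_instances张"的方式产出batch的样本下标,
    labels为全部训练样本的类别标签列表。"""
    index_per_label = defaultdict(list)
    for idx, label in enumerate(labels):
        index_per_label[label].append(idx)
    # 示意起见,丢弃样本数不足num_instances的类别(实际实现中通常重复采样补齐)
    label_list = [l for l, idxs in index_per_label.items()
                  if len(idxs) >= num_instances]
    random.shuffle(label_list)
    labels_per_batch = batch_size // num_instances
    for i in range(0, len(label_list) - labels_per_batch + 1, labels_per_batch):
        batch = []
        for label in label_list[i:i + labels_per_batch]:
            batch.extend(random.sample(index_per_label[label], num_instances))
        yield batch
```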
+
+## Backbone的具体设置
+
+具体地,使用`ResNet50`作为backbone,主要做了如下修改:
+
+ - 使用ImageNet预训练模型
+
+ - last stage stride=1,保持最后输出特征图尺寸为14x14
+
+ - 在最后加入一个embedding卷积层,特征维度为512
+
+ 具体代码:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
+
+在配置文件中Backbone设置如下:
+
+```yaml
+Arch:
+  # 使用RecModel模型进行训练,目前支持普通ImageNet和RecModel两种方式
+  name: "RecModel"
+  # 导出inference model的具体配置
+  infer_output_key: "features"
+  infer_add_softmax: False
+  # 使用的Backbone
+  Backbone:
+    name: "ResNet50_last_stage_stride1"
+    pretrained: True
+  # 使用此层作为Backbone的feature输出,name为具体层的full_name
+  BackboneStopLayer:
+    name: "adaptive_avg_pool2d_0"
+  # 在Backbone的基础上,新增网络层。此模型添加1x1的卷积层(embedding)
+  Neck:
+    name: "VehicleNeck"
+    in_channels: 2048
+    out_channels: 512
+  # 增加CircleMargin head
+  Head:
+    name: "CircleMargin"
+    margin: 0.35
+    scale: 64
+    embedding_size: 512
+```
+
+## Loss的设置
+
+在Logo识别中,使用了[Pairwise Cosface + CircleMargin](https://arxiv.org/abs/2002.10857)联合训练,其中权重比例为1:1。
+
+具体代码详见:[PairwiseCosface](../../../ppcls/loss/pairwisecosface.py)、[CircleMargin](../../../ppcls/arch/gears/circlemargin.py)
+
+在配置文件中设置如下:
+
+```yaml
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+    - PairwiseCosface:
+        margin: 0.35
+        gamma: 64
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+```
+
+## 其他相关设置
+
+### Optimizer设置
+
+```yaml
+Optimizer:
+  # 使用的优化器名称
+  name: Momentum
+  # 优化器具体参数
+  momentum: 0.9
+  lr:
+    # 使用的学习率调节具体名称
+    name: Cosine
+    # 学习率调节算法具体参数
+    learning_rate: 0.01
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+```
+
+### Eval Metric设置
+
+```yaml
+Metric:
+  Eval:
+    # 使用Recallk和mAP两种评价指标
+    - Recallk:
+        topk: [1, 5]
+    - mAP: {}
+```
+
+### 其他超参数设置
+
+```yaml
+Global:
+  # 如为null则从头开始训练。若指定中间训练保存的状态地址,则继续训练
+  checkpoints: null
+  pretrained_model: null
+  output_dir: "./output/"
+  device: "gpu"
+  class_num: 3000
+  # 保存模型的粒度,每个epoch保存一次
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  # 训练的epoch数
+  epochs: 120
+  # log输出频率
+  print_batch_step: 10
+  # 是否使用visualdl库
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: "./inference"
+  # 使用retrieval的方式进行评测
+  eval_mode: "retrieval"
+```
diff --git a/docs/zh_CN/faq_series/faq_2020_s1.md b/docs/zh_CN/faq_series/faq_2020_s1.md
index 796c1fd2c3ae978c87b03d6f7ef007af4c72d6e0..e0f3c98c986947bf45266012d4c648fa2e4b3b08 100644
--- a/docs/zh_CN/faq_series/faq_2020_s1.md
+++ b/docs/zh_CN/faq_series/faq_2020_s1.md
@@ -128,8 +128,8 @@ ResNet系列模型中,相比于其他模型,ResNet_vd模型在预测速度
 
 **A**:
 
-* 对于单张图像的增广,可以参考[基于单张图片的数据增广脚本](../../../ppcls/data/imaug/operators.py),参考`ResizeImage`或者`CropImage`等数据算子的写法,创建一个新的类,然后在`__call__`中,实现对应的增广方法即可。
-* 对于一个batch图像的增广,可以参考[基于batch数据的数据增广脚本](../../../ppcls/data/imaug/batch_operators.py),参考`MixupOperator`或者`CutmixOperator`等数据算子的写法,创建一个新的类,然后在`__call__`中,实现对应的增广方法即可。
+* 对于单张图像的增广,可以参考[基于单张图片的数据增广脚本](../../../ppcls/data/preprocess/ops),参考`ResizeImage`或者`CropImage`等数据算子的写法,创建一个新的类,然后在`__call__`中,实现对应的增广方法即可(示意代码见下文)。
+* 对于一个batch图像的增广,可以参考[基于batch数据的数据增广脚本](../../../ppcls/data/preprocess/batch_ops),参考`MixupOperator`或者`CutmixOperator`等数据算子的写法,创建一个新的类,然后在`__call__`中,实现对应的增广方法即可。
 
 ## Q3.5: 怎么进一步加速模型训练过程呢?
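针对上面Q3.4中"创建一个新的类并在`__call__`中实现增广逻辑"的写法,这里补充一个单图增广算子的最小示例。算子本身是为演示而假设的,并非PaddleClas内置算子;接口约定与`ResizeImage`等算子保持一致,即输入、输出均为图像数组:

```python
import random

class RandomChannelSwap(object):
    """示例算子:以一定概率随机打乱图像的通道顺序(仅作演示)。"""

    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, img):
        # img为HWC排布的numpy数组
        if random.random() < self.prob:
            channels = list(range(img.shape[2]))
            random.shuffle(channels)
            img = img[:, :, channels]
        return img
```

实现后将该类暴露在数据预处理模块中,即可在配置文件的`transform_ops`里以类名引用。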
diff --git a/docs/zh_CN/models/DLA.md b/docs/zh_CN/models/DLA.md new file mode 100644 index 0000000000000000000000000000000000000000..a2f7210e47df0652aadef1a75396babdbe000f0a --- /dev/null +++ b/docs/zh_CN/models/DLA.md @@ -0,0 +1,21 @@ +# DLA系列 + +## 概述 + +DLA (Deep Layer Aggregation)。 视觉识别需要丰富的表示形式,其范围从低到高,范围从小到大,分辨率从精细到粗糙。即使卷积网络中的要素深度很深,仅靠隔离层还是不够的:将这些表示法进行复合和聚合可改善对内容和位置的推断。尽管已合并了残差连接以组合各层,但是这些连接本身是“浅”的,并且只能通过简单的一步操作来融合。作者通过更深层的聚合来增强标准体系结构,以更好地融合各层的信息。Deep Layer Aggregation 结构迭代地和分层地合并了特征层次结构,以使网络具有更高的准确性和更少的参数。跨体系结构和任务的实验表明,与现有的分支和合并方案相比,Deep Layer Aggregation 可提高识别和分辨率。[论文地址](https://arxiv.org/abs/1707.06484)。 + + +## 精度、FLOPS和参数量 + +| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | +|:-----------------:|:----------:|:---------:|:---------:|:---------:| +| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 | +| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 | +| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 | +| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 | +| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 | +| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 | +| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 | +| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 | +| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 | +| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 | \ No newline at end of file diff --git a/docs/zh_CN/models/HarDNet.md b/docs/zh_CN/models/HarDNet.md new file mode 100644 index 0000000000000000000000000000000000000000..5cb1d514ffb7a674bf7cbc62a397c2c46b51b738 --- /dev/null +++ b/docs/zh_CN/models/HarDNet.md @@ -0,0 +1,14 @@ +# HarDNet系列 + +## 概述 + +HarDNet(Harmonic DenseNet)是 2019 年由国立清华大学提出的一种全新的神经网络,在低 MAC 和内存流量的条件下实现了高效率。与 FC-DenseNet-103,DenseNet-264,ResNet-50,ResNet-152 和SSD-VGG 相比,新网络的推理时间减少了 35%,36%,30%,32% 和 45%。我们使用了包括Nvidia Profiler 和 ARM Scale-Sim 在内的工具来测量内存流量,并验证推理延迟确实与内存流量消耗成正比,并且所提议的网络消耗的内存流量很低。[论文地址](https://arxiv.org/abs/1909.00948)。 + +## 精度、FLOPS和参数量 + +| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | +|:---------------------:|:----------:|:---------:|:---------:|:---------:| +| HarDNet68 | 17.6 | 4.3 | 75.46 | 92.65 | +| HarDNet85 | 36.7 | 9.1 | 77.44 | 93.55 | +| HarDNet39_ds | 3.5 | 0.4 | 71.33 | 89.98 | +| HarDNet68_ds | 4.2 | 0.8 | 73.62 | 91.52 | diff --git a/docs/zh_CN/models/LeViT.md b/docs/zh_CN/models/LeViT.md new file mode 100644 index 0000000000000000000000000000000000000000..19cdf289a0c9721422979c1d9673dd18c9a86ef0 --- /dev/null +++ b/docs/zh_CN/models/LeViT.md @@ -0,0 +1,17 @@ +# LeViT + +## 概述 +LeViT是一种快速推理的、用于图像分类任务的混合神经网络。其设计之初考虑了网络模型在不同的硬件平台上的性能,因此能够更好地反映普遍应用的真实场景。通过大量实验,作者找到了卷积神经网络与Transformer体系更好的结合方式,并且提出了attention-based方法,用于整合Transformer中的位置信息编码。[论文地址](https://arxiv.org/abs/2104.01136)。 + +## 精度、FLOPS和参数量 + +| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(M) | Params
(M) | +|:--:|:--:|:--:|:--:|:--:|:--:|:--:| +| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 | +| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 | +| LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 | +| LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 | +| LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 | + + +**注**:与Reference的精度差异源于数据预处理不同及未使用蒸馏的head作为输出。 diff --git a/docs/zh_CN/models/RedNet.md b/docs/zh_CN/models/RedNet.md new file mode 100644 index 0000000000000000000000000000000000000000..904ee8cd675add94f1497920ebb6d4d6591f74b2 --- /dev/null +++ b/docs/zh_CN/models/RedNet.md @@ -0,0 +1,16 @@ +# RedNet系列 + +## 概述 + +在 ResNet 的 Backbone 和 Backbone 的所有 Bottleneck 位置上使用 Involution 替换掉了卷积,但保留了所有的卷积用于通道映射和融合。这些精心重新设计的实体联合起来,形成了一种新的高效 Backbone 网络,称为 RedNet。[论文地址](https://arxiv.org/abs/2103.06255)。 + + +## 精度、FLOPS和参数量 + +| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | +|:---------------------:|:----------:|:---------:|:---------:|:---------:| +| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 | +| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 | +| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 | +| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 | +| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 | \ No newline at end of file diff --git a/docs/zh_CN/models/TNT.md b/docs/zh_CN/models/TNT.md new file mode 100644 index 0000000000000000000000000000000000000000..13e86fd4c1dd5b9710ba3a81580d742de95915aa --- /dev/null +++ b/docs/zh_CN/models/TNT.md @@ -0,0 +1,12 @@ +# TNT系列 + +## 概述 + +TNT(Transformer-iN-Transformer)系列模型由华为诺亚于2021年提出,用于对 patch 级别和 pixel 级别的表示进行建模。在每个 TNT 块中,outer transformer block 用于处理 patch 嵌入,inner transformer block 从 pixel 嵌入中提取局部特征。通过线性变换层将 pixel 级特征投影到 patch 嵌入空间,然后加入到 patch 中。通过对 TNT 块的叠加,建立了用于图像识别的 TNT 模型。在ImageNet 基准测试和下游任务上的实验证明了该 TNT 体系结构的优越性和有效性。例如,在计算量相当的情况下 TNT 能在 ImageNet 上达到 81.3% 的 top-1 精度,比 DeiT 高 1.5%。[论文地址](https://arxiv.org/abs/2103.00112)。 + + +## 精度、FLOPS和参数量 + +| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | +|:---------------------:|:----------:|:---------:|:---------:|:---------:| +| TNT_small | 23.8 | 5.2 | 81.21 | 95.63 | diff --git a/docs/zh_CN/models/Twins.md b/docs/zh_CN/models/Twins.md new file mode 100644 index 0000000000000000000000000000000000000000..424f3985df00216c048e026632c43f9e720f4542 --- /dev/null +++ b/docs/zh_CN/models/Twins.md @@ -0,0 +1,17 @@ +# Twins + +## 概述 +Twins网络包括Twins-PCPVT和Twins-SVT,其重点对空间注意力机制进行了精心设计,得到了简单却更为有效的方案。由于该体系结构仅涉及矩阵乘法,而目前的深度学习框架中对矩阵乘法有较高的优化程度,因此该体系结构十分高效且易于实现。并且,该体系结构在图像分类、目标检测和语义分割等多种下游视觉任务中都能够取得优异的性能。[论文地址](https://arxiv.org/abs/2104.13840)。 + +## 精度、FLOPS和参数量 + +| Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) |
+|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
+| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 |
+| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 |
+| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 |
+| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 |
+| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 |
+| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 |
+
+**注**:与Reference的精度差异源于数据预处理不同。
diff --git a/docs/zh_CN/tutorials/quick_start_recognition.md b/docs/zh_CN/tutorials/quick_start_recognition.md
new file mode 100644
index 0000000000000000000000000000000000000000..f523088309ba883a8c5231db1858261099068ec5
--- /dev/null
+++ b/docs/zh_CN/tutorials/quick_start_recognition.md
@@ -0,0 +1,181 @@
+# 图像识别快速开始
+
+图像识别主要包含3个部分:主体检测得到检测框、识别提取特征、根据特征进行检索。
+
+## 1. 环境配置
+
+* 请先参考[快速安装](./installation.md)配置PaddleClas运行环境。
+
+
+注意:
+
+**本部分内容需要在`deploy`文件夹下运行,在PaddleClas代码的根目录下,可以通过以下方法进入该文件夹**
+
+```shell
+cd deploy
+```
+
+
+## 2. inference模型和数据下载
+
+检测模型与4个方向(Logo、动漫人物、车辆、商品)的识别inference模型以及测试数据的下载方法如下。
+
+| 模型简介 | 推荐场景 | 测试数据地址 | inference模型 |
+| ------------ | ------------- | ------- | -------- |
+| 通用主体检测模型 | 通用场景 | - |[下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) |
+| Logo识别模型 | Logo场景 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) |
+| 动漫人物识别模型 | 动漫人物场景 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/cartoon_demo_data_v1.0.tar) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) |
+| 车辆细分类模型 | 车辆场景 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/vehicle_demo_data_v1.0.tar) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) |
+| 商品识别模型 | 商品场景 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_Inshop_v1.0_infer.tar) |
+
+
+**注意**:Windows环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下。
+
+
+* 下载并解压数据与模型
+
+```shell
+mkdir dataset
+cd dataset
+# 下载demo数据并解压
+wget {url/of/data} && tar -xf {name/of/data/package}
+cd ..
+
+mkdir models
+cd models
+# 下载识别inference模型并解压
+wget {url/of/inference model} && tar -xf {name/of/inference model/package}
+cd ..
+```
+
+
+### 2.1 下载通用检测模型
+
+```shell
+mkdir models
+cd models
+# 下载通用检测inference模型并解压
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
+cd ..
+```
+
+
+### 2.2 Logo识别
+
+以Logo识别demo为例,按照下面的命令下载demo数据与模型。
+
+```shell
+mkdir dataset
+cd dataset
+# 下载demo数据并解压
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
+cd ..
+
+mkdir models
+cd models
+# 下载识别inference模型并解压
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
+cd ..
+```
+
+解压完毕后,`dataset`文件夹下应有如下文件结构:
+
+```
+├── logo_demo_data_v1.0
+│   ├── data_file.txt
+│   ├── gallery
+│   ├── index
+│   └── query
+├── ...
+```
+
+`models`文件夹下应有如下文件结构:
+
+```
+├── logo_rec_ResNet50_Logo3K_v1.0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+按照下面的方式,可以完成对图片的检索:
+
+```shell
+python3.7 python/predict_system.py -c configs/inference_logo.yaml
+```
+
+配置文件中,部分关键字段解释如下:
+
+```yaml
+Global:
+  infer_imgs: "./dataset/logo_demo_data_v1.0/query/" # 预测图像
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/" # 检测inference模型文件夹
+  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/" # 识别inference模型文件夹
+  batch_size: 1 # 预测的批大小
+  image_shape: [3, 640, 640] # 检测的图像尺寸
+  threshold: 0.5 # 检测的阈值,得分超过该阈值的检测框才会被检出
+  max_det_results: 1 # 用于图像识别的检测框数量,符合阈值条件的检测框中,根据得分,最多对其中的max_det_results个检测框做后续的识别
+
+# indexing engine config
+IndexProcess:
+  index_path: "./dataset/logo_demo_data_v1.0/index/" # 索引文件夹,用于识别特征提取之后的索引
+  search_budget: 100
+  return_k: 5 # 从底库中返回return_k个最相似的内容
+  dist_type: "IP"
+```
+
+最终输出结果如下:
+
+```
+[{'bbox': [25, 21, 483, 382], 'rec_docs': ['AKG', 'AKG', 'AKG', 'AKG', 'AKG'], 'rec_scores': array([2.32288337, 2.31903863, 2.28398442, 2.16804123, 2.10190272])}]
+```
+
+其中bbox表示检测出的主体所在位置,rec_docs表示底库中与检出主体最相近的若干张图像对应的标签,rec_scores表示对应的相似度。
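上述识别结果对应的整体流程,可以用下面的伪代码概括(其中`detector`、`rec_predictor`、`index`等对象仅为示意,具体实现见`deploy/python/predict_system.py`):

```python
def predict_one_image(img, detector, rec_predictor, index,
                      max_det_results=1, return_k=5):
    """主体检测 -> 按检测框抠图提特征 -> 在底库索引中做内积(IP)检索。"""
    boxes = detector.detect(img)[:max_det_results]  # 按得分保留前max_det_results个框
    results = []
    for x1, y1, x2, y2 in boxes:
        crop = img[y1:y2, x1:x2]                     # 按检测框抠图
        feat = rec_predictor.extract(crop)           # 提取特征(本例中为512维)
        scores, docs = index.search(feat, return_k)  # 返回最相似的return_k个底库标签
        results.append({"bbox": [x1, y1, x2, y2],
                        "rec_docs": docs,
                        "rec_scores": scores})
    return results
```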
+```
+
+解压完毕后,`dataset`文件夹下应有如下文件结构:
+
+```
+├── logo_demo_data_v1.0
+│   ├── data_file.txt
+│   ├── gallery
+│   ├── index
+│   └── query
+├── ...
+```
+
+`models`文件夹下应有如下文件结构:
+
+```
+├── logo_rec_ResNet50_Logo3K_v1.0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+```
+
+按照下面的方式,即可完成对图片的检索。
+
+```shell
+python3.7 python/predict_system.py -c configs/inference_logo.yaml
+```
+
+配置文件中,部分关键字段解释如下。
+
+```yaml
+Global:
+  infer_imgs: "./dataset/logo_demo_data_v1.0/query/" # 预测图像
+  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/" # 检测inference模型文件夹
+  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/" # 识别inference模型文件夹
+  batch_size: 1 # 预测的批大小
+  image_shape: [3, 640, 640] # 检测的图像尺寸
+  threshold: 0.5 # 检测的阈值,得分超过该阈值的检测框才会被检出
+  max_det_results: 1 # 用于图像识别的检测框数量,符合阈值条件的检测框中,根据得分,最多对其中的max_det_results个检测框做后续的识别
+
+# indexing engine config
+IndexProcess:
+  index_path: "./dataset/logo_demo_data_v1.0/index/" # 索引文件夹,用于存放识别特征提取之后的索引
+  search_budget: 100
+  return_k: 5 # 从底库中返回return_k个最相似的内容
+  dist_type: "IP"
+```
+
+
+
+最终输出结果如下。
+
+```
+[{'bbox': [25, 21, 483, 382], 'rec_docs': ['AKG', 'AKG', 'AKG', 'AKG', 'AKG'], 'rec_scores': array([2.32288337, 2.31903863, 2.28398442, 2.16804123, 2.10190272])}]
+```
+
+其中`bbox`表示检测出的主体所在位置,`rec_docs`表示底库中与检出主体最相近的若干张图像对应的标签,`rec_scores`表示对应的相似度。
+
+如果希望预测文件夹内的图像,可以直接修改配置文件,也可以通过下面的`-o`参数修改对应的配置。
+
+```shell
+python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
+```
+
+如果希望在底库中新增图像,可以使用下面的命令重新构建 index。
+
+```shell
+python3.7 python/build_gallery.py -c configs/build_logo.yaml
+```
+
+其中 index 相关配置如下。
+
+```yaml
+# indexing engine config
+IndexProcess:
+  index_path: "./dataset/logo_demo_data_v1.0/index/" # 保存的索引地址
+  image_root: "./dataset/logo_demo_data_v1.0/" # 图像的根目录
+  data_file: "./dataset/logo_demo_data_v1.0/data_file.txt" # 图像的数据list文本,每一行包含图像的文件名与标签信息
+  delimiter: "\t"
+  dist_type: "IP"
+  pq_size: 100
+  embedding_size: 512 # 特征维度
+```
+
+需要改动的内容为:
+1. 在图像根目录下面添加对应的图像内容(也可以放在其子文件夹下面,保证根目录与数据list文本中的文件名拼接之后,图像路径存在即可)。
+2. 在图像的数据list文本中添加新增图像的信息,每行包含图像文件名以及对应的标签信息。
+
+
+### 2.3 其他任务的识别
+
+如果希望尝试其他方向的识别与检索效果,在下载解压好对应的demo数据与模型之后,替换对应的配置文件即可完成预测。
+
+
+| 场景 | 预测配置文件 | 构建底库的配置文件 |
+| ---- | ----- | ----- |
+| 动漫人物 | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | [build_cartoon.yaml](../../../deploy/configs/build_cartoon.yaml) |
+| 车辆 | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | [build_vehicle.yaml](../../../deploy/configs/build_vehicle.yaml) |
+| 商品 | [inference_inshop.yaml](../../../deploy/configs/) | [build_inshop.yaml](../../../deploy/configs/build_inshop.yaml) |
diff --git a/ppcls/arch/__init__.py b/ppcls/arch/__init__.py
index 18004a77df5ff28c0cb91146d2749501adffc9ac..23bcc630dfa0c22ea958b7c1999f5e7fc7056864 100644
--- a/ppcls/arch/__init__.py
+++ b/ppcls/arch/__init__.py
@@ -21,8 +21,9 @@ from .
import backbone, gears
 from .backbone import *
 from .gears import build_gear
 from .utils import *
+from ppcls.utils.save_load import load_dygraph_pretrain
 
-__all__ = ["build_model", "RecModel"]
+__all__ = ["build_model", "RecModel", "DistillationModel"]
 
 
 def build_model(config):
@@ -62,3 +63,48 @@ class RecModel(nn.Layer):
         else:
             y = None
         return {"features": x, "logits": y}
+
+
+class DistillationModel(nn.Layer):
+    def __init__(self,
+                 models=None,
+                 pretrained_list=None,
+                 freeze_params_list=None,
+                 **kwargs):
+        super().__init__()
+        assert isinstance(models, list)
+        self.model_list = []
+        self.model_name_list = []
+        if pretrained_list is not None:
+            assert len(pretrained_list) == len(models)
+
+        if freeze_params_list is None:
+            freeze_params_list = [False] * len(models)
+        assert len(freeze_params_list) == len(models)
+        for idx, model_config in enumerate(models):
+            assert len(model_config) == 1
+            key = list(model_config.keys())[0]
+            model_config = model_config[key]
+            model_name = model_config.pop("name")
+            model = eval(model_name)(**model_config)
+
+            if freeze_params_list[idx]:
+                for param in model.parameters():
+                    param.trainable = False
+            self.model_list.append(self.add_sublayer(key, model))
+            self.model_name_list.append(key)
+
+        if pretrained_list is not None:
+            for idx, pretrained in enumerate(pretrained_list):
+                if pretrained is not None:
+                    load_dygraph_pretrain(
+                        self.model_list[idx], path=pretrained)
+
+    def forward(self, x, label=None):
+        result_dict = dict()
+        for idx, model_name in enumerate(self.model_name_list):
+            if label is None:
+                result_dict[model_name] = self.model_list[idx](x)
+            else:
+                result_dict[model_name] = self.model_list[idx](x, label)
+        return result_dict
diff --git a/ppcls/arch/backbone/__init__.py b/ppcls/arch/backbone/__init__.py
index de00c2a2a18f335ffa1ed043283a8ea6e5bfe80f..a519811bdb9c56b5f5610e592add4d9d4f6c52b9 100644
--- a/ppcls/arch/backbone/__init__.py
+++ b/ppcls/arch/backbone/__init__.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
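
上文 `ppcls/arch/__init__.py` 中新增的 `DistillationModel` 接收一个由单键字典组成的 `models` 列表,依次构建各子模型,并可按 `freeze_params_list` 冻结部分参数。下面给出一个最小的调用示意(Teacher/Student 的键名与所选骨干网络均为示例假设,实际组合以训练配置文件为准):

```python
from ppcls.arch import DistillationModel

# 构建“教师-学生”两个子模型;键名与骨干网络仅为示例假设
arch = DistillationModel(
    models=[
        {"Teacher": {"name": "ResNet50_vd", "pretrained": False}},
        {"Student": {"name": "MobileNetV3_large_x1_0", "pretrained": False}},
    ],
    freeze_params_list=[True, False],  # 冻结教师参数,只训练学生
)
# forward 返回以子模型键名为 key 的字典,例如 {"Teacher": logits, "Student": logits}
```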
@@ -19,11 +19,12 @@ from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19 from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3 from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C -from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet18_vc, ResNet34_vc, ResNet50_vc, ResNet101_vc, ResNet152_vc +from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d -from ppcls.arch.backbone.model_zoo.res2net import Res2Net50_48w_2s, Res2Net50_26w_4s, Res2Net50_14w_8s, Res2Net50_48w_2s, Res2Net50_26w_6s, Res2Net50_26w_8s, Res2Net101_26w_4s, Res2Net152_26w_4s, Res2Net200_26w_4s -from ppcls.arch.backbone.model_zoo.res2net_vd import Res2Net50_vd_48w_2s, Res2Net50_vd_26w_4s, Res2Net50_vd_14w_8s, Res2Net50_vd_48w_2s, Res2Net50_vd_26w_6s, Res2Net50_vd_26w_8s, Res2Net101_vd_26w_4s, Res2Net152_vd_26w_4s, Res2Net200_vd_26w_4s -from ppcls.arch.backbone.model_zoo.se_resnet_vd import SE_ResNet18_vd, SE_ResNet34_vd, SE_ResNet50_vd, SE_ResNet101_vd, SE_ResNet152_vd, SE_ResNet200_vd +from ppcls.arch.backbone.model_zoo.resnext_vd import ResNeXt50_vd_32x4d, ResNeXt50_vd_64x4d, ResNeXt101_vd_32x4d, ResNeXt101_vd_64x4d, ResNeXt152_vd_32x4d, ResNeXt152_vd_64x4d +from ppcls.arch.backbone.model_zoo.res2net import Res2Net50_26w_4s, Res2Net50_14w_8s +from ppcls.arch.backbone.model_zoo.res2net_vd import Res2Net50_vd_26w_4s, Res2Net101_vd_26w_4s, Res2Net200_vd_26w_4s +from ppcls.arch.backbone.model_zoo.se_resnet_vd import SE_ResNet18_vd, SE_ResNet34_vd, SE_ResNet50_vd from ppcls.arch.backbone.model_zoo.se_resnext_vd import SE_ResNeXt50_vd_32x4d, SE_ResNeXt50_vd_32x4d, SENet154_vd from ppcls.arch.backbone.model_zoo.se_resnext import SE_ResNeXt50_32x4d, SE_ResNeXt101_32x4d, SE_ResNeXt152_64x4d from ppcls.arch.backbone.model_zoo.dpn import DPN68, DPN92, DPN98, DPN107, DPN131 @@ -33,10 +34,11 @@ from ppcls.arch.backbone.model_zoo.resnest import ResNeSt50_fast_1s1x64d, ResNeS from ppcls.arch.backbone.model_zoo.googlenet import GoogLeNet from ppcls.arch.backbone.model_zoo.mobilenet_v2 import MobileNetV2_x0_25, MobileNetV2_x0_5, MobileNetV2_x0_75, MobileNetV2, MobileNetV2_x1_5, MobileNetV2_x2_0 from ppcls.arch.backbone.model_zoo.shufflenet_v2 import ShuffleNetV2_x0_25, ShuffleNetV2_x0_33, ShuffleNetV2_x0_5, ShuffleNetV2_x1_0, ShuffleNetV2_x1_5, ShuffleNetV2_x2_0, ShuffleNetV2_swish +from ppcls.arch.backbone.model_zoo.ghostnet import GhostNet_x0_5, GhostNet_x1_0, GhostNet_x1_3 from ppcls.arch.backbone.model_zoo.alexnet import AlexNet from ppcls.arch.backbone.model_zoo.inception_v4 import InceptionV4 from ppcls.arch.backbone.model_zoo.xception import Xception41, Xception65, Xception71 -from ppcls.arch.backbone.model_zoo.xception_deeplab import Xception41_deeplab, Xception65_deeplab, Xception71_deeplab +from ppcls.arch.backbone.model_zoo.xception_deeplab import Xception41_deeplab, Xception65_deeplab from ppcls.arch.backbone.model_zoo.resnext101_wsl import ResNeXt101_32x8d_wsl, ResNeXt101_32x16d_wsl, ResNeXt101_32x32d_wsl, ResNeXt101_32x48d_wsl from ppcls.arch.backbone.model_zoo.squeezenet import SqueezeNet1_0, SqueezeNet1_1 from ppcls.arch.backbone.model_zoo.darknet import DarkNet53 @@ -47,4 +49,10 @@ from ppcls.arch.backbone.model_zoo.distillation_models import ResNet50_vd_distil from 
ppcls.arch.backbone.model_zoo.swin_transformer import SwinTransformer_tiny_patch4_window7_224, SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224, SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224, SwinTransformer_large_patch4_window12_384 from ppcls.arch.backbone.model_zoo.mixnet import MixNet_S, MixNet_M, MixNet_L from ppcls.arch.backbone.model_zoo.rexnet import ReXNet_1_0, ReXNet_1_3, ReXNet_1_5, ReXNet_2_0, ReXNet_3_0 +from ppcls.arch.backbone.model_zoo.gvt import pcpvt_small, pcpvt_base, pcpvt_large, alt_gvt_small, alt_gvt_base, alt_gvt_large +from ppcls.arch.backbone.model_zoo.levit import LeViT_128S, LeViT_128, LeViT_192, LeViT_256, LeViT_384 +from ppcls.arch.backbone.model_zoo.dla import DLA34, DLA46_c, DLA46x_c, DLA60, DLA60x, DLA60x_c, DLA102, DLA102x, DLA102x2, DLA169 +from ppcls.arch.backbone.model_zoo.rednet import RedNet26, RedNet38, RedNet50, RedNet101, RedNet152 +from ppcls.arch.backbone.model_zoo.tnt import TNT_small +from ppcls.arch.backbone.model_zoo.hardnet import HarDNet68, HarDNet85, HarDNet39_ds, HarDNet68_ds from ppcls.arch.backbone.variant_models.resnet_variant import ResNet50_last_stage_stride1 diff --git a/ppcls/arch/backbone/model_zoo/alexnet.py b/ppcls/arch/backbone/model_zoo/alexnet.py index a9155ca4a650f9412bfd983acbc20a8d741fdd9a..3e1d1aa526565b6b18b0e192128961ceaf053074 100644 --- a/ppcls/arch/backbone/model_zoo/alexnet.py +++ b/ppcls/arch/backbone/model_zoo/alexnet.py @@ -1,3 +1,17 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import paddle from paddle import ParamAttr import paddle.nn as nn @@ -7,8 +21,11 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D from paddle.nn.initializer import Uniform import math -__all__ = ["AlexNet"] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"AlexNet": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams"} +__all__ = list(MODEL_URLS.keys()) class ConvPoolLayer(nn.Layer): def __init__(self, @@ -126,7 +143,19 @@ class AlexNetDY(nn.Layer): x = self._fc8(x) return x +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
+ ) -def AlexNet(**args): - model = AlexNetDY(**args) +def AlexNet(pretrained=False, use_ssld=False, **kwargs): + model = AlexNetDY(**kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["AlexNet"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/darknet.py b/ppcls/arch/backbone/model_zoo/darknet.py index 5aef16d7ccf81483b458ba63c68938ed1531e9f2..16b4b8600866d2bf78e45a1e1d3c4747a398c346 100644 --- a/ppcls/arch/backbone/model_zoo/darknet.py +++ b/ppcls/arch/backbone/model_zoo/darknet.py @@ -1,3 +1,17 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import paddle from paddle import ParamAttr import paddle.nn as nn @@ -7,8 +21,11 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D from paddle.nn.initializer import Uniform import math -__all__ = ["DarkNet53"] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"DarkNet53": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams"} +__all__ = list(MODEL_URLS.keys()) class ConvBNLayer(nn.Layer): def __init__(self, @@ -155,7 +172,19 @@ class DarkNet(nn.Layer): x = self._out(x) return x - -def DarkNet53(**args): - model = DarkNet(**args) +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + +def DarkNet53(pretrained=False, use_ssld=False, **kwargs): + model = DarkNet(**kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DarkNet53"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/densenet.py b/ppcls/arch/backbone/model_zoo/densenet.py index df58f3551b3907b6896e3ba7c4d46c89edbe1c26..190959b80827abe86aa413afc83cc6af41eba5e9 100644 --- a/ppcls/arch/backbone/model_zoo/densenet.py +++ b/ppcls/arch/backbone/model_zoo/densenet.py @@ -1,4 +1,4 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
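
以上 AlexNet、DarkNet 等骨干网络统一引入了 `pretrained` 参数与 `_load_pretrained` 辅助函数:传入 `False` 表示随机初始化,`True` 表示从 `MODEL_URLS` 下载官方预训练权重,字符串则视为本地权重路径。一个简单的用法示意如下(本地路径仅为示例假设):

```python
from ppcls.arch.backbone import AlexNet

model = AlexNet(pretrained=False)  # 随机初始化,不加载权重
model = AlexNet(pretrained=True)   # 按 MODEL_URLS["AlexNet"] 下载并加载官方权重
model = AlexNet(pretrained="./output/AlexNet/best_model.pdparams")  # 本地权重,路径为示例
```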
@@ -26,9 +26,16 @@ from paddle.nn.initializer import Uniform import math -__all__ = [ - "DenseNet121", "DenseNet161", "DenseNet169", "DenseNet201", "DenseNet264" -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"DenseNet121": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams", + "DenseNet161": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams", + "DenseNet169": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams", + "DenseNet201": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams", + "DenseNet264": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) class BNACConvLayer(nn.Layer): @@ -282,27 +289,43 @@ class DenseNet(nn.Layer): y = self.out(y) return y - -def DenseNet121(**args): - model = DenseNet(layers=121, **args) +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + +def DenseNet121(pretrained=False, use_ssld=False, **kwargs): + model = DenseNet(layers=121, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DenseNet121"], use_ssld=use_ssld) return model -def DenseNet161(**args): - model = DenseNet(layers=161, **args) +def DenseNet161(pretrained=False, use_ssld=False, **kwargs): + model = DenseNet(layers=161, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DenseNet161"], use_ssld=use_ssld) return model -def DenseNet169(**args): - model = DenseNet(layers=169, **args) +def DenseNet169(pretrained=False, use_ssld=False, **kwargs): + model = DenseNet(layers=169, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DenseNet169"], use_ssld=use_ssld) return model -def DenseNet201(**args): - model = DenseNet(layers=201, **args) +def DenseNet201(pretrained=False, use_ssld=False, **kwargs): + model = DenseNet(layers=201, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DenseNet201"], use_ssld=use_ssld) return model -def DenseNet264(**args): - model = DenseNet(layers=264, **args) +def DenseNet264(pretrained=False, use_ssld=False, **kwargs): + model = DenseNet(layers=264, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DenseNet264"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/distilled_vision_transformer.py b/ppcls/arch/backbone/model_zoo/distilled_vision_transformer.py index 48fd25050629aab56b7ec59aeef06f77c0da7bea..b7c36192c1043a7a81fb7034b98636bea4c8cda8 100644 --- a/ppcls/arch/backbone/model_zoo/distilled_vision_transformer.py +++ b/ppcls/arch/backbone/model_zoo/distilled_vision_transformer.py @@ -16,12 +16,20 @@ import paddle import paddle.nn as nn from .vision_transformer import VisionTransformer, Identity, trunc_normal_, zeros_ -__all__ = [ - 'DeiT_tiny_patch16_224', 'DeiT_small_patch16_224', 'DeiT_base_patch16_224', - 'DeiT_tiny_distilled_patch16_224', 'DeiT_small_distilled_patch16_224', - 'DeiT_base_distilled_patch16_224', 'DeiT_base_patch16_384', - 'DeiT_base_distilled_patch16_384' -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + 
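+# 下面的 MODEL_URLS 将每个 DeiT/ViT 变体映射到官方预训练权重;当工厂函数以
+# pretrained=True 调用时,由本文件稍后定义的 _load_pretrained 负责下载与加载。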
+MODEL_URLS = { + "DeiT_tiny_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams", + "DeiT_small_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams", + "DeiT_base_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams", + "DeiT_tiny_distilled_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams", + "DeiT_small_distilled_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams", + "DeiT_base_distilled_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams", + "DeiT_base_patch16_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams", + "DeiT_base_distilled_patch16_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) class DistilledVisionTransformer(VisionTransformer): @@ -90,7 +98,20 @@ class DistilledVisionTransformer(VisionTransformer): return (x + x_dist) / 2 -def DeiT_tiny_patch16_224(**kwargs): +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def DeiT_tiny_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = VisionTransformer( patch_size=16, embed_dim=192, @@ -100,10 +121,11 @@ def DeiT_tiny_patch16_224(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_tiny_patch16_224"], use_ssld=use_ssld) return model -def DeiT_small_patch16_224(**kwargs): +def DeiT_small_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = VisionTransformer( patch_size=16, embed_dim=384, @@ -113,10 +135,11 @@ def DeiT_small_patch16_224(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_small_patch16_224"], use_ssld=use_ssld) return model -def DeiT_base_patch16_224(**kwargs): +def DeiT_base_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = VisionTransformer( patch_size=16, embed_dim=768, @@ -126,10 +149,11 @@ def DeiT_base_patch16_224(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_base_patch16_224"], use_ssld=use_ssld) return model -def DeiT_tiny_distilled_patch16_224(**kwargs): +def DeiT_tiny_distilled_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = DistilledVisionTransformer( patch_size=16, embed_dim=192, @@ -139,10 +163,11 @@ def DeiT_tiny_distilled_patch16_224(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_tiny_distilled_patch16_224"], use_ssld=use_ssld) return model -def DeiT_small_distilled_patch16_224(**kwargs): +def DeiT_small_distilled_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = DistilledVisionTransformer( patch_size=16, embed_dim=384, @@ -152,10 +177,11 @@ def DeiT_small_distilled_patch16_224(**kwargs): qkv_bias=True, 
epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_small_distilled_patch16_224"], use_ssld=use_ssld) return model -def DeiT_base_distilled_patch16_224(**kwargs): +def DeiT_base_distilled_patch16_224(pretrained=False, use_ssld=False, **kwargs): model = DistilledVisionTransformer( patch_size=16, embed_dim=768, @@ -165,10 +191,11 @@ def DeiT_base_distilled_patch16_224(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_base_distilled_patch16_224"], use_ssld=use_ssld) return model -def DeiT_base_patch16_384(**kwargs): +def DeiT_base_patch16_384(pretrained=False, use_ssld=False, **kwargs): model = VisionTransformer( img_size=384, patch_size=16, @@ -179,10 +206,11 @@ def DeiT_base_patch16_384(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_base_patch16_384"], use_ssld=use_ssld) return model -def DeiT_base_distilled_patch16_384(**kwargs): +def DeiT_base_distilled_patch16_384(pretrained=False, use_ssld=False, **kwargs): model = DistilledVisionTransformer( img_size=384, patch_size=16, @@ -193,4 +221,5 @@ def DeiT_base_distilled_patch16_384(**kwargs): qkv_bias=True, epsilon=1e-6, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DeiT_base_distilled_patch16_384"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/dla.py b/ppcls/arch/backbone/model_zoo/dla.py new file mode 100644 index 0000000000000000000000000000000000000000..51151710ebb48c7fdd625e3a2b5d9086b1d57f9e --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/dla.py @@ -0,0 +1,465 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
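+
+# DLA(Deep Layer Aggregation)的核心是层级化的 DlaTree:不同深度的特征
+# 经 DlaRoot 结点逐级聚合,可参考下文 DLA 类中 level2~level5 的构建方式。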
+ +import math + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddle.nn.initializer import Normal, Constant + +from ppcls.arch.backbone.base.theseus_layer import Identity +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + + +MODEL_URLS = { + "DLA34": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA34_pretrained.pdparams", + "DLA46_c": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA46_c_pretrained.pdparams", + "DLA46x_c": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA46x_c_pretrained.pdparams", + "DLA60": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60_pretrained.pdparams", + "DLA60x": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_pretrained.pdparams", + "DLA60x_c": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_c_pretrained.pdparams", + "DLA102": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102_pretrained.pdparams", + "DLA102x": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x_pretrained.pdparams", + "DLA102x2": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x2_pretrained.pdparams", + "DLA169": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA169_pretrained.pdparams" +} + + +__all__ = MODEL_URLS.keys() + + +zeros_ = Constant(value=0.) +ones_ = Constant(value=1.) + + +class DlaBasic(nn.Layer): + def __init__(self, inplanes, planes, stride=1, dilation=1, **cargs): + super(DlaBasic, self).__init__() + self.conv1 = nn.Conv2D( + inplanes, planes, kernel_size=3, stride=stride, + padding=dilation, bias_attr=False, dilation=dilation + ) + self.bn1 = nn.BatchNorm2D(planes) + self.relu = nn.ReLU() + self.conv2 = nn.Conv2D( + planes, planes, kernel_size=3, stride=1, + padding=dilation, bias_attr=False, dilation=dilation + ) + self.bn2 = nn.BatchNorm2D(planes) + self.stride = stride + + def forward(self, x, residual=None): + if residual is None: + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + + out += residual + out = self.relu(out) + + return out + + +class DlaBottleneck(nn.Layer): + expansion = 2 + + def __init__(self, inplanes, outplanes, stride=1, + dilation=1, cardinality=1, base_width=64): + super(DlaBottleneck, self).__init__() + self.stride = stride + mid_planes = int(math.floor( + outplanes * (base_width / 64)) * cardinality) + mid_planes = mid_planes // self.expansion + + self.conv1 = nn.Conv2D(inplanes, mid_planes, kernel_size=1, bias_attr=False) + self.bn1 = nn.BatchNorm2D(mid_planes) + self.conv2 = nn.Conv2D( + mid_planes, mid_planes, kernel_size=3, + stride=stride, padding=dilation, bias_attr=False, + dilation=dilation, groups=cardinality + ) + self.bn2 = nn.BatchNorm2D(mid_planes) + self.conv3 = nn.Conv2D(mid_planes, outplanes, kernel_size=1, bias_attr=False) + self.bn3 = nn.BatchNorm2D(outplanes) + self.relu = nn.ReLU() + + def forward(self, x, residual=None): + if residual is None: + residual = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + out = self.relu(out) + + out = self.conv3(out) + out = self.bn3(out) + + out += residual + out = self.relu(out) + + return out + + +class DlaRoot(nn.Layer): + def __init__(self, in_channels, out_channels, kernel_size, residual): + super(DlaRoot, self).__init__() + self.conv = nn.Conv2D( + in_channels, out_channels, 1, stride=1, + 
bias_attr=False, padding=(kernel_size - 1) // 2 + ) + self.bn = nn.BatchNorm2D(out_channels) + self.relu = nn.ReLU() + self.residual = residual + + def forward(self, *x): + children = x + x = self.conv(paddle.concat(x, 1)) + x = self.bn(x) + if self.residual: + x += children[0] + x = self.relu(x) + + return x + + +class DlaTree(nn.Layer): + def __init__(self, levels, block, in_channels, out_channels, + stride=1,dilation=1, cardinality=1, base_width=64, + level_root=False, root_dim=0, root_kernel_size=1, + root_residual=False): + super(DlaTree, self).__init__() + if root_dim == 0: + root_dim = 2 * out_channels + if level_root: + root_dim += in_channels + + self.downsample = nn.MaxPool2D( + stride, stride=stride) if stride > 1 else Identity() + self.project = Identity() + cargs = dict(dilation=dilation, cardinality=cardinality, base_width=base_width) + + if levels == 1: + self.tree1 = block(in_channels, out_channels, stride, **cargs) + self.tree2 = block(out_channels, out_channels, 1, **cargs) + if in_channels != out_channels: + self.project = nn.Sequential( + nn.Conv2D(in_channels, out_channels, kernel_size=1, stride=1, bias_attr=False), + nn.BatchNorm2D(out_channels)) + else: + cargs.update(dict(root_kernel_size=root_kernel_size, root_residual=root_residual)) + self.tree1 = DlaTree( + levels - 1, block, in_channels, + out_channels, stride, root_dim=0, **cargs + ) + self.tree2 = DlaTree( + levels - 1, block, out_channels, + out_channels, root_dim=root_dim + out_channels, **cargs + ) + + if levels == 1: + self.root = DlaRoot(root_dim, out_channels, root_kernel_size, root_residual) + + self.level_root = level_root + self.root_dim = root_dim + self.levels = levels + + def forward(self, x, residual=None, children=None): + children = [] if children is None else children + bottom = self.downsample(x) + residual = self.project(bottom) + + if self.level_root: + children.append(bottom) + x1 = self.tree1(x, residual) + + if self.levels == 1: + x2 = self.tree2(x1) + x = self.root(x2, x1, *children) + else: + children.append(x1) + x = self.tree2(x1, children=children) + return x + + +class DLA(nn.Layer): + def __init__(self, levels, channels, in_chans=3, cardinality=1, + base_width=64, block=DlaBottleneck, residual_root=False, + drop_rate=0.0, class_dim=1000, with_pool=True): + super(DLA, self).__init__() + self.channels = channels + self.class_dim = class_dim + self.with_pool = with_pool + self.cardinality = cardinality + self.base_width = base_width + self.drop_rate = drop_rate + + self.base_layer = nn.Sequential( + nn.Conv2D( + in_chans, channels[0], kernel_size=7, + stride=1, padding=3, bias_attr=False + ), + nn.BatchNorm2D(channels[0]), + nn.ReLU()) + + self.level0 = self._make_conv_level(channels[0], channels[0], levels[0]) + self.level1 = self._make_conv_level(channels[0], channels[1], levels[1], stride=2) + + cargs = dict( + cardinality=cardinality, + base_width=base_width, + root_residual=residual_root + ) + + self.level2 = DlaTree( + levels[2], block, channels[1], + channels[2], 2, level_root=False, **cargs + ) + self.level3 = DlaTree( + levels[3], block, channels[2], + channels[3], 2, level_root=True, **cargs + ) + self.level4 = DlaTree( + levels[4], block, channels[3], + channels[4], 2, level_root=True, **cargs + ) + self.level5 = DlaTree( + levels[5], block, channels[4], + channels[5], 2, level_root=True, **cargs + ) + + self.feature_info = [ + # rare to have a meaningful stride 1 level + dict(num_chs=channels[0], reduction=1, module='level0'), + dict(num_chs=channels[1], reduction=2, 
module='level1'), + dict(num_chs=channels[2], reduction=4, module='level2'), + dict(num_chs=channels[3], reduction=8, module='level3'), + dict(num_chs=channels[4], reduction=16, module='level4'), + dict(num_chs=channels[5], reduction=32, module='level5'), + ] + + self.num_features = channels[-1] + + if with_pool: + self.global_pool = nn.AdaptiveAvgPool2D(1) + + if class_dim > 0: + self.fc = nn.Conv2D(self.num_features, class_dim, 1) + + for m in self.sublayers(): + if isinstance(m, nn.Conv2D): + n = m._kernel_size[0] * m._kernel_size[1] * m._out_channels + normal_ = Normal(mean=0.0, std=math.sqrt(2. / n)) + normal_(m.weight) + elif isinstance(m, nn.BatchNorm2D): + ones_(m.weight) + zeros_(m.bias) + + def _make_conv_level(self, inplanes, planes, convs, stride=1, dilation=1): + modules = [] + for i in range(convs): + modules.extend([ + nn.Conv2D( + inplanes, planes, kernel_size=3, + stride=stride if i == 0 else 1, + padding=dilation, bias_attr=False, dilation=dilation + ), + nn.BatchNorm2D(planes), + nn.ReLU()]) + inplanes = planes + return nn.Sequential(*modules) + + def forward_features(self, x): + x = self.base_layer(x) + + x = self.level0(x) + x = self.level1(x) + x = self.level2(x) + x = self.level3(x) + x = self.level4(x) + x = self.level5(x) + + return x + + def forward(self, x): + x = self.forward_features(x) + + if self.with_pool: + x = self.global_pool(x) + + if self.drop_rate > 0.: + x = F.dropout(x, p=self.drop_rate, training=self.training) + + if self.class_dim > 0: + x = self.fc(x) + x = x.flatten(1) + + return x + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
+ ) + + +def DLA34(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 2, 1), + channels=(16, 32, 64, 128, 256, 512), + block=DlaBasic, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA34"]) + return model + + +def DLA46_c(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 2, 1), + channels=(16, 32, 64, 64, 128, 256), + block=DlaBottleneck, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA46_c"]) + return model + + +def DLA46x_c(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 2, 1), + channels=(16, 32, 64, 64, 128, 256), + block=DlaBottleneck, + cardinality=32, + base_width=4, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA46x_c"]) + return model + + +def DLA60(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 3, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA60"]) + return model + + +def DLA60x(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 3, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + cardinality=32, + base_width=4, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA60x"]) + return model + + +def DLA60x_c(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 2, 3, 1), + channels=(16, 32, 64, 64, 128, 256), + block=DlaBottleneck, + cardinality=32, + base_width=4, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA60x_c"]) + return model + + +def DLA102(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 3, 4, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + residual_root=True, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA102"]) + return model + + +def DLA102x(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 3, 4, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + cardinality=32, + base_width=4, + residual_root=True, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA102x"]) + return model + + +def DLA102x2(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 1, 3, 4, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + cardinality=64, + base_width=4, + residual_root=True, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA102x2"]) + return model + + +def DLA169(pretrained=False, **kwargs): + model = DLA( + levels=(1, 1, 2, 3, 5, 1), + channels=(16, 32, 128, 256, 512, 1024), + block=DlaBottleneck, + residual_root=True, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["DLA169"]) + return model diff --git a/ppcls/arch/backbone/model_zoo/dpn.py b/ppcls/arch/backbone/model_zoo/dpn.py index becdc8ca60d9f49b3528c765ac30e65051ab0603..7741eb7ce7fdc9b3fcb58d60fbacd9e1dcd86e24 100644 --- a/ppcls/arch/backbone/model_zoo/dpn.py +++ b/ppcls/arch/backbone/model_zoo/dpn.py @@ -27,14 +27,16 @@ from paddle.nn.initializer import Uniform import math -__all__ = [ - "DPN", - "DPN68", - "DPN92", - "DPN98", - "DPN107", - "DPN131", -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"DPN68": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams", + "DPN92": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams", + "DPN98": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams", + "DPN107": 
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams", + "DPN131": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) class ConvBNLayer(nn.Layer): @@ -398,28 +400,45 @@ class DPN(nn.Layer): net_arg['init_padding'] = init_padding return net_arg - - -def DPN68(**args): - model = DPN(layers=68, **args) + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def DPN68(pretrained=False, use_ssld=False, **kwargs): + model = DPN(layers=68, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DPN68"]) return model -def DPN92(**args): - model = DPN(layers=92, **args) +def DPN92(pretrained=False, use_ssld=False, **kwargs): + model = DPN(layers=92, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DPN92"]) return model -def DPN98(**args): - model = DPN(layers=98, **args) +def DPN98(pretrained=False, use_ssld=False, **kwargs): + model = DPN(layers=98, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DPN98"]) return model -def DPN107(**args): - model = DPN(layers=107, **args) +def DPN107(pretrained=False, use_ssld=False, **kwargs): + model = DPN(layers=107, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DPN107"]) return model -def DPN131(**args): - model = DPN(layers=131, **args) - return model +def DPN131(pretrained=False, use_ssld=False, **kwargs): + model = DPN(layers=131, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["DPN131"]) + return model \ No newline at end of file diff --git a/ppcls/arch/backbone/model_zoo/efficientnet.py b/ppcls/arch/backbone/model_zoo/efficientnet.py index dd7ef86e1588ad746534be71e473067671c9783d..de2d5245902d0f4e96a0f9ed72d5a112df125704 100644 --- a/ppcls/arch/backbone/model_zoo/efficientnet.py +++ b/ppcls/arch/backbone/model_zoo/efficientnet.py @@ -9,11 +9,20 @@ import collections import re import copy -__all__ = [ - 'EfficientNet', 'EfficientNetB0_small', 'EfficientNetB0', 'EfficientNetB1', - 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', - 'EfficientNetB6', 'EfficientNetB7' -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"EfficientNetB0_small": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams", + "EfficientNetB0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams", + "EfficientNetB1": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams", + "EfficientNetB2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams", + "EfficientNetB3": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams", + "EfficientNetB4": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams", + "EfficientNetB5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams", + "EfficientNetB6": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams", + "EfficientNetB7": 
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) GlobalParams = collections.namedtuple('GlobalParams', [ 'batch_norm_momentum', @@ -783,119 +792,159 @@ class EfficientNet(nn.Layer): x = self._fc(x) return x + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + def EfficientNetB0_small(padding_type='DYNAMIC', override_params=None, use_se=False, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b0', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB0_small"]) return model def EfficientNetB0(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b0', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB0"]) return model def EfficientNetB1(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b1', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB1"]) return model def EfficientNetB2(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b2', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB2"]) return model def EfficientNetB3(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b3', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB3"]) return model def EfficientNetB4(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b4', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB4"]) return model def EfficientNetB5(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b5', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB5"]) return model def EfficientNetB6(padding_type='SAME', override_params=None, use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b6', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB6"]) return model def EfficientNetB7(padding_type='SAME', override_params=None, 
use_se=True, - **args): + pretrained=False, + use_ssld=False, + **kwargs): model = EfficientNet( name='b7', padding_type=padding_type, override_params=override_params, use_se=use_se, - **args) - return model + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["EfficientNetB7"]) + return model \ No newline at end of file diff --git a/ppcls/arch/backbone/model_zoo/ghostnet.py b/ppcls/arch/backbone/model_zoo/ghostnet.py index 0a47bc274ef794baf94038a374bd010f13eefde4..e557e0f9fa13929fd686ae3036010d066811b946 100644 --- a/ppcls/arch/backbone/model_zoo/ghostnet.py +++ b/ppcls/arch/backbone/model_zoo/ghostnet.py @@ -1,4 +1,4 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,14 @@ from paddle.nn import Conv2D, BatchNorm, AdaptiveAvgPool2D, Linear from paddle.regularizer import L2Decay from paddle.nn.initializer import Uniform, KaimingNormal -__all__ = ["GhostNet_x0_5", "GhostNet_x1_0", "GhostNet_x1_3"] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"GhostNet_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams", + "GhostNet_x1_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams", + "GhostNet_x1_3": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) class ConvBNLayer(nn.Layer): @@ -315,17 +322,33 @@ class GhostNet(nn.Layer): new_v += divisor return new_v + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
+ ) + -def GhostNet_x0_5(**args): - model = GhostNet(scale=0.5) +def GhostNet_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = GhostNet(scale=0.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["GhostNet_x0_5"], use_ssld=use_ssld) return model -def GhostNet_x1_0(**args): - model = GhostNet(scale=1.0) +def GhostNet_x1_0(pretrained=False, use_ssld=False, **kwargs): + model = GhostNet(scale=1.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["GhostNet_x1_0"], use_ssld=use_ssld) return model -def GhostNet_x1_3(**args): - model = GhostNet(scale=1.3) +def GhostNet_x1_3(pretrained=False, use_ssld=False, **kwargs): + model = GhostNet(scale=1.3, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["GhostNet_x1_3"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/googlenet.py b/ppcls/arch/backbone/model_zoo/googlenet.py index 534c6ff01515b01bdf58b1aa8a32f8168f3b87e2..7ef35a9649fdbadf0a7c3c39dfec25376ff94e7b 100644 --- a/ppcls/arch/backbone/model_zoo/googlenet.py +++ b/ppcls/arch/backbone/model_zoo/googlenet.py @@ -8,7 +8,12 @@ from paddle.nn.initializer import Uniform import math -__all__ = ['GoogLeNet'] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"GoogLeNet": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) def xavier(channels, filter_size, name): @@ -200,8 +205,22 @@ class GoogLeNetDY(nn.Layer): x = self._drop_o2(x) out2 = self._out2(x) return [out, out1, out2] - - -def GoogLeNet(**args): - model = GoogLeNetDY(**args) + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def GoogLeNet(pretrained=False, use_ssld=False, **kwargs): + model = GoogLeNetDY(**kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["GoogLeNet"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/gvt.py b/ppcls/arch/backbone/model_zoo/gvt.py new file mode 100644 index 0000000000000000000000000000000000000000..659be4964cac2136a4778195fdbb27d725123c58 --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/gvt.py @@ -0,0 +1,680 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
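+
+# Twins(gvt)使用两种注意力:GroupAttention(LSA)在 ws x ws 的局部窗口内
+# 做自注意力;Attention(GSA)做全局注意力,当 sr_ratio > 1 时先用步长为
+# sr_ratio 的卷积对 K/V 特征图降采样,以控制计算量。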
+ +from functools import partial + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.regularizer import L2Decay + +from .vision_transformer import trunc_normal_, normal_, zeros_, ones_, to_2tuple, DropPath, Identity, Mlp +from .vision_transformer import Block as ViTBlock + +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = { + "pcpvt_small": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams", + "pcpvt_base": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams", + "pcpvt_large": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams", + "alt_gvt_small": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams", + "alt_gvt_base": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams", + "alt_gvt_large": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams" + } + +__all__ = list(MODEL_URLS.keys()) + + + +class GroupAttention(nn.Layer): + """LSA: self attention within a group. + """ + + def __init__(self, + dim, + num_heads=8, + qkv_bias=False, + qk_scale=None, + attn_drop=0., + proj_drop=0., + ws=1): + super().__init__() + if ws == 1: + raise Exception(f"ws {ws} should not be 1") + if dim % num_heads != 0: + raise Exception( + f"dim {dim} should be divided by num_heads {num_heads}.") + + self.dim = dim + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim**-0.5 + + self.qkv = nn.Linear(dim, dim * 3, bias_attr=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + self.ws = ws + + def forward(self, x, H, W): + B, N, C = x.shape + h_group, w_group = H // self.ws, W // self.ws + total_groups = h_group * w_group + x = x.reshape([B, h_group, self.ws, w_group, self.ws, C]).transpose( + [0, 1, 3, 2, 4, 5]) + qkv = self.qkv(x).reshape( + [B, total_groups, -1, 3, self.num_heads, + C // self.num_heads]).transpose([3, 0, 1, 4, 2, 5]) + q, k, v = qkv[0], qkv[1], qkv[2] + attn = (q @k.transpose([0, 1, 2, 4, 3])) * self.scale + + attn = nn.Softmax(axis=-1)(attn) + attn = self.attn_drop(attn) + attn = (attn @v).transpose([0, 1, 3, 2, 4]).reshape( + [B, h_group, w_group, self.ws, self.ws, C]) + + x = attn.transpose([0, 1, 3, 2, 4, 5]).reshape([B, N, C]) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Attention(nn.Layer): + """GSA: using a key to summarize the information for a group to be efficient. + """ + + def __init__(self, + dim, + num_heads=8, + qkv_bias=False, + qk_scale=None, + attn_drop=0., + proj_drop=0., + sr_ratio=1): + super().__init__() + assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}." 
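+        # GSA:query 保持全分辨率;当 sr_ratio > 1 时,K/V 来自被 self.sr
+        # (步长为 sr_ratio 的卷积)降采样后的特征图,见 forward 中的分支。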
+ + self.dim = dim + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim**-0.5 + + self.q = nn.Linear(dim, dim, bias_attr=qkv_bias) + self.kv = nn.Linear(dim, dim * 2, bias_attr=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + self.sr_ratio = sr_ratio + if sr_ratio > 1: + self.sr = nn.Conv2D( + dim, dim, kernel_size=sr_ratio, stride=sr_ratio) + self.norm = nn.LayerNorm(dim) + + def forward(self, x, H, W): + B, N, C = x.shape + q = self.q(x).reshape( + [B, N, self.num_heads, C // self.num_heads]).transpose( + [0, 2, 1, 3]) + + if self.sr_ratio > 1: + x_ = x.transpose([0, 2, 1]).reshape([B, C, H, W]) + x_ = self.sr(x_).reshape([B, C, -1]).transpose([0, 2, 1]) + x_ = self.norm(x_) + kv = self.kv(x_).reshape( + [B, -1, 2, self.num_heads, C // self.num_heads]).transpose( + [2, 0, 3, 1, 4]) + else: + kv = self.kv(x).reshape( + [B, -1, 2, self.num_heads, C // self.num_heads]).transpose( + [2, 0, 3, 1, 4]) + k, v = kv[0], kv[1] + + attn = (q @k.transpose([0, 1, 3, 2])) * self.scale + attn = nn.Softmax(axis=-1)(attn) + attn = self.attn_drop(attn) + + x = (attn @v).transpose([0, 2, 1, 3]).reshape([B, N, C]) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Block(nn.Layer): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + act_layer=nn.GELU, + norm_layer=nn.LayerNorm, + sr_ratio=1): + super().__init__() + self.norm1 = norm_layer(dim) + self.attn = Attention( + dim, + num_heads=num_heads, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + attn_drop=attn_drop, + proj_drop=drop, + sr_ratio=sr_ratio) + self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity() + self.norm2 = norm_layer(dim) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, + hidden_features=mlp_hidden_dim, + act_layer=act_layer, + drop=drop) + + def forward(self, x, H, W): + x = x + self.drop_path(self.attn(self.norm1(x), H, W)) + x = x + self.drop_path(self.mlp(self.norm2(x))) + return x + + +class SBlock(ViTBlock): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + act_layer=nn.GELU, + norm_layer=nn.LayerNorm, + sr_ratio=1): + super().__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, + attn_drop, drop_path, act_layer, norm_layer) + + def forward(self, x, H, W): + return super().forward(x) + + +class GroupBlock(ViTBlock): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + act_layer=nn.GELU, + norm_layer=nn.LayerNorm, + sr_ratio=1, + ws=1): + super().__init__(dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, + attn_drop, drop_path, act_layer, norm_layer) + del self.attn + if ws == 1: + self.attn = Attention(dim, num_heads, qkv_bias, qk_scale, + attn_drop, drop, sr_ratio) + else: + self.attn = GroupAttention(dim, num_heads, qkv_bias, qk_scale, + attn_drop, drop, ws) + + def forward(self, x, H, W): + x = x + self.drop_path(self.attn(self.norm1(x), H, W)) + x = x + self.drop_path(self.mlp(self.norm2(x))) + return x + + +class PatchEmbed(nn.Layer): + """ Image to Patch Embedding. + """ + + def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768): + super().__init__() + if img_size % patch_size != 0: + raise Exception( + f"img_size {img_size} should be divided by patch_size {patch_size}." 
+ ) + + img_size = to_2tuple(img_size) + patch_size = to_2tuple(patch_size) + + self.img_size = img_size + self.patch_size = patch_size + self.H, self.W = img_size[0] // patch_size[0], img_size[ + 1] // patch_size[1] + self.num_patches = self.H * self.W + self.proj = nn.Conv2D( + in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) + self.norm = nn.LayerNorm(embed_dim) + + def forward(self, x): + B, C, H, W = x.shape + x = self.proj(x).flatten(2).transpose([0, 2, 1]) + x = self.norm(x) + H, W = H // self.patch_size[0], W // self.patch_size[1] + return x, (H, W) + + +# borrow from PVT https://github.com/whai362/PVT.git +class PyramidVisionTransformer(nn.Layer): + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + num_classes=1000, + embed_dims=[64, 128, 256, 512], + num_heads=[1, 2, 4, 8], + mlp_ratios=[4, 4, 4, 4], + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer=nn.LayerNorm, + depths=[3, 4, 6, 3], + sr_ratios=[8, 4, 2, 1], + block_cls=Block): + super().__init__() + self.num_classes = num_classes + self.depths = depths + + # patch_embed + self.patch_embeds = nn.LayerList() + self.pos_embeds = nn.ParameterList() + self.pos_drops = nn.LayerList() + self.blocks = nn.LayerList() + + for i in range(len(depths)): + if i == 0: + self.patch_embeds.append( + PatchEmbed(img_size, patch_size, in_chans, embed_dims[i])) + else: + self.patch_embeds.append( + PatchEmbed(img_size // patch_size // 2**(i - 1), 2, + embed_dims[i - 1], embed_dims[i])) + patch_num = self.patch_embeds[i].num_patches + 1 if i == len( + embed_dims) - 1 else self.patch_embeds[i].num_patches + self.pos_embeds.append( + self.create_parameter( + shape=[1, patch_num, embed_dims[i]], + default_initializer=zeros_)) + self.add_parameter(f"pos_embeds_{i}", self.pos_embeds[i]) + self.pos_drops.append(nn.Dropout(p=drop_rate)) + + dpr = [ + x.numpy()[0] + for x in paddle.linspace(0, drop_path_rate, sum(depths)) + ] # stochastic depth decay rule + + cur = 0 + for k in range(len(depths)): + _block = nn.LayerList([ + block_cls( + dim=embed_dims[k], + num_heads=num_heads[k], + mlp_ratio=mlp_ratios[k], + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[cur + i], + norm_layer=norm_layer, + sr_ratio=sr_ratios[k]) for i in range(depths[k]) + ]) + self.blocks.append(_block) + cur += depths[k] + + self.norm = norm_layer(embed_dims[-1]) + + # cls_token + self.cls_token = self.create_parameter( + shape=[1, 1, embed_dims[-1]], + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + self.add_parameter("cls_token", self.cls_token) + + # classification head + self.head = nn.Linear(embed_dims[-1], + num_classes) if num_classes > 0 else Identity() + + # init weights + for pos_emb in self.pos_embeds: + trunc_normal_(pos_emb) + self.apply(self._init_weights) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight) + if isinstance(m, nn.Linear) and m.bias is not None: + zeros_(m.bias) + elif isinstance(m, nn.LayerNorm): + zeros_(m.bias) + ones_(m.weight) + + def forward_features(self, x): + B = x.shape[0] + for i in range(len(self.depths)): + x, (H, W) = self.patch_embeds[i](x) + if i == len(self.depths) - 1: + cls_tokens = self.cls_token.expand([B, -1, -1]) + x = paddle.concat([cls_tokens, x], dim=1) + x = x + self.pos_embeds[i] + x = self.pos_drops[i](x) + for blk in self.blocks[i]: + x = blk(x, H, W) + if i < len(self.depths) - 1: + x = x.reshape([B, H, W, 
-1]).transpose( + [0, 3, 1, 2]).contiguous() + x = self.norm(x) + return x[:, 0] + + def forward(self, x): + x = self.forward_features(x) + x = self.head(x) + return x + + +# PEG from https://arxiv.org/abs/2102.10882 +class PosCNN(nn.Layer): + def __init__(self, in_chans, embed_dim=768, s=1): + super().__init__() + self.proj = nn.Sequential( + nn.Conv2D( + in_chans, + embed_dim, + 3, + s, + 1, + bias_attr=paddle.ParamAttr(regularizer=L2Decay(0.0)), + groups=embed_dim, + weight_attr=paddle.ParamAttr(regularizer=L2Decay(0.0)), )) + self.s = s + + def forward(self, x, H, W): + B, N, C = x.shape + feat_token = x + cnn_feat = feat_token.transpose([0, 2, 1]).reshape([B, C, H, W]) + if self.s == 1: + x = self.proj(cnn_feat) + cnn_feat + else: + x = self.proj(cnn_feat) + x = x.flatten(2).transpose([0, 2, 1]) + return x + + +class CPVTV2(PyramidVisionTransformer): + """ + Use useful results from CPVT. PEG and GAP. + Therefore, cls token is no longer required. + PEG is used to encode the absolute position on the fly, which greatly affects the performance when input resolution + changes during the training (such as segmentation, detection) + """ + + def __init__(self, + img_size=224, + patch_size=4, + in_chans=3, + num_classes=1000, + embed_dims=[64, 128, 256, 512], + num_heads=[1, 2, 4, 8], + mlp_ratios=[4, 4, 4, 4], + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer=nn.LayerNorm, + depths=[3, 4, 6, 3], + sr_ratios=[8, 4, 2, 1], + block_cls=Block): + super().__init__(img_size, patch_size, in_chans, num_classes, + embed_dims, num_heads, mlp_ratios, qkv_bias, qk_scale, + drop_rate, attn_drop_rate, drop_path_rate, norm_layer, + depths, sr_ratios, block_cls) + del self.pos_embeds + del self.cls_token + self.pos_block = nn.LayerList( + [PosCNN(embed_dim, embed_dim) for embed_dim in embed_dims]) + self.apply(self._init_weights) + + def _init_weights(self, m): + import math + if isinstance(m, nn.Linear): + trunc_normal_(m.weight) + if isinstance(m, nn.Linear) and m.bias is not None: + zeros_(m.bias) + elif isinstance(m, nn.LayerNorm): + zeros_(m.bias) + ones_(m.weight) + elif isinstance(m, nn.Conv2D): + fan_out = m._kernel_size[0] * m._kernel_size[1] * m._out_channels + fan_out //= m._groups + normal_(0, math.sqrt(2.0 / fan_out))(m.weight) + if m.bias is not None: + zeros_(m.bias) + elif isinstance(m, nn.BatchNorm2D): + m.weight.data.fill_(1.0) + m.bias.data.zero_() + + def forward_features(self, x): + B = x.shape[0] + + for i in range(len(self.depths)): + x, (H, W) = self.patch_embeds[i](x) + x = self.pos_drops[i](x) + + for j, blk in enumerate(self.blocks[i]): + x = blk(x, H, W) + if j == 0: + x = self.pos_block[i](x, H, W) # PEG here + + if i < len(self.depths) - 1: + x = x.reshape([B, H, W, -1]).transpose([0, 3, 1, 2]) + + x = self.norm(x) + return x.mean(axis=1) # GAP here + + +class PCPVT(CPVTV2): + def __init__(self, + img_size=224, + patch_size=4, + in_chans=3, + num_classes=1000, + embed_dims=[64, 128, 256], + num_heads=[1, 2, 4], + mlp_ratios=[4, 4, 4], + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer=nn.LayerNorm, + depths=[4, 4, 4], + sr_ratios=[4, 2, 1], + block_cls=SBlock): + super().__init__(img_size, patch_size, in_chans, num_classes, + embed_dims, num_heads, mlp_ratios, qkv_bias, qk_scale, + drop_rate, attn_drop_rate, drop_path_rate, norm_layer, + depths, sr_ratios, block_cls) + + +class ALTGVT(PCPVT): + """ + alias Twins-SVT + """ + + def __init__(self, + img_size=224, + 
patch_size=4, + in_chans=3, + class_dim=1000, + embed_dims=[64, 128, 256], + num_heads=[1, 2, 4], + mlp_ratios=[4, 4, 4], + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer=nn.LayerNorm, + depths=[4, 4, 4], + sr_ratios=[4, 2, 1], + block_cls=GroupBlock, + wss=[7, 7, 7]): + super().__init__(img_size, patch_size, in_chans, class_dim, embed_dims, + num_heads, mlp_ratios, qkv_bias, qk_scale, drop_rate, + attn_drop_rate, drop_path_rate, norm_layer, depths, + sr_ratios, block_cls) + del self.blocks + self.wss = wss + # transformer encoder + dpr = [ + x.numpy()[0] + for x in paddle.linspace(0, drop_path_rate, sum(depths)) + ] # stochastic depth decay rule + cur = 0 + self.blocks = nn.LayerList() + for k in range(len(depths)): + _block = nn.LayerList([ + block_cls( + dim=embed_dims[k], + num_heads=num_heads[k], + mlp_ratio=mlp_ratios[k], + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[cur + i], + norm_layer=norm_layer, + sr_ratio=sr_ratios[k], + ws=1 if i % 2 == 1 else wss[k]) for i in range(depths[k]) + ]) + self.blocks.append(_block) + cur += depths[k] + self.apply(self._init_weights) + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def pcpvt_small(pretrained=False, use_ssld=False, **kwargs): + model = CPVTV2( + patch_size=4, + embed_dims=[64, 128, 320, 512], + num_heads=[1, 2, 5, 8], + mlp_ratios=[8, 8, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + depths=[3, 4, 6, 3], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["pcpvt_small"], use_ssld=use_ssld) + return model + + +def pcpvt_base(pretrained=False, use_ssld=False, **kwargs): + model = CPVTV2( + patch_size=4, + embed_dims=[64, 128, 320, 512], + num_heads=[1, 2, 5, 8], + mlp_ratios=[8, 8, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + depths=[3, 4, 18, 3], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["pcpvt_base"], use_ssld=use_ssld) + return model + + +def pcpvt_large(pretrained=False, use_ssld=False, **kwargs): + model = CPVTV2( + patch_size=4, + embed_dims=[64, 128, 320, 512], + num_heads=[1, 2, 5, 8], + mlp_ratios=[8, 8, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + depths=[3, 8, 27, 3], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["pcpvt_large"], use_ssld=use_ssld) + return model + + +def alt_gvt_small(pretrained=False, use_ssld=False, **kwargs): + model = ALTGVT( + patch_size=4, + embed_dims=[64, 128, 256, 512], + num_heads=[2, 4, 8, 16], + mlp_ratios=[4, 4, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + depths=[2, 2, 10, 4], + wss=[7, 7, 7, 7], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["alt_gvt_small"], use_ssld=use_ssld) + return model + + +def alt_gvt_base(pretrained=False, use_ssld=False, **kwargs): + model = ALTGVT( + patch_size=4, + embed_dims=[96, 192, 384, 768], + num_heads=[3, 6, 12, 24], + mlp_ratios=[4, 4, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + 
depths=[2, 2, 18, 2], + wss=[7, 7, 7, 7], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["alt_gvt_base"], use_ssld=use_ssld) + return model + + +def alt_gvt_large(pretrained=False, use_ssld=False, **kwargs): + model = ALTGVT( + patch_size=4, + embed_dims=[128, 256, 512, 1024], + num_heads=[4, 8, 16, 32], + mlp_ratios=[4, 4, 4, 4], + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + depths=[2, 2, 18, 2], + wss=[7, 7, 7, 7], + sr_ratios=[8, 4, 2, 1], + **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["alt_gvt_large"], use_ssld=use_ssld) + return model diff --git a/ppcls/arch/backbone/model_zoo/hardnet.py b/ppcls/arch/backbone/model_zoo/hardnet.py new file mode 100644 index 0000000000000000000000000000000000000000..b3d5f9a45ce315e003fd17e11fa5a8db2700f2b1 --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/hardnet.py @@ -0,0 +1,265 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + + +MODEL_URLS = { + 'HarDNet39_ds': + 'https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet39_ds_pretrained.pdparams', + 'HarDNet68_ds': + 'https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_ds_pretrained.pdparams', + 'HarDNet68': + 'https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_pretrained.pdparams', + 'HarDNet85': + 'https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet85_pretrained.pdparams' +} + + +__all__ = MODEL_URLS.keys() + + +def ConvLayer(in_channels, out_channels, kernel_size=3, stride=1, bias_attr=False): + layer = nn.Sequential( + ('conv', nn.Conv2D( + in_channels, out_channels, kernel_size=kernel_size, + stride=stride, padding=kernel_size//2, groups=1, bias_attr=bias_attr + )), + ('norm', nn.BatchNorm2D(out_channels)), + ('relu', nn.ReLU6()) + ) + return layer + + +def DWConvLayer(in_channels, out_channels, kernel_size=3, stride=1, bias_attr=False): + layer = nn.Sequential( + ('dwconv', nn.Conv2D( + in_channels, out_channels, kernel_size=kernel_size, + stride=stride, padding=1, groups=out_channels, bias_attr=bias_attr + )), + ('norm', nn.BatchNorm2D(out_channels)) + ) + return layer + + +def CombConvLayer(in_channels, out_channels, kernel_size=1, stride=1): + layer = nn.Sequential( + ('layer1', ConvLayer(in_channels, out_channels, kernel_size=kernel_size)), + ('layer2', DWConvLayer(out_channels, out_channels, stride=stride)) + ) + return layer + + +class HarDBlock(nn.Layer): + def __init__(self, in_channels, growth_rate, grmul, n_layers, + keepBase=False, residual_out=False, dwconv=False): + super().__init__() + self.keepBase = keepBase + self.links = [] + layers_ = [] + self.out_channels = 0 # if upsample else in_channels + for i in range(n_layers): + outch, inch, link = self.get_link(i+1, in_channels, growth_rate, grmul) + self.links.append(link) + if dwconv: + 
layers_.append(CombConvLayer(inch, outch)) + else: + layers_.append(ConvLayer(inch, outch)) + + if (i % 2 == 0) or (i == n_layers - 1): + self.out_channels += outch + # print("Blk out =",self.out_channels) + self.layers = nn.LayerList(layers_) + + def get_link(self, layer, base_ch, growth_rate, grmul): + if layer == 0: + return base_ch, 0, [] + out_channels = growth_rate + + link = [] + for i in range(10): + dv = 2 ** i + if layer % dv == 0: + k = layer - dv + link.append(k) + if i > 0: + out_channels *= grmul + + out_channels = int(int(out_channels + 1) / 2) * 2 + in_channels = 0 + + for i in link: + ch, _, _ = self.get_link(i, base_ch, growth_rate, grmul) + in_channels += ch + + return out_channels, in_channels, link + + def forward(self, x): + layers_ = [x] + + for layer in range(len(self.layers)): + link = self.links[layer] + tin = [] + for i in link: + tin.append(layers_[i]) + if len(tin) > 1: + x = paddle.concat(tin, 1) + else: + x = tin[0] + out = self.layers[layer](x) + layers_.append(out) + + t = len(layers_) + out_ = [] + for i in range(t): + if (i == 0 and self.keepBase) or (i == t-1) or (i % 2 == 1): + out_.append(layers_[i]) + out = paddle.concat(out_, 1) + + return out + + +class HarDNet(nn.Layer): + def __init__(self, depth_wise=False, arch=85, + class_dim=1000, with_pool=True): + super().__init__() + first_ch = [32, 64] + second_kernel = 3 + max_pool = True + grmul = 1.7 + drop_rate = 0.1 + + # HarDNet68 + ch_list = [128, 256, 320, 640, 1024] + gr = [14, 16, 20, 40, 160] + n_layers = [8, 16, 16, 16, 4] + downSamp = [1, 0, 1, 1, 0] + + if arch == 85: + # HarDNet85 + first_ch = [48, 96] + ch_list = [192, 256, 320, 480, 720, 1280] + gr = [24, 24, 28, 36, 48, 256] + n_layers = [8, 16, 16, 16, 16, 4] + downSamp = [1, 0, 1, 0, 1, 0] + drop_rate = 0.2 + + elif arch == 39: + # HarDNet39 + first_ch = [24, 48] + ch_list = [96, 320, 640, 1024] + grmul = 1.6 + gr = [16, 20, 64, 160] + n_layers = [4, 16, 8, 4] + downSamp = [1, 1, 1, 0] + + if depth_wise: + second_kernel = 1 + max_pool = False + drop_rate = 0.05 + + blks = len(n_layers) + self.base = nn.LayerList([]) + + # First Layer: Standard Conv3x3, Stride=2 + self.base.append( + ConvLayer(in_channels=3, out_channels=first_ch[0], kernel_size=3, + stride=2, bias_attr=False)) + + # Second Layer + self.base.append( + ConvLayer(first_ch[0], first_ch[1], kernel_size=second_kernel)) + + # Maxpooling or DWConv3x3 downsampling + if max_pool: + self.base.append(nn.MaxPool2D(kernel_size=3, stride=2, padding=1)) + else: + self.base.append(DWConvLayer(first_ch[1], first_ch[1], stride=2)) + + # Build all HarDNet blocks + ch = first_ch[1] + for i in range(blks): + blk = HarDBlock(ch, gr[i], grmul, n_layers[i], dwconv=depth_wise) + ch = blk.out_channels + self.base.append(blk) + + if i == blks-1 and arch == 85: + self.base.append(nn.Dropout(0.1)) + + self.base.append(ConvLayer(ch, ch_list[i], kernel_size=1)) + ch = ch_list[i] + if downSamp[i] == 1: + if max_pool: + self.base.append(nn.MaxPool2D(kernel_size=2, stride=2)) + else: + self.base.append(DWConvLayer(ch, ch, stride=2)) + + ch = ch_list[blks-1] + + layers = [] + + if with_pool: + layers.append(nn.AdaptiveAvgPool2D((1, 1))) + + if class_dim > 0: + layers.append(nn.Flatten()) + layers.append(nn.Dropout(drop_rate)) + layers.append(nn.Linear(ch, class_dim)) + + self.base.append(nn.Sequential(*layers)) + + def forward(self, x): + for layer in self.base: + x = layer(x) + return x + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif 
pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def HarDNet39_ds(pretrained=False, **kwargs):
+    model = HarDNet(arch=39, depth_wise=True, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HarDNet39_ds"])
+    return model
+
+
+def HarDNet68_ds(pretrained=False, **kwargs):
+    model = HarDNet(arch=68, depth_wise=True, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HarDNet68_ds"])
+    return model
+
+
+def HarDNet68(pretrained=False, **kwargs):
+    model = HarDNet(arch=68, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HarDNet68"])
+    return model
+
+
+def HarDNet85(pretrained=False, **kwargs):
+    model = HarDNet(arch=85, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HarDNet85"])
+    return model
diff --git a/ppcls/arch/backbone/model_zoo/hrnet.py b/ppcls/arch/backbone/model_zoo/hrnet.py
index e69827666cf658e98a13f4ef12edd2224d5938da..1566a00a823f10aa2c4901355f051b62d54d8d16 100644
--- a/ppcls/arch/backbone/model_zoo/hrnet.py
+++ b/ppcls/arch/backbone/model_zoo/hrnet.py
@@ -27,24 +27,18 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "HRNet_W18_C",
-    "HRNet_W30_C",
-    "HRNet_W32_C",
-    "HRNet_W40_C",
-    "HRNet_W44_C",
-    "HRNet_W48_C",
-    "HRNet_W60_C",
-    "HRNet_W64_C",
-    "SE_HRNet_W18_C",
-    "SE_HRNet_W30_C",
-    "SE_HRNet_W32_C",
-    "SE_HRNet_W40_C",
-    "SE_HRNet_W44_C",
-    "SE_HRNet_W48_C",
-    "SE_HRNet_W60_C",
-    "SE_HRNet_W64_C",
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"HRNet_W18_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams",
+              "HRNet_W30_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams",
+              "HRNet_W32_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams",
+              "HRNet_W40_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams",
+              "HRNet_W44_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams",
+              "HRNet_W48_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams",
+              "HRNet_W64_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams",
+              # SE_HRNet_W64_C is exported below; its URL is assumed to follow
+              # the naming pattern of the entries above
+              "SE_HRNet_W64_C": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_pretrained.pdparams",
+             }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -661,82 +655,62 @@ class HRNet(nn.Layer):
         y = self.out(y)
         return y
 
-
-def HRNet_W18_C(**args):
-    model = HRNet(width=18, **args)
-    return model
-
-
-def HRNet_W30_C(**args):
-    model = HRNet(width=30, **args)
-    return model
-
-
-def HRNet_W32_C(**args):
-    model = HRNet(width=32, **args)
-    return model
-
-
-def HRNet_W40_C(**args):
-    model = HRNet(width=40, **args)
-    return model
-
-
-def HRNet_W44_C(**args):
-    model = HRNet(width=44, **args)
-    return model
-
-
-def HRNet_W48_C(**args):
-    model = HRNet(width=48, **args)
-    return model
-
-
-def HRNet_W60_C(**args):
-    model = HRNet(width=60, **args)
-    return model
-
-
-def HRNet_W64_C(**args):
-    model = HRNet(width=64, **args)
-    return model
-
-
-def SE_HRNet_W18_C(**args):
-    model = HRNet(width=18, has_se=True, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, 
use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def HRNet_W18_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=18, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W18_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W30_C(**args):
-    model = HRNet(width=30, has_se=True, **args)
+def HRNet_W30_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=30, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W30_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W32_C(**args):
-    model = HRNet(width=32, has_se=True, **args)
+def HRNet_W32_C(pretrained=False, use_ssld=False, **kwargs):
    model = HRNet(width=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W32_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W40_C(**args):
-    model = HRNet(width=40, has_se=True, **args)
+def HRNet_W40_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=40, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W40_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W44_C(**args):
-    model = HRNet(width=44, has_se=True, **args)
+def HRNet_W44_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=44, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W44_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W48_C(**args):
-    model = HRNet(width=48, has_se=True, **args)
+def HRNet_W48_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=48, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W48_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W60_C(**args):
-    model = HRNet(width=60, has_se=True, **args)
+def HRNet_W64_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["HRNet_W64_C"], use_ssld=use_ssld)
     return model
 
 
-def SE_HRNet_W64_C(**args):
-    model = HRNet(width=64, has_se=True, **args)
+def SE_HRNet_W64_C(pretrained=False, use_ssld=False, **kwargs):
+    model = HRNet(width=64, has_se=True, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_HRNet_W64_C"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/inception_v3.py b/ppcls/arch/backbone/model_zoo/inception_v3.py
index 8f93ca68371a939bdc2db47a0a37a0a991485deb..d8a9f1d8867df1016dc4947f7db1ddc386980f27 100644
--- a/ppcls/arch/backbone/model_zoo/inception_v3.py
+++ b/ppcls/arch/backbone/model_zoo/inception_v3.py
@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
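Every backbone file touched by this patch converges on the same loading convention: a module-level `MODEL_URLS` table plus a `_load_pretrained` helper, with each entry point taking `pretrained` and `use_ssld`. A minimal usage sketch of that convention, using the HRNet entry point defined above (the local checkpoint path is illustrative, and `use_ssld=True` is assumed to switch to the SSLD-distilled weights where published):

```python
from ppcls.arch.backbone.model_zoo.hrnet import HRNet_W18_C

model = HRNet_W18_C(pretrained=False)   # random initialization
model = HRNet_W18_C(pretrained=True)    # fetch the MODEL_URLS checkpoint
model = HRNet_W18_C(                    # load a local file (illustrative path)
    pretrained="./output/HRNet_W18_C/best_model.pdparams")
```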
@@ -26,7 +26,11 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.nn.initializer import Uniform
 import math
 
-__all__ = ["InceptionV3"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"InceptionV3": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams"}
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -425,9 +429,9 @@ class InceptionE(nn.Layer):
         return outputs
 
 
-class InceptionV3(nn.Layer):
+class Inception_V3(nn.Layer):
     def __init__(self, class_dim=1000):
-        super(InceptionV3, self).__init__()
+        super(Inception_V3, self).__init__()
         self.inception_a_list = [[192, 256, 288], [32, 64, 64]]
         self.inception_c_list = [[768, 768, 768, 768], [128, 160, 160, 192]]
 
@@ -472,10 +476,28 @@ class InceptionV3(nn.Layer):
     def forward(self, x):
         y = self.inception_stem(x)
         for inception_block in self.inception_block_list:
             y = inception_block(y)
         y = self.gap(y)
         y = paddle.reshape(y, shape=[-1, 2048])
         y = self.drop(y)
         y = self.out(y)
 
         return y
+
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def InceptionV3(pretrained=False, use_ssld=False, **kwargs):
+    model = Inception_V3(**kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["InceptionV3"], use_ssld=use_ssld)
+    return model
diff --git a/ppcls/arch/backbone/model_zoo/inception_v4.py b/ppcls/arch/backbone/model_zoo/inception_v4.py
index b8ba098490ce8c705ddb0efda0452b3a0900e401..37cef5c20f6b444d078b0f3a69ce45357c093c5c 100644
--- a/ppcls/arch/backbone/model_zoo/inception_v4.py
+++ b/ppcls/arch/backbone/model_zoo/inception_v4.py
@@ -21,7 +21,11 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.nn.initializer import Uniform
 import math
 
-__all__ = ["InceptionV4"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"InceptionV4": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams"}
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -450,6 +454,19 @@ class InceptionV4DY(nn.Layer):
         return x
 
 
-def InceptionV4(**args):
-    model = InceptionV4DY(**args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+def InceptionV4(pretrained=False, use_ssld=False, **kwargs):
+    model = InceptionV4DY(**kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["InceptionV4"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/levit.py b/ppcls/arch/backbone/model_zoo/levit.py
new file mode 100644
index 0000000000000000000000000000000000000000..bb74e00c63f7f77d8cef2ac22b8fecc46549e816
--- /dev/null
+++ b/ppcls/arch/backbone/model_zoo/levit.py
@@ -0,0 +1,547 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import itertools +import math +import warnings + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import TruncatedNormal, Constant +from paddle.regularizer import L2Decay + +from .vision_transformer import trunc_normal_, zeros_, ones_, Identity + +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = { + "LeViT_128S": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams", + "LeViT_128": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams", + "LeViT_192": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams", + "LeViT_256": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams", + "LeViT_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) + + +def cal_attention_biases(attention_biases, attention_bias_idxs): + gather_list = [] + attention_bias_t = paddle.transpose(attention_biases, (1, 0)) + for idx in attention_bias_idxs: + gather = paddle.gather(attention_bias_t, idx) + gather_list.append(gather) + shape0, shape1 = attention_bias_idxs.shape + return paddle.transpose(paddle.concat(gather_list), (1, 0)).reshape( + (0, shape0, shape1)) + + +class Conv2d_BN(nn.Sequential): + def __init__(self, + a, + b, + ks=1, + stride=1, + pad=0, + dilation=1, + groups=1, + bn_weight_init=1, + resolution=-10000): + super().__init__() + self.add_sublayer( + 'c', + nn.Conv2D( + a, b, ks, stride, pad, dilation, groups, bias_attr=False)) + bn = nn.BatchNorm2D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + +class Linear_BN(nn.Sequential): + def __init__(self, a, b, bn_weight_init=1): + super().__init__() + self.add_sublayer('c', nn.Linear(a, b, bias_attr=False)) + bn = nn.BatchNorm1D(b) + ones_(bn.weight) + zeros_(bn.bias) + self.add_sublayer('bn', bn) + + def forward(self, x): + l, bn = self._sub_layers.values() + x = l(x) + return paddle.reshape(bn(x.flatten(0, 1)), x.shape) + + +class BN_Linear(nn.Sequential): + def __init__(self, a, b, bias=True, std=0.02): + super().__init__() + self.add_sublayer('bn', nn.BatchNorm1D(a)) + l = nn.Linear(a, b, bias_attr=bias) + trunc_normal_(l.weight) + if bias: + zeros_(l.bias) + self.add_sublayer('l', l) + + +def b16(n, activation, resolution=224): + return nn.Sequential( + Conv2d_BN( + 3, n // 8, 3, 2, 1, resolution=resolution), + activation(), + Conv2d_BN( + n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), + activation(), + Conv2d_BN( + n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), + activation(), + Conv2d_BN( + n // 2, n, 3, 2, 1, resolution=resolution // 8)) + + +class Residual(nn.Layer): + def __init__(self, m, drop): + super().__init__() + self.m = m + self.drop = drop + + def forward(self, x): + if self.training and self.drop > 0: + return x + self.m(x) * 
paddle.rand( + x.size(0), 1, 1, + device=x.device).ge_(self.drop).div(1 - self.drop).detach() + else: + return x + self.m(x) + + +class Attention(nn.Layer): + def __init__(self, + dim, + key_dim, + num_heads=8, + attn_ratio=4, + activation=None, + resolution=14): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * num_heads + self.attn_ratio = attn_ratio + self.h = self.dh + nh_kd * 2 + self.qkv = Linear_BN(dim, self.h) + self.proj = nn.Sequential( + activation(), Linear_BN( + self.dh, dim, bn_weight_init=0)) + points = list(itertools.product(range(resolution), range(resolution))) + N = len(points) + attention_offsets = {} + idxs = [] + for p1 in points: + for p2 in points: + offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1])) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter( + shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + tensor_idxs = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', + paddle.reshape(tensor_idxs, [N, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, + self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + qkv = self.qkv(x) + qkv = paddle.reshape(qkv, + [B, N, self.num_heads, self.h // self.num_heads]) + q, k, v = paddle.split( + qkv, [self.key_dim, self.key_dim, self.d], axis=3) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, + self.attention_bias_idxs) + else: + attention_biases = self.ab + attn = ((q @k_transpose) * self.scale + attention_biases) + attn = F.softmax(attn) + x = paddle.transpose(attn @v, perm=[0, 2, 1, 3]) + x = paddle.reshape(x, [B, N, self.dh]) + x = self.proj(x) + return x + + +class Subsample(nn.Layer): + def __init__(self, stride, resolution): + super().__init__() + self.stride = stride + self.resolution = resolution + + def forward(self, x): + B, N, C = x.shape + x = paddle.reshape(x, [B, self.resolution, self.resolution, + C])[:, ::self.stride, ::self.stride] + x = paddle.reshape(x, [B, -1, C]) + return x + + +class AttentionSubsample(nn.Layer): + def __init__(self, + in_dim, + out_dim, + key_dim, + num_heads=8, + attn_ratio=2, + activation=None, + stride=2, + resolution=14, + resolution_=7): + super().__init__() + self.num_heads = num_heads + self.scale = key_dim**-0.5 + self.key_dim = key_dim + self.nh_kd = nh_kd = key_dim * num_heads + self.d = int(attn_ratio * key_dim) + self.dh = int(attn_ratio * key_dim) * self.num_heads + self.attn_ratio = attn_ratio + self.resolution_ = resolution_ + self.resolution_2 = resolution_**2 + self.training = True + h = self.dh + nh_kd + self.kv = Linear_BN(in_dim, h) + + self.q = nn.Sequential( + Subsample(stride, resolution), Linear_BN(in_dim, nh_kd)) + self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim)) + + self.stride = stride + self.resolution = 
resolution + points = list(itertools.product(range(resolution), range(resolution))) + points_ = list( + itertools.product(range(resolution_), range(resolution_))) + + N = len(points) + N_ = len(points_) + attention_offsets = {} + idxs = [] + i = 0 + j = 0 + for p1 in points_: + i += 1 + for p2 in points: + j += 1 + size = 1 + offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), + abs(p1[1] * stride - p2[1] + (size - 1) / 2)) + if offset not in attention_offsets: + attention_offsets[offset] = len(attention_offsets) + idxs.append(attention_offsets[offset]) + self.attention_biases = self.create_parameter( + shape=(num_heads, len(attention_offsets)), + default_initializer=zeros_, + attr=paddle.ParamAttr(regularizer=L2Decay(0.0))) + + tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64') + self.register_buffer('attention_bias_idxs', + paddle.reshape(tensor_idxs_, [N_, N])) + + @paddle.no_grad() + def train(self, mode=True): + if mode: + super().train() + else: + super().eval() + if mode and hasattr(self, 'ab'): + del self.ab + else: + self.ab = cal_attention_biases(self.attention_biases, + self.attention_bias_idxs) + + def forward(self, x): + self.training = True + B, N, C = x.shape + kv = self.kv(x) + kv = paddle.reshape(kv, [B, N, self.num_heads, -1]) + k, v = paddle.split(kv, [self.key_dim, self.d], axis=3) + k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC + v = paddle.transpose(v, perm=[0, 2, 1, 3]) + q = paddle.reshape( + self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim]) + q = paddle.transpose(q, perm=[0, 2, 1, 3]) + + if self.training: + attention_biases = cal_attention_biases(self.attention_biases, + self.attention_bias_idxs) + else: + attention_biases = self.ab + + attn = (q @paddle.transpose( + k, perm=[0, 1, 3, 2])) * self.scale + attention_biases + attn = F.softmax(attn) + + x = paddle.reshape( + paddle.transpose( + (attn @v), perm=[0, 2, 1, 3]), [B, -1, self.dh]) + x = self.proj(x) + return x + + +class LeViT(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_dim=1000, + embed_dim=[192], + key_dim=[64], + depth=[12], + num_heads=[3], + attn_ratio=[2], + mlp_ratio=[2], + hybrid_backbone=None, + down_ops=[], + attention_activation=nn.Hardswish, + mlp_activation=nn.Hardswish, + distillation=True, + drop_path=0): + super().__init__() + + self.class_dim = class_dim + self.num_features = embed_dim[-1] + self.embed_dim = embed_dim + self.distillation = distillation + + self.patch_embed = hybrid_backbone + + self.blocks = [] + down_ops.append(['']) + resolution = img_size // patch_size + for i, (ed, kd, dpth, nh, ar, mr, do) in enumerate( + zip(embed_dim, key_dim, depth, num_heads, attn_ratio, + mlp_ratio, down_ops)): + for _ in range(dpth): + self.blocks.append( + Residual( + Attention( + ed, + kd, + nh, + attn_ratio=ar, + activation=attention_activation, + resolution=resolution, ), + drop_path)) + if mr > 0: + h = int(ed * mr) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(ed, h), + mlp_activation(), + Linear_BN( + h, ed, bn_weight_init=0), ), + drop_path)) + if do[0] == 'Subsample': + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + resolution_ = (resolution - 1) // do[5] + 1 + self.blocks.append( + AttentionSubsample( + *embed_dim[i:i + 2], + key_dim=do[1], + num_heads=do[2], + attn_ratio=do[3], + activation=attention_activation, + stride=do[5], + resolution=resolution, + resolution_=resolution_)) + resolution = 
resolution_ + if do[4] > 0: # mlp_ratio + h = int(embed_dim[i + 1] * do[4]) + self.blocks.append( + Residual( + nn.Sequential( + Linear_BN(embed_dim[i + 1], h), + mlp_activation(), + Linear_BN( + h, embed_dim[i + 1], bn_weight_init=0), ), + drop_path)) + self.blocks = nn.Sequential(*self.blocks) + + # Classifier head + self.head = BN_Linear(embed_dim[-1], + class_dim) if class_dim > 0 else Identity() + if distillation: + self.head_dist = BN_Linear( + embed_dim[-1], class_dim) if class_dim > 0 else Identity() + + def forward(self, x): + x = self.patch_embed(x) + x = x.flatten(2) + x = paddle.transpose(x, perm=[0, 2, 1]) + x = self.blocks(x) + x = x.mean(1) + if self.distillation: + x = self.head(x), self.head_dist(x) + if not self.training: + x = (x[0] + x[1]) / 2 + else: + x = self.head(x) + return x + + +def model_factory(C, D, X, N, drop_path, class_dim, distillation): + embed_dim = [int(x) for x in C.split('_')] + num_heads = [int(x) for x in N.split('_')] + depth = [int(x) for x in X.split('_')] + act = nn.Hardswish + model = LeViT( + patch_size=16, + embed_dim=embed_dim, + num_heads=num_heads, + key_dim=[D] * 3, + depth=depth, + attn_ratio=[2, 2, 2], + mlp_ratio=[2, 2, 2], + down_ops=[ + #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride) + ['Subsample', D, embed_dim[0] // D, 4, 2, 2], + ['Subsample', D, embed_dim[1] // D, 4, 2, 2], + ], + attention_activation=act, + mlp_activation=act, + hybrid_backbone=b16(embed_dim[0], activation=act), + class_dim=class_dim, + drop_path=drop_path, + distillation=distillation) + + return model + + +specification = { + 'LeViT_128S': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_6_8', + 'X': '2_3_4', + 'drop_path': 0 + }, + 'LeViT_128': { + 'C': '128_256_384', + 'D': 16, + 'N': '4_8_12', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_192': { + 'C': '192_288_384', + 'D': 32, + 'N': '3_5_6', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_256': { + 'C': '256_384_512', + 'D': 32, + 'N': '4_6_8', + 'X': '4_4_4', + 'drop_path': 0 + }, + 'LeViT_384': { + 'C': '384_512_768', + 'D': 32, + 'N': '6_9_12', + 'X': '4_4_4', + 'drop_path': 0.1 + }, +} + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
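+        # NOTE: each entry in `specification` above packs per-stage widths (C),
+        # attention key dim (D), heads (N) and depths (X) as "_"-joined strings;
+        # model_factory splits them back into lists, e.g. C='128_256_384' ->
+        # embed_dim=[128, 256, 384]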
+        )
+
+
+def LeViT_128S(pretrained=False, use_ssld=False, class_dim=1000, distillation=False, **kwargs):
+    model = model_factory(
+        **specification['LeViT_128S'],
+        class_dim=class_dim,
+        distillation=distillation)
+    _load_pretrained(pretrained, model, MODEL_URLS["LeViT_128S"], use_ssld=use_ssld)
+    return model
+
+
+def LeViT_128(pretrained=False, use_ssld=False, class_dim=1000, distillation=False, **kwargs):
+    model = model_factory(
+        **specification['LeViT_128'],
+        class_dim=class_dim,
+        distillation=distillation)
+    _load_pretrained(pretrained, model, MODEL_URLS["LeViT_128"], use_ssld=use_ssld)
+    return model
+
+
+def LeViT_192(pretrained=False, use_ssld=False, class_dim=1000, distillation=False, **kwargs):
+    model = model_factory(
+        **specification['LeViT_192'],
+        class_dim=class_dim,
+        distillation=distillation)
+    _load_pretrained(pretrained, model, MODEL_URLS["LeViT_192"], use_ssld=use_ssld)
+    return model
+
+
+def LeViT_256(pretrained=False, use_ssld=False, class_dim=1000, distillation=False, **kwargs):
+    model = model_factory(
+        **specification['LeViT_256'],
+        class_dim=class_dim,
+        distillation=distillation)
+    _load_pretrained(pretrained, model, MODEL_URLS["LeViT_256"], use_ssld=use_ssld)
+    return model
+
+
+def LeViT_384(pretrained=False, use_ssld=False, class_dim=1000, distillation=False, **kwargs):
+    model = model_factory(
+        **specification['LeViT_384'],
+        class_dim=class_dim,
+        distillation=distillation)
+    _load_pretrained(pretrained, model, MODEL_URLS["LeViT_384"], use_ssld=use_ssld)
+    return model
diff --git a/ppcls/arch/backbone/model_zoo/mixnet.py b/ppcls/arch/backbone/model_zoo/mixnet.py
index f195694741292977e07b084a3c8efa8303f810a1..13582acb861cbf29f0964ae7dbf103e3a3b19b80 100644
--- a/ppcls/arch/backbone/model_zoo/mixnet.py
+++ b/ppcls/arch/backbone/model_zoo/mixnet.py
@@ -17,14 +17,20 @@
     https://arxiv.org/abs/1907.09595.
 """
 
-__all__ = ['MixNet_S', 'MixNet_M', 'MixNet_L']
-
 import os
 from inspect import isfunction
 from functools import reduce
 import paddle
 import paddle.nn as nn
 
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"MixNet_S": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams",
+              "MixNet_M": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams",
+              "MixNet_L": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams"}
+
+__all__ = list(MODEL_URLS.keys())
+
 
 class Identity(nn.Layer):
     """
@@ -755,13 +761,33 @@ def get_mixnet(version, width_scale, model_name=None, **kwargs):
     return net
 
 
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
-def MixNet_S(**kwargs):
+def MixNet_S(pretrained=False, use_ssld=False, **kwargs):
     """
     MixNet-S model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
     https://arxiv.org/abs/1907.09595.
     """
-    return get_mixnet(
+    model = get_mixnet(
         version="s", width_scale=1.0, model_name="MixNet_S", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["MixNet_S"], use_ssld=use_ssld)
+    return model
 
 
-def MixNet_M(**kwargs):
+def MixNet_M(pretrained=False, use_ssld=False, **kwargs):
@@ -769,14 +795,19 @@ def MixNet_M(**kwargs):
     """
     MixNet-M model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
     https://arxiv.org/abs/1907.09595.
     """
-    return get_mixnet(
+    model = get_mixnet(
         version="m", width_scale=1.0, model_name="MixNet_M", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["MixNet_M"], use_ssld=use_ssld)
+    return model
 
 
-def MixNet_L(**kwargs):
+def MixNet_L(pretrained=False, use_ssld=False, **kwargs):
     """
     MixNet-L model from 'MixConv: Mixed Depthwise Convolutional Kernels,'
     https://arxiv.org/abs/1907.09595.
     """
-    return get_mixnet(
+    model = get_mixnet(
         version="m", width_scale=1.3, model_name="MixNet_L", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["MixNet_L"], use_ssld=use_ssld)
+    return model
+
diff --git a/ppcls/arch/backbone/model_zoo/mobilenet_v1.py b/ppcls/arch/backbone/model_zoo/mobilenet_v1.py
index 751ca552aace540d3fc6099ef9dafcc137322a9d..575dbf26d8519fe8e3dcb10c81f8b76d436b8b44 100644
--- a/ppcls/arch/backbone/model_zoo/mobilenet_v1.py
+++ b/ppcls/arch/backbone/model_zoo/mobilenet_v1.py
@@ -26,9 +26,14 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.nn.initializer import KaimingNormal
 import math
 
-__all__ = [
-    "MobileNetV1_x0_25", "MobileNetV1_x0_5", "MobileNetV1_x0_75", "MobileNetV1"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"MobileNetV1_x0_25": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams",
+              "MobileNetV1_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams",
+              "MobileNetV1_x0_75": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams",
+              "MobileNetV1": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams"}
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -245,22 +250,39 @@ class MobileNet(nn.Layer):
         y = self.out(y)
         return y
 
-
-def MobileNetV1_x0_25(**args):
-    model = MobileNet(scale=0.25, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+ ) + + +def MobileNetV1_x0_25(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.25, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV1_x0_25"], use_ssld=use_ssld) return model -def MobileNetV1_x0_5(**args): - model = MobileNet(scale=0.5, **args) +def MobileNetV1_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV1_x0_5"], use_ssld=use_ssld) return model -def MobileNetV1_x0_75(**args): - model = MobileNet(scale=0.75, **args) +def MobileNetV1_x0_75(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.75, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV1_x0_75"], use_ssld=use_ssld) return model -def MobileNetV1(**args): - model = MobileNet(scale=1.0, **args) - return model +def MobileNetV1(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=1.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV1"], use_ssld=use_ssld) + return model \ No newline at end of file diff --git a/ppcls/arch/backbone/model_zoo/mobilenet_v2.py b/ppcls/arch/backbone/model_zoo/mobilenet_v2.py index e2f0043b5d3d263d0d5839e40cfdb779d9673452..4cafd1461d51eea724655a38eb18c056d7944526 100644 --- a/ppcls/arch/backbone/model_zoo/mobilenet_v2.py +++ b/ppcls/arch/backbone/model_zoo/mobilenet_v2.py @@ -26,10 +26,16 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D import math -__all__ = [ - "MobileNetV2_x0_25", "MobileNetV2_x0_5", "MobileNetV2_x0_75", - "MobileNetV2", "MobileNetV2_x1_5", "MobileNetV2_x2_0" -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"MobileNetV2_x0_25": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams", + "MobileNetV2_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams", + "MobileNetV2_x0_75": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams", + "MobileNetV2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams", + "MobileNetV2_x1_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams", + "MobileNetV2_x2_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams"} + +__all__ = list(MODEL_URLS.keys()) class ConvBNLayer(nn.Layer): @@ -149,7 +155,7 @@ class InvresiBlocks(nn.Layer): class MobileNet(nn.Layer): - def __init__(self, class_dim=1000, scale=1.0, prefix_name="", **args): + def __init__(self, class_dim=1000, scale=1.0, prefix_name=""): super(MobileNet, self).__init__() self.scale = scale self.class_dim = class_dim @@ -216,33 +222,52 @@ class MobileNet(nn.Layer): y = paddle.flatten(y, start_axis=1, stop_axis=-1) y = self.out(y) return y - - -def MobileNetV2_x0_25(**args): - model = MobileNet(scale=0.25, **args) + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
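+        # NOTE: the factory functions below only fix `scale` (the width
+        # multiplier); other kwargs such as class_dim pass through to MobileNet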
+ ) + + +def MobileNetV2_x0_25(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.25, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2_x0_25"], use_ssld=use_ssld) return model -def MobileNetV2_x0_5(**args): - model = MobileNet(scale=0.5, **args) +def MobileNetV2_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2_x0_5"], use_ssld=use_ssld) return model -def MobileNetV2_x0_75(**args): - model = MobileNet(scale=0.75, **args) +def MobileNetV2_x0_75(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=0.75, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2_x0_75"], use_ssld=use_ssld) return model -def MobileNetV2(**args): - model = MobileNet(scale=1.0, **args) +def MobileNetV2(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=1.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2"], use_ssld=use_ssld) return model -def MobileNetV2_x1_5(**args): - model = MobileNet(scale=1.5, **args) +def MobileNetV2_x1_5(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=1.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2_x1_5"], use_ssld=use_ssld) return model -def MobileNetV2_x2_0(**args): - model = MobileNet(scale=2.0, **args) +def MobileNetV2_x2_0(pretrained=False, use_ssld=False, **kwargs): + model = MobileNet(scale=2.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV2_x2_0"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/mobilenet_v3.py b/ppcls/arch/backbone/model_zoo/mobilenet_v3.py index 42954c8e889002390378e3905e6d8ea19a1d97ec..2169f7a2093390e537a7d39f2e414e5dd52a1c03 100644 --- a/ppcls/arch/backbone/model_zoo/mobilenet_v3.py +++ b/ppcls/arch/backbone/model_zoo/mobilenet_v3.py @@ -28,13 +28,20 @@ from paddle.regularizer import L2Decay import math -__all__ = [ - "MobileNetV3_small_x0_35", "MobileNetV3_small_x0_5", - "MobileNetV3_small_x0_75", "MobileNetV3_small_x1_0", - "MobileNetV3_small_x1_25", "MobileNetV3_large_x0_35", - "MobileNetV3_large_x0_5", "MobileNetV3_large_x0_75", - "MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25" -] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = {"MobileNetV3_small_x0_35": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams", + "MobileNetV3_small_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams", + "MobileNetV3_small_x0_75": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams", + "MobileNetV3_small_x1_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams", + "MobileNetV3_small_x1_25": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams", + "MobileNetV3_large_x0_35": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams", + "MobileNetV3_large_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams", + "MobileNetV3_large_x0_75": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams", + "MobileNetV3_large_x1_0": 
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams", + "MobileNetV3_large_x1_25": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams"} + +__all__ = list(MODEL_URLS.keys()) def make_divisible(v, divisor=8, min_value=None): @@ -308,52 +315,75 @@ class SEModule(nn.Layer): outputs = hardsigmoid(outputs, slope=0.2, offset=0.5) return paddle.multiply(x=inputs, y=outputs) - -def MobileNetV3_small_x0_35(**args): - model = MobileNetV3(model_name="small", scale=0.35, **args) + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def MobileNetV3_small_x0_35(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="small", scale=0.35, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_small_x0_35"], use_ssld=use_ssld) return model -def MobileNetV3_small_x0_5(**args): - model = MobileNetV3(model_name="small", scale=0.5, **args) +def MobileNetV3_small_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="small", scale=0.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_small_x0_5"], use_ssld=use_ssld) return model -def MobileNetV3_small_x0_75(**args): - model = MobileNetV3(model_name="small", scale=0.75, **args) +def MobileNetV3_small_x0_75(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="small", scale=0.75, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_small_x0_75"], use_ssld=use_ssld) return model -def MobileNetV3_small_x1_0(**args): - model = MobileNetV3(model_name="small", scale=1.0, **args) +def MobileNetV3_small_x1_0(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="small", scale=1.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_small_x1_0"], use_ssld=use_ssld) return model -def MobileNetV3_small_x1_25(**args): - model = MobileNetV3(model_name="small", scale=1.25, **args) +def MobileNetV3_small_x1_25(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="small", scale=1.25, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_small_x1_25"], use_ssld=use_ssld) return model - -def MobileNetV3_large_x0_35(**args): - model = MobileNetV3(model_name="large", scale=0.35, **args) + +def MobileNetV3_large_x0_35(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="large", scale=0.35, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_large_x0_35"], use_ssld=use_ssld) return model -def MobileNetV3_large_x0_5(**args): - model = MobileNetV3(model_name="large", scale=0.5, **args) +def MobileNetV3_large_x0_5(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="large", scale=0.5, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_large_x0_5"], use_ssld=use_ssld) return model -def MobileNetV3_large_x0_75(**args): - model = MobileNetV3(model_name="large", scale=0.75, **args) +def MobileNetV3_large_x0_75(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="large", scale=0.75, **kwargs) + _load_pretrained(pretrained, 
model, MODEL_URLS["MobileNetV3_large_x0_75"], use_ssld=use_ssld) return model -def MobileNetV3_large_x1_0(**args): - model = MobileNetV3(model_name="large", scale=1.0, **args) +def MobileNetV3_large_x1_0(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="large", scale=1.0, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_large_x1_0"], use_ssld=use_ssld) return model -def MobileNetV3_large_x1_25(**args): - model = MobileNetV3(model_name="large", scale=1.25, **args) +def MobileNetV3_large_x1_25(pretrained=False, use_ssld=False, **kwargs): + model = MobileNetV3(model_name="large", scale=1.25, **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["MobileNetV3_large_x1_25"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/backbone/model_zoo/rednet.py b/ppcls/arch/backbone/model_zoo/rednet.py new file mode 100644 index 0000000000000000000000000000000000000000..a113a32aca5e6d000cb95157690a3b56f85c7cb4 --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/rednet.py @@ -0,0 +1,206 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn + +from paddle.vision.models import resnet + +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + + +MODEL_URLS = { + "RedNet26": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet26_pretrained.pdparams", + "RedNet38": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet38_pretrained.pdparams", + "RedNet50": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet50_pretrained.pdparams", + "RedNet101": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet101_pretrained.pdparams", + "RedNet152": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet152_pretrained.pdparams" +} + + +__all__ = MODEL_URLS.keys() + + +class Involution(nn.Layer): + def __init__(self, channels, kernel_size, stride): + super(Involution, self).__init__() + self.kernel_size = kernel_size + self.stride = stride + self.channels = channels + reduction_ratio = 4 + self.group_channels = 16 + self.groups = self.channels // self.group_channels + self.conv1 = nn.Sequential( + ('conv', nn.Conv2D( + in_channels=channels, + out_channels=channels // reduction_ratio, + kernel_size=1, + bias_attr=False + )), + ('bn', nn.BatchNorm2D(channels // reduction_ratio)), + ('activate', nn.ReLU()) + ) + self.conv2 = nn.Sequential( + ('conv', nn.Conv2D( + in_channels=channels // reduction_ratio, + out_channels=kernel_size**2 * self.groups, + kernel_size=1, + stride=1 + )) + ) + if stride > 1: + self.avgpool = nn.AvgPool2D(stride, stride) + + def forward(self, x): + weight = self.conv2(self.conv1(x if self.stride == 1 else self.avgpool(x))) + b, c, h, w = weight.shape + weight = weight.reshape((b, self.groups, self.kernel_size**2, h, w)).unsqueeze(2) + + out = nn.functional.unfold(x, self.kernel_size, self.stride, (self.kernel_size-1)//2, 1) + out = 
out.reshape((b, self.groups, self.group_channels, self.kernel_size**2, h, w)) + out = (weight * out).sum(axis=3).reshape((b, self.channels, h, w)) + return out + + +class BottleneckBlock(resnet.BottleneckBlock): + def __init__(self, inplanes, planes, stride=1, downsample=None, + groups=1, base_width=64, dilation=1, norm_layer=None): + super(BottleneckBlock, self).__init__( + inplanes, planes, stride, downsample, + groups, base_width, dilation, norm_layer + ) + width = int(planes * (base_width / 64.)) * groups + self.conv2 = Involution(width, 7, stride) + + +class RedNet(resnet.ResNet): + def __init__(self, block, depth, class_dim=1000, with_pool=True): + super(RedNet, self).__init__( + block=block, depth=50, + num_classes=class_dim, with_pool=with_pool + ) + layer_cfg = { + 26: [1, 2, 4, 1], + 38: [2, 3, 5, 2], + 50: [3, 4, 6, 3], + 101: [3, 4, 23, 3], + 152: [3, 8, 36, 3] + } + layers = layer_cfg[depth] + + self.conv1 = None + self.bn1 = None + self.relu = None + self.inplanes = 64 + self.class_dim = class_dim + self.stem = nn.Sequential( + nn.Sequential( + ('conv', nn.Conv2D( + in_channels=3, + out_channels=self.inplanes // 2, + kernel_size=3, + stride=2, + padding=1, + bias_attr=False + )), + ('bn', nn.BatchNorm2D(self.inplanes // 2)), + ('activate', nn.ReLU()) + ), + Involution(self.inplanes // 2, 3, 1), + nn.BatchNorm2D(self.inplanes // 2), + nn.ReLU(), + nn.Sequential( + ('conv', nn.Conv2D( + in_channels=self.inplanes // 2, + out_channels=self.inplanes, + kernel_size=3, + stride=1, + padding=1, + bias_attr=False + )), + ('bn', nn.BatchNorm2D(self.inplanes)), + ('activate', nn.ReLU()) + ) + ) + + self.layer1 = self._make_layer(block, 64, layers[0]) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2) + self.layer3 = self._make_layer(block, 256, layers[2], stride=2) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2) + + def forward(self, x): + x = self.stem(x) + x = self.maxpool(x) + + x = self.layer1(x) + x = self.layer2(x) + x = self.layer3(x) + x = self.layer4(x) + + if self.with_pool: + x = self.avgpool(x) + + if self.class_dim > 0: + x = paddle.flatten(x, 1) + x = self.fc(x) + + return x + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
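+            # Reached only when `pretrained` is neither a bool nor a str.
+            # Note that the RedNet factories below call _load_pretrained
+            # without `use_ssld`, so it keeps its default of False for this
+            # architecture.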
+        )
+
+
+def RedNet26(pretrained=False, **kwargs):
+    model = RedNet(BottleneckBlock, 26, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RedNet26"])
+    return model
+
+
+def RedNet38(pretrained=False, **kwargs):
+    model = RedNet(BottleneckBlock, 38, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RedNet38"])
+    return model
+
+
+def RedNet50(pretrained=False, **kwargs):
+    model = RedNet(BottleneckBlock, 50, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RedNet50"])
+    return model
+
+
+def RedNet101(pretrained=False, **kwargs):
+    model = RedNet(BottleneckBlock, 101, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RedNet101"])
+    return model
+
+
+def RedNet152(pretrained=False, **kwargs):
+    model = RedNet(BottleneckBlock, 152, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RedNet152"])
+    return model
diff --git a/ppcls/arch/backbone/model_zoo/regnet.py b/ppcls/arch/backbone/model_zoo/regnet.py
index 19ddaaad1b1ed3ebcc6557bcb19c753f8f33c50f..86802ee7e5736356925ecb2293b29dcf2d4b539a 100644
--- a/ppcls/arch/backbone/model_zoo/regnet.py
+++ b/ppcls/arch/backbone/model_zoo/regnet.py
@@ -26,10 +26,17 @@ from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.nn.initializer import Uniform
 import math
 
-__all__ = [
-    "RegNetX_200MF", "RegNetX_4GF", "RegNetX_32GF", "RegNetY_200MF",
-    "RegNetY_4GF", "RegNetY_32GF"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"RegNetX_200MF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_200MF_pretrained.pdparams",
+              "RegNetX_4GF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams",
+              "RegNetX_32GF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_32GF_pretrained.pdparams",
+              "RegNetY_200MF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetY_200MF_pretrained.pdparams",
+              "RegNetY_4GF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetY_4GF_pretrained.pdparams",
+              "RegNetY_32GF": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetY_32GF_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 def quantize_float(f, q):
@@ -308,14 +315,28 @@ class RegNet(nn.Layer):
         y = self.out(y)
         return y
 
-
-def RegNetX_200MF(**args):
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
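+            # Each RegNet factory below pins the design-space parameters
+            # (w_a, w_0, w_m, depth d, group_w) of its compute budget; the
+            # Y variants additionally enable the SE block via se_on=True.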
+        )
+
+
+def RegNetX_200MF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
-        w_a=36.44, w_0=24, w_m=2.49, d=13, group_w=8, bot_mul=1.0, q=8, **args)
+        w_a=36.44, w_0=24, w_m=2.49, d=13, group_w=8, bot_mul=1.0, q=8, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetX_200MF"], use_ssld=use_ssld)
     return model
 
 
-def RegNetX_4GF(**args):
+def RegNetX_4GF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
         w_a=38.65,
         w_0=96,
@@ -324,11 +345,12 @@
         group_w=40,
         bot_mul=1.0,
         q=8,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetX_4GF"], use_ssld=use_ssld)
     return model
 
 
-def RegNetX_32GF(**args):
+def RegNetX_32GF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
         w_a=69.86,
         w_0=320,
@@ -337,11 +359,12 @@
         group_w=168,
         bot_mul=1.0,
         q=8,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetX_32GF"], use_ssld=use_ssld)
     return model
 
 
-def RegNetY_200MF(**args):
+def RegNetY_200MF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
         w_a=36.44,
         w_0=24,
@@ -351,11 +374,12 @@
         bot_mul=1.0,
         q=8,
         se_on=True,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetY_200MF"], use_ssld=use_ssld)
     return model
 
 
-def RegNetY_4GF(**args):
+def RegNetY_4GF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
         w_a=31.41,
         w_0=96,
@@ -365,11 +389,12 @@
         bot_mul=1.0,
         q=8,
         se_on=True,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetY_4GF"], use_ssld=use_ssld)
     return model
 
 
-def RegNetY_32GF(**args):
+def RegNetY_32GF(pretrained=False, use_ssld=False, **kwargs):
     model = RegNet(
         w_a=115.89,
         w_0=232,
@@ -379,5 +404,6 @@
         bot_mul=1.0,
         q=8,
         se_on=True,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RegNetY_32GF"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/repvgg.py b/ppcls/arch/backbone/model_zoo/repvgg.py
index ec43d21d9c917859816fbd073a4fe7521cc08947..2447fbe251e9baa35cc186618422fd710366dcc8 100644
--- a/ppcls/arch/backbone/model_zoo/repvgg.py
+++ b/ppcls/arch/backbone/model_zoo/repvgg.py
@@ -2,22 +2,29 @@ import paddle.nn as nn
 import paddle
 import numpy as np
 
-__all__ = [
-    'RepVGG',
-    'RepVGG_A0',
-    'RepVGG_A1',
-    'RepVGG_A2',
-    'RepVGG_B0',
-    'RepVGG_B1',
-    'RepVGG_B2',
-    'RepVGG_B3',
-    'RepVGG_B1g2',
-    'RepVGG_B1g4',
-    'RepVGG_B2g2',
-    'RepVGG_B2g4',
-    'RepVGG_B3g2',
-    'RepVGG_B3g4',
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"RepVGG_A0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams",
+              "RepVGG_A1": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams",
+              "RepVGG_A2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams",
+              "RepVGG_B0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams",
+              "RepVGG_B1": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams",
+              "RepVGG_B2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams",
+              "RepVGG_B3": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3_pretrained.pdparams",
+              "RepVGG_B1g2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams",
+              "RepVGG_B1g4": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams",
+              "RepVGG_B2g2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g2_pretrained.pdparams",
+              "RepVGG_B2g4": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams",
+              "RepVGG_B3g2": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g2_pretrained.pdparams",
+              "RepVGG_B3g4": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
+
+
+optional_groupwise_layers = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
+g2_map = {l: 2 for l in optional_groupwise_layers}
+g4_map = {l: 4 for l in optional_groupwise_layers}
 
 
 class ConvBN(nn.Layer):
@@ -230,110 +237,144 @@ class RepVGG(nn.Layer):
         return out
 
 
-optional_groupwise_layers = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
-g2_map = {l: 2 for l in optional_groupwise_layers}
-g4_map = {l: 4 for l in optional_groupwise_layers}
-
-
-def RepVGG_A0(**kwargs):
-    return RepVGG(
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def RepVGG_A0(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[2, 4, 14, 1],
         width_multiplier=[0.75, 0.75, 0.75, 2.5],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_A0"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_A1(**kwargs):
-    return RepVGG(
+def RepVGG_A1(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[2, 4, 14, 1],
         width_multiplier=[1, 1, 1, 2.5],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_A1"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_A2(**kwargs):
-    return RepVGG(
+def RepVGG_A2(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[2, 4, 14, 1],
         width_multiplier=[1.5, 1.5, 1.5, 2.75],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_A2"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B0(**kwargs):
-    return RepVGG(
+def RepVGG_B0(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
        num_blocks=[4, 6, 16, 1],
        width_multiplier=[1, 1, 1, 2.5],
        override_groups_map=None,
        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B0"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B1(**kwargs):
-    return RepVGG(
+def RepVGG_B1(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[2, 2, 2, 4],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B1"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B1g2(**kwargs):
-    return RepVGG(
+def RepVGG_B1g2(pretrained=False, use_ssld=False, **kwargs):
    model = RepVGG(
        num_blocks=[4, 6, 16, 1],
        width_multiplier=[2, 2, 2, 4],
        override_groups_map=g2_map,
        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B1g2"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B1g4(**kwargs):
-    return RepVGG(
+def RepVGG_B1g4(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[2, 2, 2, 4],
         override_groups_map=g4_map,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B1g4"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B2(**kwargs):
-    return RepVGG(
+def RepVGG_B2(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[2.5, 2.5, 2.5, 5],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B2"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B2g2(**kwargs):
-    return RepVGG(
+def RepVGG_B2g2(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[2.5, 2.5, 2.5, 5],
         override_groups_map=g2_map,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B2g2"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B2g4(**kwargs):
-    return RepVGG(
+def RepVGG_B2g4(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[2.5, 2.5, 2.5, 5],
         override_groups_map=g4_map,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B2g4"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B3(**kwargs):
-    return RepVGG(
+def RepVGG_B3(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[3, 3, 3, 5],
         override_groups_map=None,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B3"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B3g2(**kwargs):
-    return RepVGG(
+def RepVGG_B3g2(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[3, 3, 3, 5],
         override_groups_map=g2_map,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B3g2"], use_ssld=use_ssld)
+    return model
 
 
-def RepVGG_B3g4(**kwargs):
-    return RepVGG(
+def RepVGG_B3g4(pretrained=False, use_ssld=False, **kwargs):
+    model = RepVGG(
         num_blocks=[4, 6, 16, 1],
         width_multiplier=[3, 3, 3, 5],
         override_groups_map=g4_map,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["RepVGG_B3g4"], use_ssld=use_ssld)
+    return model
diff --git a/ppcls/arch/backbone/model_zoo/res2net.py b/ppcls/arch/backbone/model_zoo/res2net.py
index c03d34ea8bdc34b201dac047a9a0e777530e85ba..15a9427c24ce23d27d97178a83beb074635ab86d 100644
--- a/ppcls/arch/backbone/model_zoo/res2net.py
+++ b/ppcls/arch/backbone/model_zoo/res2net.py
@@ -27,11 +27,13 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "Res2Net50_48w_2s", "Res2Net50_26w_4s", "Res2Net50_14w_8s",
-    "Res2Net50_48w_2s", "Res2Net50_26w_6s", "Res2Net50_26w_8s",
-    "Res2Net101_26w_4s", "Res2Net152_26w_4s", "Res2Net200_26w_4s"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"Res2Net50_26w_4s": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams",
+              "Res2Net50_14w_8s": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -232,41 +234,26 @@ class Res2Net(nn.Layer):
         return y
 
 
-def Res2Net50_48w_2s(**args):
-    model = Res2Net(layers=50, scales=2, width=48, **args)
-    return model
-
-
-def Res2Net50_26w_4s(**args):
-    model = Res2Net(layers=50, scales=4, width=26, **args)
-    return model
-
-
-def Res2Net50_14w_8s(**args):
-    model = Res2Net(layers=50, scales=8, width=14, **args)
-    return model
-
-
-def Res2Net50_26w_6s(**args):
-    model = Res2Net(layers=50, scales=6, width=26, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def Res2Net50_26w_4s(pretrained=False, use_ssld=False, **kwargs):
+    model = Res2Net(layers=50, scales=4, width=26, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Res2Net50_26w_4s"], use_ssld=use_ssld)
     return model
 
 
-def Res2Net50_26w_8s(**args):
-    model = Res2Net(layers=50, scales=8, width=26, **args)
-    return model
-
-
-def Res2Net101_26w_4s(**args):
-    model = Res2Net(layers=101, scales=4, width=26, **args)
-    return model
-
-
-def Res2Net152_26w_4s(**args):
-    model = Res2Net(layers=152, scales=4, width=26, **args)
-    return model
-
-
-def Res2Net200_26w_4s(**args):
-    model = Res2Net(layers=200, scales=4, width=26, **args)
-    return model
+def Res2Net50_14w_8s(pretrained=False, use_ssld=False, **kwargs):
+    model = Res2Net(layers=50, scales=8, width=14, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Res2Net50_14w_8s"], use_ssld=use_ssld)
+    return model
\ No newline at end of file
diff --git a/ppcls/arch/backbone/model_zoo/res2net_vd.py b/ppcls/arch/backbone/model_zoo/res2net_vd.py
index 9961d520f4544126e94f925654a3d4a0b9e686ee..28ab03a01c1986997d750ab3183c6add6ec7019a 100644
--- a/ppcls/arch/backbone/model_zoo/res2net_vd.py
+++ b/ppcls/arch/backbone/model_zoo/res2net_vd.py
@@ -27,11 +27,14 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "Res2Net50_vd_48w_2s", "Res2Net50_vd_26w_4s", "Res2Net50_vd_14w_8s",
-    "Res2Net50_vd_48w_2s", "Res2Net50_vd_26w_6s", "Res2Net50_vd_26w_8s",
-    "Res2Net101_vd_26w_4s", "Res2Net152_vd_26w_4s", "Res2Net200_vd_26w_4s"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"Res2Net50_vd_26w_4s": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams",
+              "Res2Net101_vd_26w_4s": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams",
+              "Res2Net200_vd_26w_4s": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -255,41 +258,32 @@ class Res2Net_vd(nn.Layer):
         return y
 
 
-def Res2Net50_vd_48w_2s(**args):
-    model = Res2Net_vd(layers=50, scales=2, width=48, **args)
-    return model
-
-
-def Res2Net50_vd_26w_4s(**args):
-    model = Res2Net_vd(layers=50, scales=4, width=26, **args)
-    return model
-
-
-def Res2Net50_vd_14w_8s(**args):
-    model = Res2Net_vd(layers=50, scales=8, width=14, **args)
-    return model
-
-
-def Res2Net50_vd_26w_6s(**args):
-    model = Res2Net_vd(layers=50, scales=6, width=26, **args)
-    return model
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
 
 
-def Res2Net50_vd_26w_8s(**args):
-    model = Res2Net_vd(layers=50, scales=8, width=26, **args)
+def Res2Net50_vd_26w_4s(pretrained=False, use_ssld=False, **kwargs):
+    model = Res2Net_vd(layers=50, scales=4, width=26, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Res2Net50_vd_26w_4s"], use_ssld=use_ssld)
     return model
 
 
-def Res2Net101_vd_26w_4s(**args):
-    model = Res2Net_vd(layers=101, scales=4, width=26, **args)
+def Res2Net101_vd_26w_4s(pretrained=False, use_ssld=False, **kwargs):
+    model = Res2Net_vd(layers=101, scales=4, width=26, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Res2Net101_vd_26w_4s"], use_ssld=use_ssld)
     return model
 
 
-def Res2Net152_vd_26w_4s(**args):
-    model = Res2Net_vd(layers=152, scales=4, width=26, **args)
-    return model
-
-
-def Res2Net200_vd_26w_4s(**args):
-    model = Res2Net_vd(layers=200, scales=4, width=26, **args)
-    return model
+def Res2Net200_vd_26w_4s(pretrained=False, use_ssld=False, **kwargs):
+    model = Res2Net_vd(layers=200, scales=4, width=26, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Res2Net200_vd_26w_4s"], use_ssld=use_ssld)
+    return model
\ No newline at end of file
diff --git a/ppcls/arch/backbone/model_zoo/resnest.py b/ppcls/arch/backbone/model_zoo/resnest.py
index 4eeefddf44cc1ca89ed896e9556f191a13470b4c..3160095ef52a13b0db12b1da437a4bcd9c68c6e4 100644
--- a/ppcls/arch/backbone/model_zoo/resnest.py
+++ b/ppcls/arch/backbone/model_zoo/resnest.py
@@ -27,7 +27,14 @@ from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
 from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.regularizer import L2Decay
 
-__all__ = ["ResNeSt50_fast_1s1x64d", "ResNeSt50", "ResNeSt101"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"ResNeSt50_fast_1s1x64d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams",
+              "ResNeSt50": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams",
+              "ResNeSt101": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt101_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -656,8 +663,21 @@ class ResNeSt(nn.Layer):
         x = self.out(x)
         return x
 
-
-def ResNeSt50_fast_1s1x64d(**args):
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
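+            # The ResNeSt factories below differ only in depth and split
+            # attention settings: the "fast" variant uses radix=1 with
+            # avd_first=True, while ResNeSt50/101 use radix=2 with
+            # avd_first=False.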
+        )
+
+
+def ResNeSt50_fast_1s1x64d(pretrained=False, use_ssld=False, **kwargs):
     model = ResNeSt(
         layers=[3, 4, 6, 3],
         radix=1,
@@ -669,11 +689,12 @@
         avd=True,
         avd_first=True,
         final_drop=0.0,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeSt50_fast_1s1x64d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeSt50(**args):
+def ResNeSt50(pretrained=False, use_ssld=False, **kwargs):
     model = ResNeSt(
         layers=[3, 4, 6, 3],
         radix=2,
@@ -685,11 +706,12 @@
         avd=True,
         avd_first=False,
         final_drop=0.0,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeSt50"], use_ssld=use_ssld)
     return model
 
 
-def ResNeSt101(**args):
+def ResNeSt101(pretrained=False, use_ssld=False, **kwargs):
     model = ResNeSt(
         layers=[3, 4, 23, 3],
         radix=2,
@@ -701,5 +723,6 @@
         avd=True,
         avd_first=False,
         final_drop=0.0,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeSt101"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/resnet.py b/ppcls/arch/backbone/model_zoo/resnet.py
index 879bbd35509d8f9a9382114ee55d26ea7ebf29ef..5f95de99f164e5a5583a987c58a41e57c9c60295 100644
--- a/ppcls/arch/backbone/model_zoo/resnet.py
+++ b/ppcls/arch/backbone/model_zoo/resnet.py
@@ -27,7 +27,16 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"ResNet18": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams",
+              "ResNet34": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams",
+              "ResNet50": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams",
+              "ResNet101": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams",
+              "ResNet152": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams",
+              }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -290,27 +299,45 @@ class ResNet(nn.Layer):
         y = self.out(y)
         return y
 
-
-def ResNet18(**args):
-    model = ResNet(layers=18, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
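+            # Reached only when `pretrained` is neither a bool nor a str.
+            # Intended usage of the factories below (illustrative calls, the
+            # local checkpoint path is hypothetical):
+            #     model = ResNet50(pretrained=True)   # download MODEL_URLS entry
+            #     model = ResNet50(pretrained="./ResNet50_pretrained.pdparams")
+            #     model = ResNet50()                  # random initialization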
+        )
+
+
+def ResNet18(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet(layers=18, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet18"], use_ssld=use_ssld)
     return model
 
 
-def ResNet34(**args):
-    model = ResNet(layers=34, **args)
+def ResNet34(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet(layers=34, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet34"], use_ssld=use_ssld)
     return model
 
 
-def ResNet50(**args):
-    model = ResNet(layers=50, **args)
+def ResNet50(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet(layers=50, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet50"], use_ssld=use_ssld)
     return model
 
 
-def ResNet101(**args):
-    model = ResNet(layers=101, **args)
+def ResNet101(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet(layers=101, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet101"], use_ssld=use_ssld)
    return model
 
 
-def ResNet152(**args):
-    model = ResNet(layers=152, **args)
+def ResNet152(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet(layers=152, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet152"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/resnet_vc.py b/ppcls/arch/backbone/model_zoo/resnet_vc.py
index 5a9374c1584649dc886f194338c534cc3278c1cc..53b9f8d5ebb3a44330389bbe0484235371c4f160 100644
--- a/ppcls/arch/backbone/model_zoo/resnet_vc.py
+++ b/ppcls/arch/backbone/model_zoo/resnet_vc.py
@@ -27,9 +27,13 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "ResNet18_vc", "ResNet34_vc", "ResNet50_vc", "ResNet101_vc", "ResNet152_vc"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ResNet50_vc": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -283,27 +287,22 @@ class ResNet_vc(nn.Layer):
         y = self.out(y)
         return y
 
-
-def ResNet18_vc(**args):
-    model = ResNet_vc(layers=18, **args)
-    return model
-
-
-def ResNet34_vc(**args):
-    model = ResNet_vc(layers=34, **args)
-    return model
-
-
-def ResNet50_vc(**args):
-    model = ResNet_vc(layers=50, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ResNet50_vc(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vc(layers=50, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet50_vc"], use_ssld=use_ssld)
     return model
-
-
-def ResNet101_vc(**args):
-    model = ResNet_vc(layers=101, **args)
-    return model
-
-
-def ResNet152_vc(**args):
-    model = ResNet_vc(layers=152, **args)
-    return model
diff --git a/ppcls/arch/backbone/model_zoo/resnet_vd.py b/ppcls/arch/backbone/model_zoo/resnet_vd.py
index 5867261df341d00df15fe4c2e4a20584b6b0388d..cc491ad9dad0125fb17bbcc4722cf39cb45adee8 100644
--- a/ppcls/arch/backbone/model_zoo/resnet_vd.py
+++ b/ppcls/arch/backbone/model_zoo/resnet_vd.py
@@ -27,9 +27,18 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "ResNet18_vd", "ResNet34_vd", "ResNet50_vd", "ResNet101_vd", "ResNet152_vd"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ResNet18_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams",
+    "ResNet34_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams",
+    "ResNet50_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams",
+    "ResNet101_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams",
+    "ResNet152_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams",
+    "ResNet200_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -324,31 +333,50 @@ class ResNet_vd(nn.Layer):
         return y
 
 
-def ResNet18_vd(**args):
-    model = ResNet_vd(layers=18, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ResNet18_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=18, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet18_vd"], use_ssld=use_ssld)
     return model
 
 
-def ResNet34_vd(**args):
-    model = ResNet_vd(layers=34, **args)
+def ResNet34_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=34, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet34_vd"], use_ssld=use_ssld)
     return model
 
 
-def ResNet50_vd(**args):
-    model = ResNet_vd(layers=50, **args)
+def ResNet50_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=50, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet50_vd"], use_ssld=use_ssld)
     return model
 
 
-def ResNet101_vd(**args):
-    model = ResNet_vd(layers=101, **args)
+def ResNet101_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=101, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet101_vd"], use_ssld=use_ssld)
     return model
 
 
-def ResNet152_vd(**args):
-    model = ResNet_vd(layers=152, **args)
+def ResNet152_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=152, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet152_vd"], use_ssld=use_ssld)
     return model
 
 
-def ResNet200_vd(**args):
-    model = ResNet_vd(layers=200, **args)
+def ResNet200_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNet_vd(layers=200, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNet200_vd"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/resnext.py b/ppcls/arch/backbone/model_zoo/resnext.py
index 66ab40f9b4634413ce1d14d0fede8536d3905bf8..5104b4cba9cb45e866a222a5e981de6f04627f20 100644
--- a/ppcls/arch/backbone/model_zoo/resnext.py
+++ b/ppcls/arch/backbone/model_zoo/resnext.py
@@ -27,10 +27,18 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "ResNeXt50_32x4d", "ResNeXt50_64x4d", "ResNeXt101_32x4d",
-    "ResNeXt101_64x4d", "ResNeXt152_32x4d", "ResNeXt152_64x4d"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ResNeXt50_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams",
+    "ResNeXt50_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams",
+    "ResNeXt101_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams",
+    "ResNeXt101_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams",
+    "ResNeXt152_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams",
+    "ResNeXt152_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -222,32 +230,51 @@ class ResNeXt(nn.Layer):
         y = self.out(y)
         return y
 
-
-def ResNeXt50_32x4d(**args):
-    model = ResNeXt(layers=50, cardinality=32, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ResNeXt50_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt50_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt50_64x4d(**args):
-    model = ResNeXt(layers=50, cardinality=64, **args)
+def ResNeXt50_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt50_64x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_32x4d(**args):
-    model = ResNeXt(layers=101, cardinality=32, **args)
+def ResNeXt101_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_64x4d(**args):
-    model = ResNeXt(layers=101, cardinality=64, **args)
+def ResNeXt101_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_64x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt152_32x4d(**args):
-    model = ResNeXt(layers=152, cardinality=32, **args)
+def ResNeXt152_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt152_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt152_64x4d(**args):
-    model = ResNeXt(layers=152, cardinality=64, **args)
+def ResNeXt152_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt152_64x4d"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/resnext101_wsl.py b/ppcls/arch/backbone/model_zoo/resnext101_wsl.py
index 7130812b1092c6974e0da6042d46e0c82be57e6f..e530a9a2b750ba547ea6f1fafb520c050da5b63f 100644
--- a/ppcls/arch/backbone/model_zoo/resnext101_wsl.py
+++ b/ppcls/arch/backbone/model_zoo/resnext101_wsl.py
@@ -6,10 +6,18 @@ from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
 from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 from paddle.nn.initializer import Uniform
 
-__all__ = [
-    "ResNeXt101_32x8d_wsl", "ResNeXt101_32x16d_wsl", "ResNeXt101_32x32d_wsl",
-    "ResNeXt101_32x48d_wsl"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ResNeXt101_32x8d_wsl": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams",
+    "ResNeXt101_32x16d_wsl": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams",
+    "ResNeXt101_32x32d_wsl": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams",
+    "ResNeXt101_32x48d_wsl": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams",
+
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -426,22 +434,39 @@ class ResNeXt101WSL(nn.Layer):
         x = self._out(x)
         return x
 
-
-def ResNeXt101_32x8d_wsl(**args):
-    model = ResNeXt101WSL(cardinality=32, width=8, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ResNeXt101_32x8d_wsl(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt101WSL(cardinality=32, width=8, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_32x8d_wsl"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_32x16d_wsl(**args):
-    model = ResNeXt101WSL(cardinality=32, width=16, **args)
+def ResNeXt101_32x16d_wsl(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt101WSL(cardinality=32, width=16, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_32x16d_wsl"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_32x32d_wsl(**args):
-    model = ResNeXt101WSL(cardinality=32, width=32, **args)
+def ResNeXt101_32x32d_wsl(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt101WSL(cardinality=32, width=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_32x32d_wsl"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_32x48d_wsl(**args):
-    model = ResNeXt101WSL(cardinality=32, width=48, **args)
+def ResNeXt101_32x48d_wsl(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt101WSL(cardinality=32, width=48, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_32x48d_wsl"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/resnext_vd.py b/ppcls/arch/backbone/model_zoo/resnext_vd.py
index a05b645232f788b3810205724c347c40073a7e8f..b14e265e909a3ac54c6db222063869316c5697e5 100644
--- a/ppcls/arch/backbone/model_zoo/resnext_vd.py
+++ b/ppcls/arch/backbone/model_zoo/resnext_vd.py
@@ -27,11 +27,18 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "ResNeXt50_vd_32x4d", "ResNeXt50_vd_64x4d", "ResNeXt101_vd_32x4d",
-    "ResNeXt101_vd_64x4d", "ResNeXt152_vd_32x4d", "ResNeXt152_vd_64x4d"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
 
+MODEL_URLS = {
+    "ResNeXt50_vd_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams",
+    "ResNeXt50_vd_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams",
+    "ResNeXt101_vd_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams",
+    "ResNeXt101_vd_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams",
+    "ResNeXt152_vd_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams",
+    "ResNeXt152_vd_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 class ConvBNLayer(nn.Layer):
     def __init__(
@@ -235,32 +242,50 @@ class ResNeXt(nn.Layer):
         y = self.out(y)
         return y
 
-
-def ResNeXt50_vd_32x4d(**args):
-    model = ResNeXt(layers=50, cardinality=32, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ResNeXt50_vd_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt50_vd_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt50_vd_64x4d(**args):
-    model = ResNeXt(layers=50, cardinality=64, **args)
+def ResNeXt50_vd_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt50_vd_64x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_vd_32x4d(**args):
-    model = ResNeXt(layers=101, cardinality=32, **args)
+def ResNeXt101_vd_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_vd_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt101_vd_64x4d(**args):
-    model = ResNeXt(layers=101, cardinality=64, **args)
+def ResNeXt101_vd_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt101_vd_64x4d"], use_ssld=use_ssld)
    return model
 
 
-def ResNeXt152_vd_32x4d(**args):
-    model = ResNeXt(layers=152, cardinality=32, **args)
+def ResNeXt152_vd_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt152_vd_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def ResNeXt152_vd_64x4d(**args):
-    model = ResNeXt(layers=152, cardinality=64, **args)
+def ResNeXt152_vd_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ResNeXt152_vd_64x4d"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/rexnet.py b/ppcls/arch/backbone/model_zoo/rexnet.py
index ac0382374f8f7d95799e7bd30828ed60246dc240..799826c9443ac4aace47143408652b83659fec28 100644
--- a/ppcls/arch/backbone/model_zoo/rexnet.py
+++ b/ppcls/arch/backbone/model_zoo/rexnet.py
@@ -22,9 +22,17 @@ from paddle import ParamAttr
 import paddle.nn as nn
 from math import ceil
 
-__all__ = [
-    "ReXNet_1_0", "ReXNet_1_3", "ReXNet_1_5", "ReXNet_2_0", "ReXNet_3_0"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ReXNet_1_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams",
+    "ReXNet_1_3": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams",
+    "ReXNet_1_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams",
+    "ReXNet_2_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams",
+    "ReXNet_3_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 def conv_bn_act(out,
@@ -220,21 +228,44 @@ class ReXNetV1(nn.Layer):
         return x
 
 
-def ReXNet_1_0(**args):
-    return ReXNetV1(width_mult=1.0, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ReXNet_1_0(pretrained=False, use_ssld=False, **kwargs):
+    model = ReXNetV1(width_mult=1.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ReXNet_1_0"], use_ssld=use_ssld)
+    return model
 
 
-def ReXNet_1_3(**args):
-    return ReXNetV1(width_mult=1.3, **args)
+def ReXNet_1_3(pretrained=False, use_ssld=False, **kwargs):
+    model = ReXNetV1(width_mult=1.3, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ReXNet_1_3"], use_ssld=use_ssld)
+    return model
 
 
-def ReXNet_1_5(**args):
-    return ReXNetV1(width_mult=1.5, **args)
+def ReXNet_1_5(pretrained=False, use_ssld=False, **kwargs):
+    model = ReXNetV1(width_mult=1.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ReXNet_1_5"], use_ssld=use_ssld)
+    return model
 
 
-def ReXNet_2_0(**args):
-    return ReXNetV1(width_mult=2.0, **args)
+def ReXNet_2_0(pretrained=False, use_ssld=False, **kwargs):
+    model = ReXNetV1(width_mult=2.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ReXNet_2_0"], use_ssld=use_ssld)
+    return model
 
 
-def ReXNet_3_0(**args):
-    return ReXNetV1(width_mult=3.0, **args)
+def ReXNet_3_0(pretrained=False, use_ssld=False, **kwargs):
+    model = ReXNetV1(width_mult=3.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ReXNet_3_0"], use_ssld=use_ssld)
+    return model
\ No newline at end of file
diff --git a/ppcls/arch/backbone/model_zoo/se_resnet_vd.py b/ppcls/arch/backbone/model_zoo/se_resnet_vd.py
index 5b7a587cdc5cd910dd47ef035c35ea70de166fb5..cc48f8d362f609ec8a236c5845171eb1aff43e88 100644
--- a/ppcls/arch/backbone/model_zoo/se_resnet_vd.py
+++ b/ppcls/arch/backbone/model_zoo/se_resnet_vd.py
@@ -26,10 +26,16 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = [
-    "SE_ResNet18_vd", "SE_ResNet34_vd", "SE_ResNet50_vd", "SE_ResNet101_vd",
-    "SE_ResNet152_vd", "SE_ResNet200_vd"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "SE_ResNet18_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams",
+    "SE_ResNet34_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams",
+    "SE_ResNet50_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams",
+
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -347,32 +353,33 @@ class SE_ResNet_vd(nn.Layer):
         y = self.out(y)
         return y
 
-
-def SE_ResNet18_vd(**args):
-    model = SE_ResNet_vd(layers=18, **args)
-    return model
-
-
-def SE_ResNet34_vd(**args):
-    model = SE_ResNet_vd(layers=34, **args)
-    return model
-
-
-def SE_ResNet50_vd(**args):
-    model = SE_ResNet_vd(layers=50, **args)
-    return model
-
-
-def SE_ResNet101_vd(**args):
-    model = SE_ResNet_vd(layers=101, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
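+            # Only the SE_ResNet_vd depths with released weights (18/34/50)
+            # keep factories below; the former 101/152/200 entry points were
+            # dropped along with their missing MODEL_URLS entries.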
+        )
+
+
+def SE_ResNet18_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = SE_ResNet_vd(layers=18, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNet18_vd"], use_ssld=use_ssld)
     return model
 
 
-def SE_ResNet152_vd(**args):
-    model = SE_ResNet_vd(layers=152, **args)
+def SE_ResNet34_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = SE_ResNet_vd(layers=34, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNet34_vd"], use_ssld=use_ssld)
     return model
 
 
-def SE_ResNet200_vd(**args):
-    model = SE_ResNet_vd(layers=200, **args)
+def SE_ResNet50_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = SE_ResNet_vd(layers=50, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNet50_vd"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/se_resnext.py b/ppcls/arch/backbone/model_zoo/se_resnext.py
index df7cb021efdfb6365604d51df7cec3115c254fef..d873d81111cc48c1055dc203f6cab674ffb91b88 100644
--- a/ppcls/arch/backbone/model_zoo/se_resnext.py
+++ b/ppcls/arch/backbone/model_zoo/se_resnext.py
@@ -27,7 +27,16 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = ["SE_ResNeXt50_32x4d", "SE_ResNeXt101_32x4d", "SE_ResNeXt152_64x4d"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "SE_ResNeXt50_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams",
+    "SE_ResNeXt101_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams",
+    "SE_ResNeXt152_64x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt152_64x4d_pretrained.pdparams",
+
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -301,17 +310,33 @@ class ResNeXt(nn.Layer):
         y = self.out(y)
         return y
 
-
-def SE_ResNeXt50_32x4d(**args):
-    model = ResNeXt(layers=50, cardinality=32, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def SE_ResNeXt50_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNeXt50_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def SE_ResNeXt101_32x4d(**args):
-    model = ResNeXt(layers=101, cardinality=32, **args)
+def SE_ResNeXt101_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNeXt101_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def SE_ResNeXt152_64x4d(**args):
-    model = ResNeXt(layers=152, cardinality=64, **args)
+def SE_ResNeXt152_64x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNeXt152_64x4d"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/se_resnext_vd.py b/ppcls/arch/backbone/model_zoo/se_resnext_vd.py
index 56cbe83d24901f6a37a9a17a872462eadc3dc854..5e840f83d3c279e004782400388f662b3f589d39 100644
--- a/ppcls/arch/backbone/model_zoo/se_resnext_vd.py
+++ b/ppcls/arch/backbone/model_zoo/se_resnext_vd.py
@@ -27,7 +27,16 @@ from paddle.nn.initializer import Uniform
 
 import math
 
-__all__ = ["SE_ResNeXt50_vd_32x4d", "SE_ResNeXt50_vd_32x4d", "SENet154_vd"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "SE_ResNeXt50_vd_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams",
+    "SE_ResNeXt101_vd_32x4d": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_vd_32x4d_pretrained.pdparams",
+    "SENet154_vd": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams",
+
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -269,17 +278,33 @@ class ResNeXt(nn.Layer):
         y = self.out(y)
         return y
 
-
-def SE_ResNeXt50_vd_32x4d(**args):
-    model = ResNeXt(layers=50, cardinality=32, **args)
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def SE_ResNeXt50_vd_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=50, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNeXt50_vd_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def SE_ResNeXt101_vd_32x4d(**args):
-    model = ResNeXt(layers=101, cardinality=32, **args)
+def SE_ResNeXt101_vd_32x4d(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=101, cardinality=32, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SE_ResNeXt101_vd_32x4d"], use_ssld=use_ssld)
     return model
 
 
-def SENet154_vd(**args):
-    model = ResNeXt(layers=152, cardinality=64, **args)
+def SENet154_vd(pretrained=False, use_ssld=False, **kwargs):
+    model = ResNeXt(layers=152, cardinality=64, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SENet154_vd"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/shufflenet_v2.py b/ppcls/arch/backbone/model_zoo/shufflenet_v2.py
index ac8898f6cbbdd0569e2163b1eb1aa0b64fff445d..29abad66ef750dafdebf5cf91607f34743756e58 100644
--- a/ppcls/arch/backbone/model_zoo/shufflenet_v2.py
+++ b/ppcls/arch/backbone/model_zoo/shufflenet_v2.py
@@ -22,11 +22,19 @@ from paddle.nn import Layer, Conv2D, MaxPool2D, AdaptiveAvgPool2D, BatchNorm, Li
 from paddle.nn.initializer import KaimingNormal
 from paddle.nn.functional import swish
 
-__all__ = [
-    "ShuffleNetV2_x0_25", "ShuffleNetV2_x0_33", "ShuffleNetV2_x0_5",
-    "ShuffleNetV2_x1_0", "ShuffleNetV2_x1_5", "ShuffleNetV2_x2_0",
-    "ShuffleNetV2_swish"
-]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ShuffleNetV2_x0_25": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams",
+    "ShuffleNetV2_x0_33": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams",
+    "ShuffleNetV2_x0_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams",
+    "ShuffleNetV2_x1_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams",
+    "ShuffleNetV2_x1_5": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams",
+    "ShuffleNetV2_x2_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams",
+    "ShuffleNetV2_swish": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams"
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 def channel_shuffle(x, groups):
@@ -285,36 +293,56 @@ class ShuffleNet(Layer):
         return y
 
 
-def ShuffleNetV2_x0_25(**args):
-    model = ShuffleNet(scale=0.25, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
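+            # `use_ssld` only matters together with `pretrained=True`: it is
+            # forwarded to load_dygraph_pretrain_from_url, which is expected to
+            # substitute the SSLD-distilled weights for the plain ImageNet
+            # weights when such a release exists (an assumption about that
+            # helper, not shown in this diff).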
+        )
+
+
+def ShuffleNetV2_x0_25(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=0.25, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x0_25"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_x0_33(**args):
-    model = ShuffleNet(scale=0.33, **args)
+def ShuffleNetV2_x0_33(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=0.33, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x0_33"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_x0_5(**args):
-    model = ShuffleNet(scale=0.5, **args)
+def ShuffleNetV2_x0_5(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=0.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x0_5"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_x1_0(**args):
-    model = ShuffleNet(scale=1.0, **args)
+def ShuffleNetV2_x1_0(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=1.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x1_0"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_x1_5(**args):
-    model = ShuffleNet(scale=1.5, **args)
+def ShuffleNetV2_x1_5(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=1.5, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x1_5"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_x2_0(**args):
-    model = ShuffleNet(scale=2.0, **args)
+def ShuffleNetV2_x2_0(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=2.0, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_x2_0"], use_ssld=use_ssld)
     return model
 
 
-def ShuffleNetV2_swish(**args):
-    model = ShuffleNet(scale=1.0, act="swish", **args)
+def ShuffleNetV2_swish(pretrained=False, use_ssld=False, **kwargs):
+    model = ShuffleNet(scale=1.0, act="swish", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ShuffleNetV2_swish"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/squeezenet.py b/ppcls/arch/backbone/model_zoo/squeezenet.py
index eb380f252ace998cc14072117911e7e1e8b0cc3a..a88a1bcff6acb4bc62289b739b571996892fc57d 100644
--- a/ppcls/arch/backbone/model_zoo/squeezenet.py
+++ b/ppcls/arch/backbone/model_zoo/squeezenet.py
@@ -1,3 +1,17 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import paddle
 from paddle import ParamAttr
 import paddle.nn as nn
@@ -5,7 +19,14 @@ import paddle.nn.functional as F
 from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
 from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 
-__all__ = ["SqueezeNet1_0", "SqueezeNet1_1"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "SqueezeNet1_0": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams",
+    "SqueezeNet1_1": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class MakeFireConv(nn.Layer):
@@ -143,12 +164,26 @@ class SqueezeNet(nn.Layer):
         x = paddle.squeeze(x, axis=[2, 3])
         return x
 
-
-def SqueezeNet1_0(**args):
-    model = SqueezeNet(version="1.0", **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def SqueezeNet1_0(pretrained=False, use_ssld=False, **kwargs):
+    model = SqueezeNet(version="1.0", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SqueezeNet1_0"], use_ssld=use_ssld)
     return model
 
 
-def SqueezeNet1_1(**args):
-    model = SqueezeNet(version="1.1", **args)
+def SqueezeNet1_1(pretrained=False, use_ssld=False, **kwargs):
+    model = SqueezeNet(version="1.1", **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SqueezeNet1_1"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/swin_transformer.py b/ppcls/arch/backbone/model_zoo/swin_transformer.py
index 97efbd1f341d69fd601bfb86697d4f1cb06ca022..a33bf588838ac51a4f52124cbb48bbc5fda4886d 100644
--- a/ppcls/arch/backbone/model_zoo/swin_transformer.py
+++ b/ppcls/arch/backbone/model_zoo/swin_transformer.py
@@ -21,6 +21,19 @@ from paddle.nn.initializer import TruncatedNormal, Constant
 
 from .vision_transformer import trunc_normal_, zeros_, ones_, to_2tuple, DropPath, Identity
 
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "SwinTransformer_tiny_patch4_window7_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams",
+    "SwinTransformer_small_patch4_window7_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams",
+    "SwinTransformer_base_patch4_window7_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams",
+    "SwinTransformer_base_patch4_window12_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams",
+    "SwinTransformer_large_patch4_window7_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_pretrained.pdparams",
+    "SwinTransformer_large_patch4_window12_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_pretrained.pdparams",
+    }
+
+__all__ = list(MODEL_URLS.keys())
+
 
 class Mlp(nn.Layer):
     def __init__(self,
@@ -716,40 +729,56 @@ class SwinTransformer(nn.Layer):
             flops += self.num_features * self.num_classes
         return flops
 
-
-def SwinTransformer_tiny_patch4_window7_224(**args):
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def SwinTransformer_tiny_patch4_window7_224(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         embed_dim=96,
         depths=[2, 2, 6, 2],
         num_heads=[3, 6, 12, 24],
         window_size=7,
         drop_path_rate=0.2,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_tiny_patch4_window7_224"], use_ssld=use_ssld)
     return model
 
 
-def SwinTransformer_small_patch4_window7_224(**args):
+def SwinTransformer_small_patch4_window7_224(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         embed_dim=96,
         depths=[2, 2, 18, 2],
         num_heads=[3, 6, 12, 24],
         window_size=7,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_small_patch4_window7_224"], use_ssld=use_ssld)
     return model
 
 
-def SwinTransformer_base_patch4_window7_224(**args):
+def SwinTransformer_base_patch4_window7_224(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         embed_dim=128,
         depths=[2, 2, 18, 2],
         num_heads=[4, 8, 16, 32],
         window_size=7,
         drop_path_rate=0.5,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_base_patch4_window7_224"], use_ssld=use_ssld)
     return model
 
 
-def SwinTransformer_base_patch4_window12_384(**args):
+def SwinTransformer_base_patch4_window12_384(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         img_size=384,
         embed_dim=128,
@@ -757,26 +786,29 @@
         num_heads=[4, 8, 16, 32],
         window_size=12,
         drop_path_rate=0.5,  # NOTE: does not appear in official code
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_base_patch4_window12_384"], use_ssld=use_ssld)
     return model
 
 
-def SwinTransformer_large_patch4_window7_224(**args):
+def SwinTransformer_large_patch4_window7_224(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         embed_dim=192,
         depths=[2, 2, 18, 2],
         num_heads=[6, 12, 24, 48],
         window_size=7,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_large_patch4_window7_224"], use_ssld=use_ssld)
     return model
 
 
-def SwinTransformer_large_patch4_window12_384(**args):
+def SwinTransformer_large_patch4_window12_384(pretrained=False, use_ssld=False, **kwargs):
     model = SwinTransformer(
         img_size=384,
         embed_dim=192,
         depths=[2, 2, 18, 2],
         num_heads=[6, 12, 24, 48],
         window_size=12,
-        **args)
+        **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["SwinTransformer_large_patch4_window12_384"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/tnt.py b/ppcls/arch/backbone/model_zoo/tnt.py
new file mode 100644
index 0000000000000000000000000000000000000000..61f1083e4cadd268ca2022a7908935db33138797
--- /dev/null
+++ b/ppcls/arch/backbone/model_zoo/tnt.py
@@ -0,0 +1,318 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import numpy as np + +import paddle +import paddle.nn as nn + +from paddle.nn.initializer import TruncatedNormal, Constant + +from ppcls.arch.backbone.base.theseus_layer import Identity +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + + +MODEL_URLS = { + "TNT_small": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams" +} + + +__all__ = MODEL_URLS.keys() + + +trunc_normal_ = TruncatedNormal(std=.02) +zeros_ = Constant(value=0.) +ones_ = Constant(value=1.) + + +def drop_path(x, drop_prob=0., training=False): + """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). + the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... + See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... + """ + if drop_prob == 0. or not training: + return x + keep_prob = paddle.to_tensor(1 - drop_prob) + shape = (paddle.shape(x)[0], ) + (1, ) * (x.ndim - 1) + random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype) + random_tensor = paddle.floor(random_tensor) # binarize + output = x.divide(keep_prob) * random_tensor + return output + + +class DropPath(nn.Layer): + """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 
+ """ + + def __init__(self, drop_prob=None): + super(DropPath, self).__init__() + self.drop_prob = drop_prob + + def forward(self, x): + return drop_path(x, self.drop_prob, self.training) + + +class Mlp(nn.Layer): + def __init__(self, in_features, hidden_features=None, + out_features=None, act_layer=nn.GELU, drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +class Attention(nn.Layer): + def __init__(self, dim, hidden_dim, num_heads=8, + qkv_bias=False, attn_drop=0., proj_drop=0.): + super().__init__() + self.hidden_dim = hidden_dim + self.num_heads = num_heads + head_dim = hidden_dim // num_heads + self.head_dim = head_dim + self.scale = head_dim ** -0.5 + + self.qk = nn.Linear(dim, hidden_dim * 2, bias_attr=qkv_bias) + self.v = nn.Linear(dim, dim, bias_attr=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + def forward(self, x): + B, N, C = x.shape + qk = self.qk(x).reshape((B, N, 2, self.num_heads, self.head_dim)).transpose((2, 0, 3, 1, 4)) + + q, k = qk[0], qk[1] + v = self.v(x).reshape((B, N, self.num_heads, -1)).transpose((0, 2, 1, 3)) + + attn = (q @ k.transpose((0, 1, 3, 2))) * self.scale + attn = nn.functional.softmax(attn, axis=-1) + attn = self.attn_drop(attn) + + x = (attn @ v).transpose((0, 2, 1, 3)).reshape((B, N, -1)) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Block(nn.Layer): + def __init__(self, dim, in_dim, num_pixel, num_heads=12, in_num_head=4, mlp_ratio=4., + qkv_bias=False, drop=0., attn_drop=0., drop_path=0., act_layer=nn.GELU, + norm_layer=nn.LayerNorm): + super().__init__() + # Inner transformer + self.norm_in = norm_layer(in_dim) + self.attn_in = Attention( + in_dim, in_dim, num_heads=in_num_head, + qkv_bias=qkv_bias, attn_drop=attn_drop, + proj_drop=drop + ) + + self.norm_mlp_in = norm_layer(in_dim) + self.mlp_in = Mlp( + in_features=in_dim, hidden_features=int(in_dim * 4), + out_features=in_dim, act_layer=act_layer, drop=drop + ) + + self.norm1_proj = norm_layer(in_dim) + self.proj = nn.Linear(in_dim * num_pixel, dim) + # Outer transformer + self.norm_out = norm_layer(dim) + self.attn_out = Attention( + dim, dim, num_heads=num_heads, qkv_bias=qkv_bias, + attn_drop=attn_drop, proj_drop=drop + ) + + self.drop_path = DropPath(drop_path) if drop_path > 0. 
else Identity() + + self.norm_mlp = norm_layer(dim) + self.mlp = Mlp( + in_features=dim, hidden_features=int(dim * mlp_ratio), + out_features=dim, act_layer=act_layer, drop=drop + ) + + def forward(self, pixel_embed, patch_embed): + # inner + pixel_embed = pixel_embed + self.drop_path(self.attn_in(self.norm_in(pixel_embed))) + pixel_embed = pixel_embed + self.drop_path(self.mlp_in(self.norm_mlp_in(pixel_embed))) + # outer + B, N, C = patch_embed.shape + patch_embed[:, 1:] = patch_embed[:, 1:] + self.proj(self.norm1_proj(pixel_embed).reshape((B, N - 1, -1))) + patch_embed = patch_embed + self.drop_path(self.attn_out(self.norm_out(patch_embed))) + patch_embed = patch_embed + self.drop_path(self.mlp(self.norm_mlp(patch_embed))) + return pixel_embed, patch_embed + + +class PixelEmbed(nn.Layer): + def __init__(self, img_size=224, patch_size=16, in_chans=3, in_dim=48, stride=4): + super().__init__() + num_patches = (img_size // patch_size) ** 2 + self.img_size = img_size + self.num_patches = num_patches + self.in_dim = in_dim + new_patch_size = math.ceil(patch_size / stride) + self.new_patch_size = new_patch_size + + self.proj = nn.Conv2D( + in_chans, self.in_dim, + kernel_size=7, padding=3, + stride=stride + ) + + def forward(self, x, pixel_pos): + B, C, H, W = x.shape + assert H == self.img_size and W == self.img_size, f"Input image size ({H}*{W}) doesn't match model ({self.img_size}*{self.img_size})." + + x = self.proj(x) + x = nn.functional.unfold(x, self.new_patch_size, self.new_patch_size) + x = x.transpose((0, 2, 1)).reshape((B * self.num_patches, self.in_dim, self.new_patch_size, self.new_patch_size)) + x = x + pixel_pos + x = x.reshape((B * self.num_patches, self.in_dim, -1)).transpose((0, 2, 1)) + return x + + +class TNT(nn.Layer): + def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, in_dim=48, depth=12, + num_heads=12, in_num_head=4, mlp_ratio=4., qkv_bias=False, drop_rate=0., attn_drop_rate=0., + drop_path_rate=0., norm_layer=nn.LayerNorm, first_stride=4, class_dim=1000): + super().__init__() + self.class_dim = class_dim + # num_features for consistency with other models + self.num_features = self.embed_dim = embed_dim + + self.pixel_embed = PixelEmbed( + img_size=img_size, patch_size=patch_size, + in_chans=in_chans, in_dim=in_dim, stride=first_stride + ) + num_patches = self.pixel_embed.num_patches + self.num_patches = num_patches + new_patch_size = self.pixel_embed.new_patch_size + num_pixel = new_patch_size ** 2 + + self.norm1_proj = norm_layer(num_pixel * in_dim) + self.proj = nn.Linear(num_pixel * in_dim, embed_dim) + self.norm2_proj = norm_layer(embed_dim) + + self.cls_token = self.create_parameter( + shape=(1, 1, embed_dim), + default_initializer=zeros_ + ) + self.add_parameter("cls_token", self.cls_token) + + self.patch_pos = self.create_parameter( + shape=(1, num_patches + 1, embed_dim), + default_initializer=zeros_ + ) + self.add_parameter("patch_pos", self.patch_pos) + + self.pixel_pos = self.create_parameter( + shape=(1, in_dim, new_patch_size, new_patch_size), + default_initializer=zeros_ + ) + self.add_parameter("pixel_pos", self.pixel_pos) + + self.pos_drop = nn.Dropout(p=drop_rate) + + # stochastic depth decay rule + dpr = np.linspace(0, drop_path_rate, depth) + + blocks = [] + for i in range(depth): + blocks.append(Block( + dim=embed_dim, in_dim=in_dim, num_pixel=num_pixel, num_heads=num_heads, + in_num_head=in_num_head, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, + drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], + 
norm_layer=norm_layer + )) + self.blocks = nn.LayerList(blocks) + self.norm = norm_layer(embed_dim) + + if class_dim > 0: + self.head = nn.Linear(embed_dim, class_dim) + + trunc_normal_(self.cls_token) + trunc_normal_(self.patch_pos) + trunc_normal_(self.pixel_pos) + self.apply(self._init_weights) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight) + if isinstance(m, nn.Linear) and m.bias is not None: + zeros_(m.bias) + elif isinstance(m, nn.LayerNorm): + zeros_(m.bias) + ones_(m.weight) + + def forward_features(self, x): + B = x.shape[0] + pixel_embed = self.pixel_embed(x, self.pixel_pos) + + patch_embed = self.norm2_proj(self.proj(self.norm1_proj(pixel_embed.reshape((B, self.num_patches, -1))))) + patch_embed = paddle.concat((self.cls_token.expand((B, -1, -1)), patch_embed), axis=1) + patch_embed = patch_embed + self.patch_pos + patch_embed = self.pos_drop(patch_embed) + + for blk in self.blocks: + pixel_embed, patch_embed = blk(pixel_embed, patch_embed) + + patch_embed = self.norm(patch_embed) + return patch_embed[:, 0] + + def forward(self, x): + x = self.forward_features(x) + + if self.class_dim > 0: + x = self.head(x) + return x + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." + ) + + +def TNT_small(pretrained=False, **kwargs): + model = TNT( + patch_size=16, + embed_dim=384, + in_dim=24, + depth=12, + num_heads=6, + in_num_head=4, + qkv_bias=False, + **kwargs + ) + _load_pretrained(pretrained, model, MODEL_URLS["TNT_small"]) + return model diff --git a/ppcls/arch/backbone/model_zoo/vgg.py b/ppcls/arch/backbone/model_zoo/vgg.py index fe00d9feead07d2775f7736df4e1fc92a2c4d57f..9e737590d2e0d821126745276248317177b2239d 100644 --- a/ppcls/arch/backbone/model_zoo/vgg.py +++ b/ppcls/arch/backbone/model_zoo/vgg.py @@ -5,7 +5,16 @@ import paddle.nn.functional as F from paddle.nn import Conv2D, BatchNorm, Linear, Dropout from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D -__all__ = ["VGG11", "VGG13", "VGG16", "VGG19"] +from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url + +MODEL_URLS = { + "VGG11": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams", + "VGG13": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams", + "VGG16": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams", + "VGG19": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams", + } + +__all__ = list(MODEL_URLS.keys()) class ConvBlock(nn.Layer): @@ -131,22 +140,40 @@ class VGGNet(nn.Layer): x = self._out(x) return x - -def VGG11(**args): - model = VGGNet(layers=11, **args) + + +def _load_pretrained(pretrained, model, model_url, use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld) + elif isinstance(pretrained, str): + load_dygraph_pretrain(model, pretrained) + else: + raise RuntimeError( + "pretrained type is not available. Please use `string` or `boolean` type." 
+        )
+
+
+def VGG11(pretrained=False, use_ssld=False, **kwargs):
+    model = VGGNet(layers=11, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["VGG11"], use_ssld=use_ssld)
     return model
 
 
-def VGG13(**args):
-    model = VGGNet(layers=13, **args)
+def VGG13(pretrained=False, use_ssld=False, **kwargs):
+    model = VGGNet(layers=13, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["VGG13"], use_ssld=use_ssld)
     return model
 
 
-def VGG16(**args):
-    model = VGGNet(layers=16, **args)
+def VGG16(pretrained=False, use_ssld=False, **kwargs):
+    model = VGGNet(layers=16, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["VGG16"], use_ssld=use_ssld)
     return model
 
 
-def VGG19(**args):
-    model = VGGNet(layers=19, **args)
+def VGG19(pretrained=False, use_ssld=False, **kwargs):
+    model = VGGNet(layers=19, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["VGG19"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/vision_transformer.py b/ppcls/arch/backbone/model_zoo/vision_transformer.py
index 32f198913d59014326957cc6fe7f9b325a59ef28..4bb39890f79fa407619475332475e79f09322b15 100644
--- a/ppcls/arch/backbone/model_zoo/vision_transformer.py
+++ b/ppcls/arch/backbone/model_zoo/vision_transformer.py
@@ -12,19 +12,32 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from collections.abc import Callable
+
 import numpy as np
 import paddle
 import paddle.nn as nn
-from paddle.nn.initializer import TruncatedNormal, Constant
+from paddle.nn.initializer import TruncatedNormal, Constant, Normal
+
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "ViT_small_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams",
+    "ViT_base_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams",
+    "ViT_base_patch16_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams",
+    "ViT_base_patch32_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams",
+    "ViT_large_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams",
+    "ViT_large_patch16_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams",
+    "ViT_large_patch32_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams",
+    "ViT_huge_patch16_224": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch16_224_pretrained.pdparams",
+    "ViT_huge_patch32_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch32_384_pretrained.pdparams"
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
-__all__ = [
-    "VisionTransformer", "ViT_small_patch16_224", "ViT_base_patch16_224",
-    "ViT_base_patch16_384", "ViT_base_patch32_384", "ViT_large_patch16_224",
-    "ViT_large_patch16_384", "ViT_large_patch32_384", "ViT_huge_patch16_224",
-    "ViT_huge_patch32_384"
-]
 
 trunc_normal_ = TruncatedNormal(std=.02)
+normal_ = Normal
 zeros_ = Constant(value=0.)
 ones_ = Constant(value=1.)
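The module-level initializers above (`trunc_normal_`, `zeros_`, `ones_`) are `paddle.nn.initializer` instances that re-initialize a parameter in place when called on it, which is how the `_init_weights` hooks in these files apply them; note that `normal_` is bound to the `Normal` class itself rather than an instance. A small sketch (the layer shape is arbitrary):

```python
import paddle.nn as nn
from paddle.nn.initializer import TruncatedNormal, Constant

trunc_normal_ = TruncatedNormal(std=.02)
zeros_ = Constant(value=0.)

fc = nn.Linear(768, 768)
trunc_normal_(fc.weight)  # redraw the weights in place from a truncated normal, std=0.02
zeros_(fc.bias)           # zero the bias in place
```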
@@ -141,7 +154,13 @@ class Block(nn.Layer):
                  norm_layer='nn.LayerNorm',
                  epsilon=1e-5):
         super().__init__()
-        self.norm1 = eval(norm_layer)(dim, epsilon=epsilon)
+        if isinstance(norm_layer, str):
+            self.norm1 = eval(norm_layer)(dim, epsilon=epsilon)
+        elif isinstance(norm_layer, Callable):
+            self.norm1 = norm_layer(dim)
+        else:
+            raise TypeError(
+                "The norm_layer must be str or paddle.nn.layer.Layer class")
         self.attn = Attention(
             dim,
             num_heads=num_heads,
@@ -151,7 +170,13 @@ class Block(nn.Layer):
             proj_drop=drop)
         # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
         self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity()
-        self.norm2 = eval(norm_layer)(dim, epsilon=epsilon)
+        if isinstance(norm_layer, str):
+            self.norm2 = eval(norm_layer)(dim, epsilon=epsilon)
+        elif isinstance(norm_layer, Callable):
+            self.norm2 = norm_layer(dim)
+        else:
+            raise TypeError(
+                "The norm_layer must be str or paddle.nn.layer.Layer class")
         mlp_hidden_dim = int(dim * mlp_ratio)
         self.mlp = Mlp(in_features=dim,
                        hidden_features=mlp_hidden_dim,
@@ -285,7 +310,21 @@ class VisionTransformer(nn.Layer):
         return x
 
 
-def ViT_small_patch16_224(**kwargs):
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
+
+
+def ViT_small_patch16_224(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         patch_size=16,
         embed_dim=768,
@@ -294,10 +333,12 @@ def ViT_small_patch16_224(**kwargs):
         mlp_ratio=3,
         qk_scale=768**-0.5,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_small_patch16_224"], use_ssld=use_ssld)
     return model
 
 
-def ViT_base_patch16_224(**kwargs):
+def ViT_base_patch16_224(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         patch_size=16,
         embed_dim=768,
@@ -307,10 +348,11 @@ def ViT_base_patch16_224(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_base_patch16_224"], use_ssld=use_ssld)
     return model
 
 
-def ViT_base_patch16_384(**kwargs):
+def ViT_base_patch16_384(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         img_size=384,
         patch_size=16,
@@ -321,10 +363,11 @@ def ViT_base_patch16_384(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_base_patch16_384"], use_ssld=use_ssld)
     return model
 
 
-def ViT_base_patch32_384(**kwargs):
+def ViT_base_patch32_384(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         img_size=384,
         patch_size=32,
@@ -335,10 +378,11 @@ def ViT_base_patch32_384(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_base_patch32_384"], use_ssld=use_ssld)
     return model
 
 
-def ViT_large_patch16_224(**kwargs):
+def ViT_large_patch16_224(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         patch_size=16,
         embed_dim=1024,
@@ -348,10 +392,11 @@ def ViT_large_patch16_224(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_large_patch16_224"], use_ssld=use_ssld)
     return model
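With the `Block` change above, `norm_layer` may be either a string (resolved via `eval`, with `epsilon` forwarded) or a `Callable` (invoked as `norm_layer(dim)`, so `epsilon` is not forwarded). A sketch of the same dispatch as a hypothetical standalone helper:

```python
from collections.abc import Callable
from functools import partial

import paddle.nn as nn

def make_norm(norm_layer, dim, epsilon=1e-5):
    # hypothetical helper mirroring the dispatch Block now performs
    if isinstance(norm_layer, str):
        return eval(norm_layer)(dim, epsilon=epsilon)
    elif isinstance(norm_layer, Callable):
        return norm_layer(dim)  # note: epsilon is not forwarded on this path
    raise TypeError("The norm_layer must be str or paddle.nn.layer.Layer class")

norm1 = make_norm('nn.LayerNorm', 768, epsilon=1e-6)         # string path
norm2 = make_norm(partial(nn.LayerNorm, epsilon=1e-6), 768)  # Callable path
```

Wrapping the layer class in `functools.partial` is how a custom `epsilon` travels through the `Callable` path, since only `dim` is supplied at construction time.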
 
 
-def ViT_large_patch16_384(**kwargs):
+def ViT_large_patch16_384(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         img_size=384,
         patch_size=16,
@@ -362,10 +407,11 @@ def ViT_large_patch16_384(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_large_patch16_384"], use_ssld=use_ssld)
     return model
 
 
-def ViT_large_patch32_384(**kwargs):
+def ViT_large_patch32_384(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         img_size=384,
         patch_size=32,
@@ -376,10 +422,11 @@ def ViT_large_patch32_384(**kwargs):
         qkv_bias=True,
         epsilon=1e-6,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_large_patch32_384"], use_ssld=use_ssld)
     return model
 
 
-def ViT_huge_patch16_224(**kwargs):
+def ViT_huge_patch16_224(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         patch_size=16,
         embed_dim=1280,
@@ -387,10 +434,11 @@ def ViT_huge_patch16_224(**kwargs):
         num_heads=16,
         mlp_ratio=4,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_huge_patch16_224"], use_ssld=use_ssld)
     return model
 
 
-def ViT_huge_patch32_384(**kwargs):
+def ViT_huge_patch32_384(pretrained=False, use_ssld=False, **kwargs):
     model = VisionTransformer(
         img_size=384,
         patch_size=32,
@@ -399,4 +447,5 @@ def ViT_huge_patch32_384(**kwargs):
         num_heads=16,
         mlp_ratio=4,
         **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["ViT_huge_patch32_384"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/xception.py b/ppcls/arch/backbone/model_zoo/xception.py
index e7ff6c5225a33c5dbe95e89de0c6b6d05907fbdf..126c3dfdb34771c7668b8494223839b2c00742f0 100644
--- a/ppcls/arch/backbone/model_zoo/xception.py
+++ b/ppcls/arch/backbone/model_zoo/xception.py
@@ -8,7 +8,16 @@ from paddle.nn.initializer import Uniform
 import math
 import sys
 
-__all__ = ['Xception41', 'Xception65', 'Xception71']
+
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {
+    "Xception41": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams",
+    "Xception65": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams",
+    "Xception71": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams"
+    }
+
+__all__ = list(MODEL_URLS.keys())
 
 
 class ConvBNLayer(nn.Layer):
@@ -329,17 +338,32 @@ class Xception(nn.Layer):
         x = self._exit_flow(x)
         return x
 
-
-def Xception41(**args):
-    model = Xception(entry_flow_block_num=3, middle_flow_block_num=8, **args)
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+        )
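One subtlety of the `_load_pretrained` helper repeated across these files: the branches use identity checks (`is False` / `is True`), so only the exact booleans or a string are accepted, and other truthy values such as `1` or `None` fall through to the `RuntimeError`. A standalone sketch of that contract (`_dispatch` is a hypothetical stand-in):

```python
def _dispatch(pretrained):
    # mirrors the branch order used by _load_pretrained above
    if pretrained is False:
        return "skip"        # keep random initialization
    elif pretrained is True:
        return "download"    # fetch the MODEL_URLS entry
    elif isinstance(pretrained, str):
        return "load-local"  # treat the value as a checkpoint path
    else:
        raise RuntimeError(
            "pretrained type is not available. Please use `string` or `boolean` type.")

assert _dispatch(False) == "skip"
assert _dispatch("best_model.pdparams") == "load-local"
# _dispatch(1) and _dispatch(None) both raise, by design of the identity checks
```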
+
+
+def Xception41(pretrained=False, use_ssld=False, **kwargs):
+    model = Xception(entry_flow_block_num=3, middle_flow_block_num=8, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Xception41"], use_ssld=use_ssld)
     return model
 
 
-def Xception65(**args):
-    model = Xception(entry_flow_block_num=3, middle_flow_block_num=16, **args)
+def Xception65(pretrained=False, use_ssld=False, **kwargs):
+    model = Xception(entry_flow_block_num=3, middle_flow_block_num=16, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Xception65"], use_ssld=use_ssld)
     return model
 
 
-def Xception71(**args):
-    model = Xception(entry_flow_block_num=5, middle_flow_block_num=16, **args)
+def Xception71(pretrained=False, use_ssld=False, **kwargs):
+    model = Xception(entry_flow_block_num=5, middle_flow_block_num=16, **kwargs)
+    _load_pretrained(pretrained, model, MODEL_URLS["Xception71"], use_ssld=use_ssld)
     return model
diff --git a/ppcls/arch/backbone/model_zoo/xception_deeplab.py b/ppcls/arch/backbone/model_zoo/xception_deeplab.py
index e3e39a4fb5d1aa88945615cd9de0efa5d207103a..f2424823fd90f37646063692b48726a6e4186e70 100644
--- a/ppcls/arch/backbone/model_zoo/xception_deeplab.py
+++ b/ppcls/arch/backbone/model_zoo/xception_deeplab.py
@@ -1,3 +1,17 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import paddle
 from paddle import ParamAttr
 import paddle.nn as nn
@@ -5,7 +19,12 @@ import paddle.nn.functional as F
 from paddle.nn import Conv2D, BatchNorm, Linear, Dropout
 from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D
 
-__all__ = ["Xception41_deeplab", "Xception65_deeplab", "Xception71_deeplab"]
+from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
+
+MODEL_URLS = {"Xception41_deeplab": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams",
+              "Xception65_deeplab": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams"}
+
+__all__ = list(MODEL_URLS.keys())
 
 
 def check_data(data, number):
@@ -369,18 +388,28 @@ class XceptionDeeplab(nn.Layer):
         x = paddle.squeeze(x, axis=[2, 3])
         x = self._fc(x)
         return x
+
+
+def _load_pretrained(pretrained, model, model_url, use_ssld=False):
+    if pretrained is False:
+        pass
+    elif pretrained is True:
+        load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
+    elif isinstance(pretrained, str):
+        load_dygraph_pretrain(model, pretrained)
+    else:
+        raise RuntimeError(
+            "pretrained type is not available. Please use `string` or `boolean` type."
+ ) -def Xception41_deeplab(**args): - model = XceptionDeeplab('xception_41', **args) - return model - - -def Xception65_deeplab(**args): - model = XceptionDeeplab("xception_65", **args) +def Xception41_deeplab(pretrained=False, use_ssld=False, **kwargs): + model = XceptionDeeplab('xception_41', **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["Xception41_deeplab"], use_ssld=use_ssld) return model -def Xception71_deeplab(**args): - model = XceptionDeeplab("xception_71", **args) +def Xception65_deeplab(pretrained=False, use_ssld=False, **kwargs): + model = XceptionDeeplab("xception_65", **kwargs) + _load_pretrained(pretrained, model, MODEL_URLS["Xception65_deeplab"], use_ssld=use_ssld) return model diff --git a/ppcls/arch/loss_metrics/__init__.py b/ppcls/arch/loss_metrics/__init__.py deleted file mode 100644 index 934fbd828654e5e73ee7a947eb597c25823f9395..0000000000000000000000000000000000000000 --- a/ppcls/arch/loss_metrics/__init__.py +++ /dev/null @@ -1,91 +0,0 @@ -#copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. -#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. - -import copy -import sys - -import paddle -import paddle.nn as nn -import paddle.nn.functional as F - - -# TODO: fix the format -class CELoss(nn.Layer): - """ - """ - - def __init__(self, name="loss", epsilon=None): - super().__init__() - self.name = name - if epsilon is not None and (epsilon <= 0 or epsilon >= 1): - epsilon = None - self.epsilon = epsilon - - def _labelsmoothing(self, target, class_num): - if target.shape[-1] != class_num: - one_hot_target = F.one_hot(target, class_num) - else: - one_hot_target = target - soft_target = F.label_smooth(one_hot_target, epsilon=self.epsilon) - soft_target = paddle.reshape(soft_target, shape=[-1, class_num]) - return soft_target - - def forward(self, logits, label, mode="train"): - loss_dict = {} - if self.epsilon is not None: - class_num = logits.shape[-1] - label = self._labelsmoothing(label, class_num) - x = -F.log_softmax(logits, axis=-1) - loss = paddle.sum(logits * label, axis=-1) - else: - if label.shape[-1] == logits.shape[-1]: - label = F.softmax(label, axis=-1) - soft_label = True - else: - soft_label = False - loss = F.cross_entropy(logits, label=label, soft_label=soft_label) - loss_dict[self.name] = paddle.mean(loss) - return loss_dict - - -# TODO: fix the format -class Topk(nn.Layer): - def __init__(self, topk=[1, 5]): - super().__init__() - assert isinstance(topk, (int, list)) - if isinstance(topk, int): - topk = [topk] - self.topk = topk - - def forward(self, x, label): - if isinstance(x, dict): - x = x["logits"] - - metric_dict = dict() - for k in self.topk: - metric_dict["top{}".format(k)] = paddle.metric.accuracy( - x, label, k=k) - return metric_dict - - -# TODO: fix the format -def build_loss(config): - loss_func = CELoss() - return loss_func - - -# TODO: fix the format -def build_metrics(config): - metrics_func = Topk() - return metrics_func diff --git a/ppcls/configs/Cartoonface/ResNet50_icartoon.yaml b/ppcls/configs/Cartoonface/ResNet50_icartoon.yaml 
index f18f3346bd6abf6a997ee1461cc815799c61d6a6..01ab83e176444710c1f4a81088f583c8a368ed85 100644 --- a/ppcls/configs/Cartoonface/ResNet50_icartoon.yaml +++ b/ppcls/configs/Cartoonface/ResNet50_icartoon.yaml @@ -60,19 +60,22 @@ Optimizer: DataLoader: Train: dataset: - name: ICartoonDataset - image_root: "./dataset/iCartoonFace" - cls_label_path: "./dataset/iCartoonFace/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ICartoonDataset + image_root: "./dataset/iCartoonFace" + cls_label_path: "./dataset/iCartoonFace/train_list.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: name: DistributedBatchSampler #num_instances: 2 @@ -86,27 +89,30 @@ DataLoader: Eval: Query: dataset: - name: ICartoonDataset - image_root: "./dataset/iCartoonFace" - cls_label_path: "./dataset/iCartoonFace/query.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ICartoonDataset + image_root: "./dataset/iCartoonFace" + cls_label_path: "./dataset/iCartoonFace/query.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: name: DistributedBatchSampler batch_size: 64 drop_last: False shuffle: False loader: - num_workers: 6 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Gallery: dataset: @@ -114,6 +120,9 @@ DataLoader: image_root: "./dataset/iCartoonFace" cls_label_path: "./dataset/iCartoonFace/gallery.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: resize_short: 256 - CropImage: @@ -129,8 +138,8 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Metric: Train: @@ -138,4 +147,4 @@ Metric: topk: [1, 5] Eval: - Recallk: - topk: 1 + topk: [1] diff --git a/ppcls/configs/ImageNet/AlexNet/AlexNet.yaml b/ppcls/configs/ImageNet/AlexNet/AlexNet.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6df81d046e576ddddad5bf99a1b8c34b89e0b65c --- /dev/null +++ b/ppcls/configs/ImageNet/AlexNet/AlexNet.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: AlexNet + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.01 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + 
image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA102.yaml b/ppcls/configs/ImageNet/DLA/DLA102.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0eb965173bd8ace3059f37f57b7d2eac23585ac9 --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA102.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA102 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 
0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA102x.yaml b/ppcls/configs/ImageNet/DLA/DLA102x.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8f31a528f9981fa15c98d9a2beb2578d5d0b661c --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA102x.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA102x + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA102x2.yaml b/ppcls/configs/ImageNet/DLA/DLA102x2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1140e4de4b0d70f6651a3c1c31072c4e5ebc2124 
--- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA102x2.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA102x2 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA169.yaml b/ppcls/configs/ImageNet/DLA/DLA169.yaml new file mode 100644 index 0000000000000000000000000000000000000000..482efecf623350f708b31279c753621cc59bb3be --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA169.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA169 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + 
name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA34.yaml b/ppcls/configs/ImageNet/DLA/DLA34.yaml new file mode 100644 index 0000000000000000000000000000000000000000..85d80e285b601cb6b573e4a5d5857e9ad8535017 --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA34.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA34 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + 
std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA46_c.yaml b/ppcls/configs/ImageNet/DLA/DLA46_c.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e0c2c84bf3f73be622cdc9633b9f3f49d249b336 --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA46_c.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA46_c + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA46x_c.yaml b/ppcls/configs/ImageNet/DLA/DLA46x_c.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..b38acf746bbaaf64350346ac77c1cc496f2b5fe2 --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA46x_c.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA46x_c + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA60.yaml b/ppcls/configs/ImageNet/DLA/DLA60.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ebdac4e32b8d55550769fff189faf2b42e64e880 --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA60.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA60 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 
0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA60x.yaml b/ppcls/configs/ImageNet/DLA/DLA60x.yaml new file mode 100644 index 0000000000000000000000000000000000000000..261d3522e32aa19cbcb45e99fb44491a888f49ed --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA60x.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA60x + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + 
size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DLA/DLA60x_c.yaml b/ppcls/configs/ImageNet/DLA/DLA60x_c.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1c507cb6179615cb6debf3ab491833c3ce064bde --- /dev/null +++ b/ppcls/configs/ImageNet/DLA/DLA60x_c.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DLA60x_c + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DPN/DPN107.yaml b/ppcls/configs/ImageNet/DPN/DPN107.yaml new file 
mode 100644 index 0000000000000000000000000000000000000000..d4a82939f6629fc57bc6be67ac71bbb9cd431e19 --- /dev/null +++ b/ppcls/configs/ImageNet/DPN/DPN107.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DPN107 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DPN/DPN131.yaml b/ppcls/configs/ImageNet/DPN/DPN131.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3f7640f1c9e76d0f95ccae525abe8428caadd798 --- /dev/null +++ b/ppcls/configs/ImageNet/DPN/DPN131.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DPN131 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + 
Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DPN/DPN68.yaml b/ppcls/configs/ImageNet/DPN/DPN68.yaml new file mode 100644 index 0000000000000000000000000000000000000000..103313212a1548b9e2698e076e7b3f0bc6ef872f --- /dev/null +++ b/ppcls/configs/ImageNet/DPN/DPN68.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DPN68 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + 
scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DPN/DPN92.yaml b/ppcls/configs/ImageNet/DPN/DPN92.yaml new file mode 100644 index 0000000000000000000000000000000000000000..606e79e8f3bb23ff09b308711c6a55fc2202df06 --- /dev/null +++ b/ppcls/configs/ImageNet/DPN/DPN92.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DPN92 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DPN/DPN98.yaml b/ppcls/configs/ImageNet/DPN/DPN98.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f23f407362aa8a071f706f029a4f349413245169 
--- /dev/null +++ b/ppcls/configs/ImageNet/DPN/DPN98.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DPN98 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml b/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml new file mode 100644 index 0000000000000000000000000000000000000000..870ebf2d70ce550742898e7a7f942e1369aa0961 --- /dev/null +++ b/ppcls/configs/ImageNet/DarkNet/DarkNet53.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DarkNet53 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + 
cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml b/ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml new file mode 100644 index 0000000000000000000000000000000000000000..10c442e36ec3ab1490eeaac207671389d223c709 --- /dev/null +++ b/ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DenseNet121 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] 
+ std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DenseNet/DenseNet161.yaml b/ppcls/configs/ImageNet/DenseNet/DenseNet161.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2c2917baf1440a2375c1287f2c38c9702b3d5102 --- /dev/null +++ b/ppcls/configs/ImageNet/DenseNet/DenseNet161.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DenseNet161 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DenseNet/DenseNet169.yaml b/ppcls/configs/ImageNet/DenseNet/DenseNet169.yaml new file mode 100644 index 
0000000000000000000000000000000000000000..4ae4ad653a27062d15b02ea756545de14db650c4 --- /dev/null +++ b/ppcls/configs/ImageNet/DenseNet/DenseNet169.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DenseNet169 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DenseNet/DenseNet201.yaml b/ppcls/configs/ImageNet/DenseNet/DenseNet201.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bf6013724cd28e9aaf9fa4fb2022b6f5b3219fab --- /dev/null +++ b/ppcls/configs/ImageNet/DenseNet/DenseNet201.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DenseNet201 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 
0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/DenseNet/DenseNet264.yaml b/ppcls/configs/ImageNet/DenseNet/DenseNet264.yaml new file mode 100644 index 0000000000000000000000000000000000000000..803a66bc4436bbef6ed84616004f190c1d87ff6d --- /dev/null +++ b/ppcls/configs/ImageNet/DenseNet/DenseNet264.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: DenseNet264 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + 
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Ineption/InceptionV3.yaml b/ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml
similarity index 67%
rename from ppcls/configs/ImageNet/Ineption/InceptionV3.yaml
rename to ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml
index db29940b0e516499b7800ed191e14f5dcb36c993..0721b1901113d1ed0a3f04723625168ce6eb20cf 100644
--- a/ppcls/configs/ImageNet/Ineption/InceptionV3.yaml
+++ b/ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml
@@ -8,7 +8,7 @@ Global:
   save_interval: 1
   eval_during_train: True
   eval_interval: 1
-  epochs: 200
+  epochs: 120
   print_batch_step: 10
   use_visualdl: False
   # used for static mode and model export
@@ -17,29 +17,48 @@ Global:
 
 # model architecture
 Arch:
-  name: "InceptionV3"
-
+  name: "DistillationModel"
+  # if not null, its lengths should be same as models
+  pretrained_list:
+  # if not null, its lengths should be same as models
+  freeze_params_list:
+  - True
+  - False
+  models:
+    - Teacher:
+        name: MobileNetV3_large_x1_0
+        pretrained: True
+        use_ssld: True
+    - Student:
+        name: MobileNetV3_small_x1_0
+        pretrained: False
+
+  infer_model_name: "Student"
+
 
 # loss function config for traing/eval process
 Loss:
   Train:
-    - CELoss:
+    - DistillationCELoss:
         weight: 1.0
+        model_name_pairs:
+        - ["Student", "Teacher"]
   Eval:
-    - CELoss:
+    - DistillationGTCELoss:
         weight: 1.0
-
+        model_names: ["Student"]
+
 Optimizer:
   name: Momentum
   momentum: 0.9
   lr:
     name: Cosine
-    learning_rate: 0.045
-    decay_epochs: [30, 60, 90]
-    values: [0.1, 0.01, 0.001, 0.0001]
+    learning_rate: 1.3
+    warmup_epoch: 5
   regularizer:
     name: 'L2'
-    coeff: 0.0001
+    coeff: 0.00001
 
 
 # data loader for train and eval
@@ -50,10 +69,14 @@ DataLoader:
       image_root: "./dataset/ILSVRC2012/"
       cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
       transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
         - RandCropImage:
-            size: 299
+            size: 224
         - RandFlipImage:
             flip_code: 1
+        - AutoAugment:
         - NormalizeImage:
             scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
@@ -62,7 +85,7 @@ DataLoader:
 
     sampler:
       name: DistributedBatchSampler
-      batch_size: 64
+      batch_size: 512
       drop_last: False
       shuffle: True
     loader:
@@ -76,10 +99,13 @@ DataLoader:
       image_root: "./dataset/ILSVRC2012/"
       cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
       transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
         - ResizeImage:
-            resize_short: 320
+            resize_short: 256
         - CropImage:
-            size: 299
+            size: 224
         - NormalizeImage:
             scale: 0.00392157
             mean: [0.485, 0.456, 0.406]
@@ -102,9 +128,9 @@ Infer:
       to_rgb: True
       channel_first: False
     - ResizeImage:
-        resize_short: 320
+        resize_short: 256
     - CropImage:
-        size: 299
+        size: 224
     - NormalizeImage:
         scale: 1.0/255.0
         mean: [0.485, 0.456, 0.406]
@@ -112,14 +138,17 @@ Infer:
         order: ''
     - ToCHWImage:
   PostProcess:
-    name: Topk
+    name: DistillationPostProcess
+    func: Topk
     topk: 5
     class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
 
 Metric:
   Train:
-    - TopkAcc:
+    - DistillationTopkAcc:
+      model_key: "Student"
       topk: [1, 5]
   Eval:
-    - TopkAcc:
+    - DistillationTopkAcc:
+      model_key: "Student"
       topk: [1, 5]
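The rename above converts the old InceptionV3 recipe into an SSLD-style distillation recipe: DistillationModel holds a frozen, SSLD-pretrained MobileNetV3_large_x1_0 teacher and a trainable MobileNetV3_small_x1_0 student (freeze_params_list: [True, False]); DistillationCELoss supervises the ["Student", "Teacher"] output pair during training, evaluation scores the student alone through DistillationGTCELoss, and infer_model_name: "Student" keeps only the student at export time. Assuming the standard PaddleClas export script (not part of this patch; the checkpoint path is hypothetical), the trained student would be exported roughly as:

    # hypothetical checkpoint path; substitute the real best_model
    python3 tools/export_model.py \
        -c ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml \
        -o Global.pretrained_model=./output/DistillationModel/best_model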
diff --git a/ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yaml b/ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..87dad8058cd6a1205e0d922c8af8f147bbdef62a
--- /dev/null
+++ b/ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yaml
@@ -0,0 +1,133 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: EfficientNetB0
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: RMSProp
+  momentum: 0.9
+  rho: 0.9
+  epsilon: 0.001
+  lr:
+    name: Cosine
+    learning_rate: 0.032
+  regularizer:
+    name: 'L2'
+    coeff: 0.00001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - AutoAugment:
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/GhostNet/GhostNet_x0_5.yaml b/ppcls/configs/ImageNet/GhostNet/GhostNet_x0_5.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f6b62b956c7d61606d87af6352671d1acd5ef865
--- /dev/null
+++ b/ppcls/configs/ImageNet/GhostNet/GhostNet_x0_5.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 360 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: GhostNet_x0_5 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + regularizer: + name: 'L2' + coeff: 0.0004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_0.yaml b/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e57c096df7ce1213284e5ece458f813839504566 --- /dev/null +++ b/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_0.yaml @@ -0,0 +1,129 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 360 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: GhostNet_x1_0 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + regularizer: + name: 'L2' + coeff: 0.0004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + 
flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_3.yaml b/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_3.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e3c5b5076326f2712eb2c1c39fac8e1c37611f9e --- /dev/null +++ b/ppcls/configs/ImageNet/GhostNet/GhostNet_x1_3.yaml @@ -0,0 +1,129 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 360 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: GhostNet_x1_3 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.8 + regularizer: + name: 'L2' + coeff: 0.0004 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + 
transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml index fb76c33014478d2ef518128ddab74696b26260c4..8947bd5a379e32b1d2a8b7d9643f6e6aafe38c1a 100644 --- a/ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml +++ b/ppcls/configs/ImageNet/HRNet/HRNet_W18_C.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "HRNet_W18_C" + name: HRNet_W18_C # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - 
std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W30_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W30_C.yaml index bd263ea7dfe078930244f052f2fa060fef0402f8..5722fb456c8ac7f65aa141a700a849b892492007 100644 --- a/ppcls/configs/ImageNet/HRNet/HRNet_W30_C.yaml +++ b/ppcls/configs/ImageNet/HRNet/HRNet_W30_C.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "HRNet_W30_C" + name: HRNet_W30_C # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 
256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+  class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W32_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W32_C.yaml
index c45be6415faf5325c2f059ded5d6737ef1afb9b8..5e5a13165e0a82bf134dbbf2ae3d07d9ceb154b8 100644
--- a/ppcls/configs/ImageNet/HRNet/HRNet_W32_C.yaml
+++ b/ppcls/configs/ImageNet/HRNet/HRNet_W32_C.yaml
@@ -2,8 +2,8 @@
 Global:
   checkpoints: null
   pretrained_model: null
-  output_dir: "./output/"
-  device: "gpu"
+  output_dir: ./output/
+  device: gpu
   class_num: 1000
   save_interval: 1
   eval_during_train: True
@@ -13,11 +13,11 @@ Global:
   use_visualdl: False
   # used for static mode and model export
   image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
+  save_inference_dir: ./inference

 # model architecture
 Arch:
-  name: "HRNet_W32_C"
+  name: HRNet_W32_C

 # loss function config for traing/eval process
 Loss:
@@ -46,80 +46,86 @@ Optimizer:
 DataLoader:
   Train:
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
-      transform_ops:
-        - RandCropImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

   Eval:
     # TOTO: modify to the latest trainer
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            resize_short: 256
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: False
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

 Infer:
-  infer_imgs: "docs/images/whl/demo.jpg"
+  infer_imgs: docs/images/whl/demo.jpg
   batch_size: 10
   transforms:
-    - DecodeImage:
-        to_rgb: True
-        channel_first: False
-    - ResizeImage:
-        resize_short: 256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+  class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W40_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W40_C.yaml
index f82ee4b8ed26477a596687f1d84822f411104277..1fc06f5c75c5c59049239c265b1cb69b925833d5 100644
--- a/ppcls/configs/ImageNet/HRNet/HRNet_W40_C.yaml
+++ b/ppcls/configs/ImageNet/HRNet/HRNet_W40_C.yaml
@@ -2,8 +2,8 @@
 Global:
   checkpoints: null
   pretrained_model: null
-  output_dir: "./output/"
-  device: "gpu"
+  output_dir: ./output/
+  device: gpu
   class_num: 1000
   save_interval: 1
   eval_during_train: True
@@ -13,11 +13,11 @@ Global:
   use_visualdl: False
   # used for static mode and model export
   image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
+  save_inference_dir: ./inference

 # model architecture
 Arch:
-  name: "HRNet_W40_C"
+  name: HRNet_W40_C

 # loss function config for traing/eval process
 Loss:
@@ -46,80 +46,86 @@ Optimizer:
 DataLoader:
   Train:
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
-      transform_ops:
-        - RandCropImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

   Eval:
     # TOTO: modify to the latest trainer
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            resize_short: 256
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: False
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

 Infer:
-  infer_imgs: "docs/images/whl/demo.jpg"
+  infer_imgs: docs/images/whl/demo.jpg
   batch_size: 10
   transforms:
-    - DecodeImage:
-        to_rgb: True
-        channel_first: False
-    - ResizeImage:
-        resize_short: 256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+  class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W44_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W44_C.yaml
index 2845e7e466e6eebc18c6c38e7dffaedbeb60b50d..9731472981b3ea9412a7cf9c92868eb516b8adb5 100644
--- a/ppcls/configs/ImageNet/HRNet/HRNet_W44_C.yaml
+++ b/ppcls/configs/ImageNet/HRNet/HRNet_W44_C.yaml
@@ -2,8 +2,8 @@
 Global:
   checkpoints: null
   pretrained_model: null
-  output_dir: "./output/"
-  device: "gpu"
+  output_dir: ./output/
+  device: gpu
   class_num: 1000
   save_interval: 1
   eval_during_train: True
@@ -13,11 +13,11 @@ Global:
   use_visualdl: False
   # used for static mode and model export
   image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
+  save_inference_dir: ./inference

 # model architecture
 Arch:
-  name: "HRNet_W44_C"
+  name: HRNet_W44_C

 # loss function config for traing/eval process
 Loss:
@@ -46,78 +46,86 @@ Optimizer:
 DataLoader:
   Train:
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
-      transform_ops:
-        - RandCropImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

   Eval:
     # TOTO: modify to the latest trainer
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: False
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

 Infer:
-  infer_imgs: "docs/images/whl/demo.jpg"
+  infer_imgs: docs/images/whl/demo.jpg
   batch_size: 10
   transforms:
-    - DecodeImage:
-        to_rgb: True
-        channel_first: False
-    - ResizeImage:
-        resize_short: 256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+  class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W48_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W48_C.yaml
index 377df4f4d9e4ef6e37a346a7639bd1dd77869f92..7f9f3d436209e28910500d7fba65bcc4b1340134 100644
--- a/ppcls/configs/ImageNet/HRNet/HRNet_W48_C.yaml
+++ b/ppcls/configs/ImageNet/HRNet/HRNet_W48_C.yaml
@@ -2,8 +2,8 @@
 Global:
   checkpoints: null
   pretrained_model: null
-  output_dir: "./output/"
-  device: "gpu"
+  output_dir: ./output/
+  device: gpu
   class_num: 1000
   save_interval: 1
   eval_during_train: True
@@ -13,11 +13,11 @@ Global:
   use_visualdl: False
   # used for static mode and model export
   image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
+  save_inference_dir: ./inference

 # model architecture
 Arch:
-  name: "HRNet_W48_C"
+  name: HRNet_W48_C

 # loss function config for traing/eval process
 Loss:
@@ -46,80 +46,86 @@ Optimizer:
 DataLoader:
   Train:
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
-      transform_ops:
-        - RandCropImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

   Eval:
     # TOTO: modify to the latest trainer
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            resize_short: 256
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: False
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

 Infer:
-  infer_imgs: "docs/images/whl/demo.jpg"
+  infer_imgs: docs/images/whl/demo.jpg
   batch_size: 10
   transforms:
-    - DecodeImage:
-        to_rgb: True
-        channel_first: False
-    - ResizeImage:
-        resize_short: 256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
 PostProcess:
   name: Topk
   topk: 5
-  class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+  class_id_map_file: ppcls/utils/imagenet1k_label_list.txt

 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/HRNet/HRNet_W64_C.yaml b/ppcls/configs/ImageNet/HRNet/HRNet_W64_C.yaml
index fad039c8efb6bde0b075c2960256db2c94249cea..c9c891fd5f3e182f97a83661aed9b3d4dd2c84ae 100644
--- a/ppcls/configs/ImageNet/HRNet/HRNet_W64_C.yaml
+++ b/ppcls/configs/ImageNet/HRNet/HRNet_W64_C.yaml
@@ -2,8 +2,8 @@
 Global:
   checkpoints: null
   pretrained_model: null
-  output_dir: "./output/"
-  device: "gpu"
+  output_dir: ./output/
+  device: gpu
   class_num: 1000
   save_interval: 1
   eval_during_train: True
@@ -13,11 +13,11 @@ Global:
   use_visualdl: False
   # used for static mode and model export
   image_shape: [3, 224, 224]
-  save_inference_dir: "./inference"
+  save_inference_dir: ./inference

 # model architecture
 Arch:
-  name: "HRNet_W64_C"
+  name: HRNet_W64_C

 # loss function config for traing/eval process
 Loss:
@@ -46,80 +46,86 @@ Optimizer:
 DataLoader:
   Train:
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
-      transform_ops:
-        - RandCropImage:
-            size: 224
-        - RandFlipImage:
-            flip_code: 1
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True

   Eval:
     # TOTO: modify to the latest trainer
     dataset:
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            resize_short: 256
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HarDNet/HarDNet39_ds.yaml b/ppcls/configs/ImageNet/HarDNet/HarDNet39_ds.yaml new file mode 100644 index 0000000000000000000000000000000000000000..90f24b35ada95126905381f22b2d005a1a47c0ea --- /dev/null +++ b/ppcls/configs/ImageNet/HarDNet/HarDNet39_ds.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: HarDNet39_ds + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 
+ mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HarDNet/HarDNet68.yaml b/ppcls/configs/ImageNet/HarDNet/HarDNet68.yaml new file mode 100644 index 0000000000000000000000000000000000000000..140a3d7709c79f7339c2ebcc7c922934d69632d8 --- /dev/null +++ b/ppcls/configs/ImageNet/HarDNet/HarDNet68.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: HarDNet68 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HarDNet/HarDNet68_ds.yaml b/ppcls/configs/ImageNet/HarDNet/HarDNet68_ds.yaml new file mode 100644 index 0000000000000000000000000000000000000000..9accc22932599cda0bedb1a79ca52c2be6f726d8 --- /dev/null +++ b/ppcls/configs/ImageNet/HarDNet/HarDNet68_ds.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model 
export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: HarDNet68_ds + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/HarDNet/HarDNet85.yaml b/ppcls/configs/ImageNet/HarDNet/HarDNet85.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f60ba467a4f5d5e677432f9df66f7f28748845a1 --- /dev/null +++ b/ppcls/configs/ImageNet/HarDNet/HarDNet85.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: HarDNet85 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 
0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Inception/GoogLeNet.yaml b/ppcls/configs/ImageNet/Inception/GoogLeNet.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8bc209f58b13da3189c8374846945835572122f7 --- /dev/null +++ b/ppcls/configs/ImageNet/Inception/GoogLeNet.yaml @@ -0,0 +1,127 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: GoogLeNet + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + 
scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Inception/InceptionV3.yaml b/ppcls/configs/ImageNet/Inception/InceptionV3.yaml new file mode 100644 index 0000000000000000000000000000000000000000..85981e1b03eb4df602f128d1c65df6b72ccf16b5 --- /dev/null +++ b/ppcls/configs/ImageNet/Inception/InceptionV3.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 299, 299] + save_inference_dir: ./inference + +# model architecture +Arch: + name: InceptionV3 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.045 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 299 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 320 + - CropImage: + size: 299 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 320 + - CropImage: + size: 299 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Inception/InceptionV4.yaml b/ppcls/configs/ImageNet/Inception/InceptionV4.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1f3183706b5bce09bb3b8c1f15c29385ffeaacaf --- /dev/null +++ b/ppcls/configs/ImageNet/Inception/InceptionV4.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode 
and model export + image_shape: [3, 299, 299] + save_inference_dir: ./inference + +# model architecture +Arch: + name: InceptionV4 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.045 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 299 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 320 + - CropImage: + size: 299 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 320 + - CropImage: + size: 299 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/LeViT/LeViT_128.yaml b/ppcls/configs/ImageNet/LeViT/LeViT_128.yaml new file mode 100644 index 0000000000000000000000000000000000000000..25a9cc7b0358faa7d685c202666ddeb26d548eb4 --- /dev/null +++ b/ppcls/configs/ImageNet/LeViT/LeViT_128.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: LeViT128 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + 
+ sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml b/ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0a21caf49a062a82787edada39e1d25221d2a38d --- /dev/null +++ b/ppcls/configs/ImageNet/LeViT/LeViT_128S.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: LeViT128S + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: 
+ resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/LeViT/LeViT_192.yaml b/ppcls/configs/ImageNet/LeViT/LeViT_192.yaml new file mode 100644 index 0000000000000000000000000000000000000000..51dca81bf069149a101d094ea1678c5498451f0d --- /dev/null +++ b/ppcls/configs/ImageNet/LeViT/LeViT_192.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: LeViT192 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/LeViT/LeViT_256.yaml b/ppcls/configs/ImageNet/LeViT/LeViT_256.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3dbcc79b0c331519c4f424c77cc54ac5ac797f0b --- /dev/null +++ b/ppcls/configs/ImageNet/LeViT/LeViT_256.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + 
print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: LeViT256 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/LeViT/LeViT_384.yaml b/ppcls/configs/ImageNet/LeViT/LeViT_384.yaml new file mode 100644 index 0000000000000000000000000000000000000000..933ed84bf3c8737b0aa748e98493a464dc13dc4c --- /dev/null +++ b/ppcls/configs/ImageNet/LeViT/LeViT_384.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: LeViT384 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + 
scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml index ecbd65a2bec892be00111737447a4baf23797089..da3ec16fffedca8dbfa243afefababc90e4afa9e 100644 --- a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml +++ b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV1" + name: MobileNetV1 # loss function config for traing/eval process Loss: @@ -39,87 +39,93 @@ Optimizer: values: [0.1, 0.01, 0.001, 0.0001] regularizer: name: 'L2' - coeff: 0.00003 + coeff: 0.0003 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - 
scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_25.yaml b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_25.yaml index 6d4b49939e2fa7bcbc6c984212acd5ff9445c446..6838771ca323174cb37ad0a8c83f1c6a15f8416d 100644 --- a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_25.yaml +++ b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_25.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV1_x0_25" + name: MobileNetV1_x0_25 # loss function config for traing/eval process Loss: @@ -39,87 +39,93 @@ Optimizer: values: [0.1, 0.01, 0.001, 0.0001] regularizer: name: 'L2' - coeff: 0.00003 + coeff: 0.0003 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + 
use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_5.yaml b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_5.yaml index 492c0a085b13176988c0a7e14f8954e59baad230..e1ecf18dfe0b23aa1f40dce613c86e0423eb502e 100644 --- a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_5.yaml +++ b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_5.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV1_x0_5" + name: MobileNetV1_x0_5 # loss function config for traing/eval process Loss: @@ -39,87 +39,93 @@ Optimizer: values: [0.1, 0.01, 0.001, 0.0001] regularizer: name: 'L2' - coeff: 0.00003 + coeff: 0.0003 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: 
[0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_75.yaml b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_75.yaml index 4b500f56f88adaeff9dd73b164a6ab5a028c9e11..f93a03cbe442bc242d6e7f192a1953af620b154b 100644 --- a/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_75.yaml +++ b/ppcls/configs/ImageNet/MobileNetV1/MobileNetV1_x0_75.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV1_x0_75" + name: MobileNetV1_x0_75 # loss function config for traing/eval process Loss: @@ -39,87 +39,93 @@ Optimizer: values: [0.1, 0.01, 0.001, 0.0001] regularizer: name: 'L2' - coeff: 0.00003 + coeff: 0.0003 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: 
./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: True
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True
 
   Eval:
     # TOTO: modify to the latest trainer
     dataset: 
-      name: ImageNetDataset
-      image_root: "./dataset/ILSVRC2012/"
-      cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
-      transform_ops:
-        - ResizeImage:
-            resize_short: 256
-        - CropImage:
-            size: 224
-        - NormalizeImage:
-            scale: 0.00392157
-            mean: [0.485, 0.456, 0.406]
-            std: [0.229, 0.224, 0.225]
-            order: ''
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
     sampler:
-      name: DistributedBatchSampler
-      batch_size: 64
-      drop_last: False
-      shuffle: False
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
     loader:
-      num_workers: 6
-      use_shared_memory: True
+      num_workers: 4
+      use_shared_memory: True
 
 Infer:
-  infer_imgs: "docs/images/whl/demo.jpg"
+  infer_imgs: docs/images/whl/demo.jpg
   batch_size: 10
   transforms:
-    - DecodeImage:
-        to_rgb: True
-        channel_first: False
-    - ResizeImage:
-        resize_short: 256
-    - CropImage:
-        size: 224
-    - NormalizeImage:
-        scale: 1.0/255.0
-        mean: [0.485, 0.456, 0.406]
-        std: [0.229, 0.224, 0.225]
-        order: ''
-    - ToCHWImage:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
   PostProcess:
     name: Topk
     topk: 5
-    class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 
 Metric:
-  Train: 
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval: 
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a0d0a7321c59a2748e8a449b6ced0d6038893de1
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_25.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_25.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..269ce7e9ed7bba98e323e6b6cf8243f29d636858
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_25.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2_x0_25
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_5.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_5.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..81cb4e845094f27c7debb225df0852452fc483f3
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_5.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2_x0_5
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_75.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_75.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d1a8a831a709e60772bd37a260e2186e6ca84ad8
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x0_75.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2_x0_75
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x1_5.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x1_5.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..71a609c55cda53baca312408b5b0fcbc501b80fe
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x1_5.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2_x1_5
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x2_0.yaml b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x2_0.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f9e7594185fb2ad719691607c2e8aa1844c68061
--- /dev/null
+++ b/ppcls/configs/ImageNet/MobileNetV2/MobileNetV2_x2_0.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: MobileNetV2_x2_0
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset: 
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: 
DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_35.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_35.yaml index 59ef2b5c70d780f550a5d9b529617ec02e954073..a34068cd06f122e0f388ce169f3e699ef6c35a7d 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_35.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_35.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_large_x0_35" + name: MobileNetV3_large_x0_35 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False 
+ shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_5.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_5.yaml index d291b5b35d4e4738dcb466875f128ed97ee0661e..fb9bc47db0fd43a7a7ee0a086ce4ca6487853484 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_5.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_5.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_large_x0_5" + name: MobileNetV3_large_x0_5 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + 
resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_75.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_75.yaml index 9d2f52d6a9b5d05bd6966d4f5238fb6713fc8d22..590fa35f197caf119769d11b8632bed367f1c315 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_75.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x0_75.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_large_x0_75" + name: MobileNetV3_large_x0_75 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - 
size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml index cf99d812eeed65c6430b20c2ce690fb7b55246bb..15c945e96924fd72254b8d51efb450e912e2e186 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_large_x1_0" + name: MobileNetV3_large_x1_0 # loss function config for traing/eval process Loss: @@ -35,7 +35,8 @@ Optimizer: momentum: 0.9 lr: name: Cosine - learning_rate: 1.3 + learning_rate: 0.65 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,81 +46,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - AutoAugment: - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - AutoAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler 
+ batch_size: 256 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_25.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_25.yaml index 2d7b7546334a007fe35359673c4595f9606b5674..1def84365d7c8103c2dd0781b7a61df84e43b2f7 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_25.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_25.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_large_x1_25" + name: MobileNetV3_large_x1_25 # loss function config for traing/eval process Loss: @@ -36,89 +36,96 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' - coeff: 0.00002 + coeff: 0.00004 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - 
DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_35.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_35.yaml index c954757d0bc9887793452cfbc48d7fe790c8e877..0849c5bdea56f5ed108d0d0d6410d2efbbe3cda0 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_35.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_35.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_small_x0_35" + name: MobileNetV3_small_x0_35 # loss function config for traing/eval process Loss: @@ -36,89 +36,96 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' - coeff: 0.00002 + coeff: 0.00001 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: 
"./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_5.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_5.yaml index f1ce5eb2cf8df363aebf28dad52e809d4a66021a..e4d71e35ec2b79cc4f80a105b2e4bb3b2263cd9e 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_5.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_5.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_small_x0_5" + name: MobileNetV3_small_x0_5 
# loss function config for traing/eval process Loss: @@ -36,89 +36,96 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' - coeff: 0.00002 + coeff: 0.00001 # data loader for train and eval DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_75.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_75.yaml index 3d2d9263d7df1868819e1259c9d8ea13467104ce..75c6ff1740560fab1cc54d6a44bd8c5298fd75c6 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_75.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x0_75.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + 
device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_small_x0_75" + name: MobileNetV3_small_x0_75 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_0.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_0.yaml index 
e8da5ffb410cb7cb1760e7d85278ec368e6969e2..68749f43c10fadc5bebcba7d49e88e75459eafd4 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_0.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_0.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_small_x1_0" + name: MobileNetV3_small_x1_0 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: 
"ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml index 1417632c196530c58d3762881d680c211ccd2711..fb6109350f2065ca4783684d6a56e7adcc730716 100644 --- a/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml +++ b/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "MobileNetV3_small_x1_25" + name: MobileNetV3_small_x1_25 # loss function config for traing/eval process Loss: @@ -36,6 +36,7 @@ Optimizer: lr: name: Cosine learning_rate: 1.3 + warmup_epoch: 5 regularizer: name: 'L2' coeff: 0.00002 @@ -45,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 512 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 512 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - 
order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/RedNet/RedNet101.yaml b/ppcls/configs/ImageNet/RedNet/RedNet101.yaml new file mode 100644 index 0000000000000000000000000000000000000000..29bf25037632f9da3a3a4e6dafe6e4e5037626fe --- /dev/null +++ b/ppcls/configs/ImageNet/RedNet/RedNet101.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: RedNet101 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/RedNet/RedNet152.yaml b/ppcls/configs/ImageNet/RedNet/RedNet152.yaml new file mode 100644 index 0000000000000000000000000000000000000000..08972e8e357fcbdc56b9e4b7c554b39568c53d6c --- /dev/null +++ b/ppcls/configs/ImageNet/RedNet/RedNet152.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + 
pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: RedNet152 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/RedNet/RedNet26.yaml b/ppcls/configs/ImageNet/RedNet/RedNet26.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f61d20c7f849a77a2be507780220d68bb789d532 --- /dev/null +++ b/ppcls/configs/ImageNet/RedNet/RedNet26.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: RedNet26 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt 
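One quirk worth flagging: the RedNet configs keep `scale: 1.0` and express `mean`/`std` on the 0-255 pixel scale, while most other configs in this patch use `scale: 1.0/255.0` with unit-range statistics. The two parameterizations are the same normalization, since each 0-255 statistic is exactly 255 times its unit-range counterpart. A quick NumPy check, assuming NormalizeImage computes `(img * scale - mean) / std` per channel, which is the conventional reading of these fields:

```python
import numpy as np

img = np.random.randint(0, 256, (224, 224, 3)).astype("float32")  # decoded HWC image

unit = (img / 255.0 - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
raw  = (img * 1.0 - [123.675, 116.28, 103.53]) / [58.395, 57.12, 57.375]

print(np.abs(unit - raw).max())  # ~1e-5: identical up to float rounding
```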
+ transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/RedNet/RedNet38.yaml b/ppcls/configs/ImageNet/RedNet/RedNet38.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f35d949e0761c70cbe90aafb4874a3053e30ae1d --- /dev/null +++ b/ppcls/configs/ImageNet/RedNet/RedNet38.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: RedNet38 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + sampler: + name: DistributedBatchSampler + 
batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/RedNet/RedNet50.yaml b/ppcls/configs/ImageNet/RedNet/RedNet50.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f85a079166ce0fe23ceba0c4da27c1ef30abb30c --- /dev/null +++ b/ppcls/configs/ImageNet/RedNet/RedNet50.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: RedNet50 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0 + mean: [123.675, 116.28, 103.53] + std: [58.395, 57.12, 57.375] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ce4651ca3f53189eec242f2d0da93f3cca5765c2 --- /dev/null +++ 
b/ppcls/configs/ImageNet/Res2Net/Res2Net101_vd_26w_4s.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net101_vd_26w_4s + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8a7047de39e1750350051431334a12b7a1679e9a --- /dev/null +++ b/ppcls/configs/ImageNet/Res2Net/Res2Net200_vd_26w_4s.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net200_vd_26w_4s + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + 
name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2f78e017e9db3cdb009ece3234a6a2ac4e5d91c2 --- /dev/null +++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net50_14w_8s + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + 
size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml b/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b9815edde87d7118587a87bd656567895f019e53 --- /dev/null +++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_26w_4s.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net50_26w_4s + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml 
b/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5021672ac96e82dceed12d1f966a18df51d20ccf --- /dev/null +++ b/ppcls/configs/ImageNet/Res2Net/Res2Net50_vd_26w_4s.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: Res2Net50_vd_26w_4s + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3fbe2bd7b17ec561fe147c912ad664c23b85e24d --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt101.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 300 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeSt101 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: 
Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - AutoAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e15801e0db25a7d4e391015f4d262e5d47d172ff --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 300 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeSt50 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - AutoAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: 
./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..900e0ef9ef7ab386254c8229a8418c0d3a31307b --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeSt/ResNeSt50_fast_1s1x64d.yaml @@ -0,0 +1,132 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 300 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeSt50_fast_1s1x64d + +# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - AutoAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TODO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file:
ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6844fa106fd3045a041ede81f68f87e385321d66 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt101_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..49668a3fd537b36f37f58049d3dc4838034e60dc --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt101_64x4d + +# 
loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.00015 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TODO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4fa402fd8fe26ddd0bb0c18c31516f864fb0fcaa --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt101_vd_32x4d + +# loss function config for training/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 +
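The vd variants differ from the plain ResNeXt configs in two coupled ways: `batch_transform_ops` adds `MixupOperator` with `alpha: 0.2`, and `Metric.Train` is left empty, since mixup's soft labels make train-time top-k accuracy uninformative (the Res2Net and ResNeSt configs above follow the same pattern). A rough sketch of batch-level mixup; a hypothetical helper, not the PaddleClas operator:

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2):
    # Blend each sample with a randomly chosen partner from the same batch.
    lam = np.random.beta(alpha, alpha)
    idx = np.random.permutation(len(images))
    mixed = lam * images + (1.0 - lam) * images[idx]
    # Caller computes: loss = lam * ce(pred, labels) + (1 - lam) * ce(pred, labels[idx])
    return mixed, labels, labels[idx], lam
```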
drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bb4b03ec3778d7b632b6f5ae9f8f0cbe18e4641f --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt101_vd_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt101_vd_64x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 
224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0a7448dfd1db6d1961dccd243b9fe570a44e3df3 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt152_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0097c5cd3eac8f34563bb88ce52792cfad9c3862 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + 
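Every Metric block in this patch requests `TopkAcc` with `topk: [1, 5]`. For reference, top-k accuracy counts a prediction as correct when the true class is among the k highest logits; a minimal NumPy sketch, not the PaddleClas metric class:

```python
import numpy as np

def topk_acc(logits, labels, k=5):
    """Fraction of samples whose true label is among the top-k logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of the k largest
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

logits = np.random.randn(8, 1000)                    # batch of 8, 1000 classes
labels = np.random.randint(0, 1000, size=8)
print(topk_acc(logits, labels, k=1), topk_acc(logits, labels, k=5))
```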
use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt152_64x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.00018 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..73922c1621dc55913a4c208b9c52e7d1dfe00fd3 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt152_vd_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + 
mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..950aba28523f52464d736d8a0701fcad687add30 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt152_vd_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt152_vd_64x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: 
+ infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..417c10f637a6bb5536a3052918ba68f0d107fbc4 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt50_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..69feb0ae4acb514d3af1cdf99d2ea8f66efa2584 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + 
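All the Eval and Infer pipelines in these files are identical and deterministic: decode to RGB HWC, resize the short side to 256, center-crop to 224, normalize, transpose to CHW. The sketch below mirrors that flow with PIL and NumPy as stand-ins for DecodeImage, ResizeImage, CropImage, NormalizeImage and ToCHWImage; it approximates, rather than reproduces, the actual ops:

```python
import numpy as np
from PIL import Image

def preprocess(path, resize_short=256, crop=224):
    img = Image.open(path).convert("RGB")            # DecodeImage (to_rgb)
    w, h = img.size
    scale = resize_short / min(w, h)                 # ResizeImage (short side)
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size                                  # CropImage (center crop)
    left, top = (w - crop) // 2, (h - crop) // 2
    img = img.crop((left, top, left + crop, top + crop))
    x = np.asarray(img).astype("float32") / 255.0    # NormalizeImage
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)                      # ToCHWImage: HWC -> CHW
```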
pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 120 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt50_64x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Piecewise + learning_rate: 0.1 + decay_epochs: [30, 60, 90] + values: [0.1, 0.01, 0.001, 0.0001] + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1ccf1b048b1ba8ba0467a7858abc72e30a852081 --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt50_vd_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + 
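Also note the asymmetric Loss blocks: the vd configs give `CELoss` an `epsilon: 0.1` only under `Train`, i.e. label smoothing during training and plain cross-entropy at eval. Under the conventional formulation (assumed here, not taken from the CELoss source), smoothing redistributes epsilon of the target mass uniformly across all classes:

```python
import numpy as np

def smooth_labels(label, num_classes=1000, epsilon=0.1):
    """One-hot target softened: the true class keeps 1 - epsilon, and the
    mass epsilon is spread uniformly over all num_classes classes."""
    target = np.full(num_classes, epsilon / num_classes)
    target[label] += 1.0 - epsilon
    return target

t = smooth_labels(3, num_classes=10)
print(t.sum(), t[3])  # 1.0, 0.91  (= 1 - 0.1 + 0.1/10)
```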
transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dec7770e09705f2d78f0ff0ebe1f222a067996dd --- /dev/null +++ b/ppcls/configs/ImageNet/ResNeXt/ResNeXt50_vd_64x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: ResNeXt50_vd_64x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.0001 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 
0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet101.yaml b/ppcls/configs/ImageNet/ResNet/ResNet101.yaml index f3ecafe6682128404a6a6ed6cc22bba44bde6606..8fa6c72baab1624a4ed90684ee4061471329659c 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet101.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet101.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet101" + name: ResNet101 # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 
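+  # NOTE: NormalizeImage's scale is written as the string "1.0/255.0" throughout this
+  # patch (previously the literal 0.00392157); the operator is assumed to evaluate
+  # string scales, roughly:  scale = eval(scale) if isinstance(scale, str) else scale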
transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml index 37bdb4080afa98e065d5584c0799646e12fcb9fb..b5af3e647964f434d21a9a165f86e67ab2be7914 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet101_vd" + name: ResNet101_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,80 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + 
batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet152.yaml b/ppcls/configs/ImageNet/ResNet/ResNet152.yaml index da05e570deabd94bc7dcdb24bb5429ff686baa49..e1bcbadc5c040c51924c1856e499e6317ad9f6c9 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet152.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet152.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet152" + name: ResNet152 # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: 
DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml index 9bfcfc0620ea56a31d07315bdbe930d3483f1b49..b332e45cbf67f05d87bd4c7702b1544bc4abe1d3 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet152_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet152_vd" + name: ResNet152_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,80 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - 
DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet18.yaml b/ppcls/configs/ImageNet/ResNet/ResNet18.yaml index 12a38ccba5d6c14f884069f49623fdfa94dcbf2f..aca7c651084b0ab3b963bb2082ce26acdcabaca8 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet18.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet18.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet18" + name: ResNet18 # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: 
./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml index 6d207d65016f92536d9c6b13057542f4d1577dbe..eed1d22506c07d402b0603ea93fce1260c20de31 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet18_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet18_vd" + name: ResNet18_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,80 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: 
"./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml index d20aeb12fb510a304ab23f8cb1bfd18fca6e3109..2d9b918f8d30fcf55d6909091efce55db235ed54 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet200_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet200_vd" + name: ResNet200_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,80 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False 
+ shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet34.yaml b/ppcls/configs/ImageNet/ResNet/ResNet34.yaml index c05228cacd1779686df6a4ee4d99d8d3e3455eb2..59e58a68ccca43df073e340d9515dd10194d3610 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet34.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet34.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet34" + name: ResNet34 # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: 
False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml index ea5be38a9ee7b89d8d23654a166e01d85ec3773f..41f1d56816c756b09d0df55fee9a484bfe03ba3d 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet34_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet34_vd" + name: ResNet34_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,80 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - 
NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50.yaml b/ppcls/configs/ImageNet/ResNet/ResNet50.yaml index a19f78c691a2c10595e6fe2f7aa4913aad4f9e0c..5206bfb03a59e4a0b5a33db800d6aec7e300e666 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet50.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet50.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet50" + name: ResNet50 # loss function config for traing/eval process Loss: @@ -46,80 +46,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: 
./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50_retrieval.yml b/ppcls/configs/ImageNet/ResNet/ResNet50_retrieval.yml deleted file mode 100644 index bf143c861d6b9e774c20c291867d5501763cb7f0..0000000000000000000000000000000000000000 --- a/ppcls/configs/ImageNet/ResNet/ResNet50_retrieval.yml +++ /dev/null @@ -1,109 +0,0 @@ -# global configs -Global: - checkpoints: null - pretrained_model: null - output_dir: "./output/" - device: "gpu" - class_num: 1000 - save_interval: 1 - eval_during_train: True - eval_interval: 1 - epochs: 120 - print_batch_step: 10 - use_visualdl: False - image_shape: [3, 224, 224] - infer_imgs: - -# model architecture -Arch: - name: "RecModel" - Backbone: - name: "ResNet50" - Stoplayer: - name: "flatten_0" - output_dim: 2048 - embedding_size: 512 - Head: - name: "ArcMargin" - margin: 0.5 - scale: 80 - -# loss function config for traing/eval process -Loss: - Train: - - CELoss: - weight: 1.0 - Eval: - - CELoss: - weight: 1.0 - - -Optimizer: - name: Momentum - momentum: 0.9 - lr: - name: Piecewise - learning_rate: 0.1 - decay_epochs: [30, 60, 90] - values: [0.1, 0.01, 0.001, 0.0001] - regularizer: - name: 'L2' - coeff: 0.0001 
- -# data loader for train and eval -DataLoader: - Train: - # Dataset: - # Sampler: - # Loader: - batch_size: 256 - num_workers: 4 - file_list: "./dataset/ILSVRC2012/train_list.txt" - data_dir: "./dataset/ILSVRC2012/" - shuffle_seed: 0 - transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 1./255. - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: - Eval: - # TOTO: modify to the latest trainer - # Dataset: - # Sampler: - # Loader: - batch_size: 128 - num_workers: 4 - file_list: "./dataset/ILSVRC2012/val_list.txt" - data_dir: "./dataset/ILSVRC2012/" - shuffle_seed: 0 - transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: - -Metric: - Train: - - Topk: - k: [1, 5] - Eval: - - Topk: - k: [1, 5] diff --git a/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml b/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml index 15a02e7c29696e551bc3653524f44ba4b427fe8e..8ae56a01e00d5833741ed8fb26f49f25b02a12e8 100644 --- a/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml +++ b/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,17 +13,18 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "ResNet50_vd" + name: ResNet50_vd # loss function config for traing/eval process Loss: Train: - CELoss: weight: 1.0 + epsilon: 0.1 Eval: - CELoss: weight: 1.0 @@ -44,81 +45,87 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - 
ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: + Train: + Eval: - TopkAcc: topk: [1, 5] - diff --git a/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml b/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c9b741d1ab2d38cfa8e62a3337c7b5e08c3c9559 --- /dev/null +++ b/ppcls/configs/ImageNet/SENet/SENet154_vd.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SENet154_vd + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + 
transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bf2f4da3152c33b821ac7b139eb90ad42b8ba5b0 --- /dev/null +++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt101_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SE_ResNeXt101_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..07d6baac9332aace1984761bd517c899f67d554c --- /dev/null +++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 
1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SE_ResNeXt50_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b740069a7c39703baf78f443aaa0f02cd7c2e462 --- /dev/null +++ b/ppcls/configs/ImageNet/SENet/SE_ResNeXt50_vd_32x4d.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SE_ResNeXt50_vd_32x4d + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + 
size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/whl/demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml new file mode 100644 index 0000000000000000000000000000000000000000..86d8786ece73f1eb90e86060b76f4ca869b6b4a8 --- /dev/null +++ b/ppcls/configs/ImageNet/SENet/SE_ResNet18_vd.yaml @@ -0,0 +1,131 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + class_num: 1000 + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 200 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: SE_ResNet18_vd + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.1 + regularizer: + name: 'L2' + coeff: 0.00007 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + batch_transform_ops: + - MixupOperator: + alpha: 0.2 + + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + # TOTO: modify to the latest trainer + dataset: + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + 
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..fe57ba791373ebd7f50e87aa9dd125827a94e743
--- /dev/null
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet34_vd.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: SE_ResNet34_vd
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.1
+  regularizer:
+    name: 'L2'
+    coeff: 0.00007
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+      batch_transform_ops:
+        - MixupOperator:
+            alpha: 0.2
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml b/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9807a86a75a6910c3af1201ab768a07caf86dc19
--- /dev/null
+++ b/ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
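+  # same 200-epoch Mixup + label-smoothing (epsilon 0.1) recipe as SE_ResNet18_vd/34_vd above; only Arch.name differs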
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: SE_ResNet50_vd
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.1
+  regularizer:
+    name: 'L2'
+    coeff: 0.00007
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+      batch_transform_ops:
+        - MixupOperator:
+            alpha: 0.2
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_25.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_25.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..8861fcba9961e2b59960df7bbc5eb32004522257
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_25.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x0_25
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.5
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
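+      # expected format: one "<image path relative to image_root> <integer label>" pair per line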
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_33.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_33.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b4f8744d55e390f18d6ef27596ce359b694f071e
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_33.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x0_33
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.5
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
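+      # DistributedBatchSampler shards each batch across trainer processes when launched with paddle.distributed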
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_5.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_5.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..4d8f29489363037608572e301b1b3ef9ed468cf7
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x0_5.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x0_5
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.5
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0003
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9add6e5fb01c3a1a6121562cc9aa79d9d3360609
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x1_0
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.5
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_5.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_5.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b0f8bc6491361372da8d7ba3fc349dfb09c368b6
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_5.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x1_5
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.25
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
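+      # note: x1_5 and x2_0 halve the per-card batch size (128) and peak LR (0.25) relative to the smaller variants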
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x2_0.yaml b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x2_0.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..eb65f021547ba753881b8ef7bdf66e2da56e22ca
--- /dev/null
+++ b/ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x2_0.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 240
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: ShuffleNetV2_x2_0
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.25
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.0004
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_0.yaml b/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_0.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..92a9efce14064140b4da72a0842762875d8528e1
--- /dev/null
+++ b/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_0.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: SqueezeNet1_0
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.02
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_1.yaml b/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_1.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..de893cde9b0273245e96758da4f2240fa7c3588e
--- /dev/null
+++ b/ppcls/configs/ImageNet/SqueezeNet/SqueezeNet1_1.yaml
@@ -0,0 +1,129 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: SqueezeNet1_1
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.02
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 128
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/TNT/TNT_small.yaml b/ppcls/configs/ImageNet/TNT/TNT_small.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f30347fec437cdb12719cfb7b2e5994a2767bc34
--- /dev/null
+++ b/ppcls/configs/ImageNet/TNT/TNT_small.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: TNT_small
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 248
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.5, 0.5, 0.5]
+            std: [0.5, 0.5, 0.5]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 248
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.5, 0.5, 0.5]
+        std: [0.5, 0.5, 0.5]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml b/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..7ee4f005d74a3ea2bc098e18d077bab146dcfea9
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_base.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: alt_gvt_base
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml b/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..4a81359fd6dd9516b9a841707dc9cf96bd5e16e9
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_large.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: alt_gvt_large
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml b/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b65060552211776803948d149b012aa76f38afd7
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/alt_gvt_small.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: alt_gvt_small
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml b/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e4b85b9857e06edfee3b429f81b653d90854c4e9
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_base.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: pcpvt_base
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml b/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..5464cc5a926383822d8e6cc4ac8a2b322f126c96
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_large.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: pcpvt_large
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
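+            # decode raw image bytes to an HWC RGB array (channel_first: False) for the resize/crop/normalize ops below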
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml b/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..189985fa03f1e4d564907e79fb5d80917b8838e2
--- /dev/null
+++ b/ppcls/configs/ImageNet/Twins/pcpvt_small.yaml
@@ -0,0 +1,131 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: pcpvt_small
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Piecewise
+    learning_rate: 0.1
+    decay_epochs: [30, 60, 90]
+    values: [0.1, 0.01, 0.001, 0.0001]
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 256
+    - CropImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1,
5] diff --git a/ppcls/configs/ImageNet/VGG/VGG11.yaml b/ppcls/configs/ImageNet/VGG/VGG11.yaml index bc7018e034754ba51edf70b70574b91af3dc3046..c770023bb248c7addc8dc12aa901e5b43b579fab 100644 --- a/ppcls/configs/ImageNet/VGG/VGG11.yaml +++ b/ppcls/configs/ImageNet/VGG/VGG11.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "VGG11" + name: VGG11 # loss function config for traing/eval process Loss: @@ -44,80 +44,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - 
TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/VGG/VGG13.yaml b/ppcls/configs/ImageNet/VGG/VGG13.yaml index bef8c4e1d027d656dc527ce9a7ec2a2f969a7089..f906eef3eba712c3e9c43aeb1bbe02e04f9208a8 100644 --- a/ppcls/configs/ImageNet/VGG/VGG13.yaml +++ b/ppcls/configs/ImageNet/VGG/VGG13.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "VGG13" + name: VGG13 # loss function config for traing/eval process Loss: @@ -44,80 +44,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: 
ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/VGG/VGG16.yaml b/ppcls/configs/ImageNet/VGG/VGG16.yaml index 4099bdcd72df6b1170bb1ce3e36793953a18c059..0fe378ed69589cbe111ac8747f1630b93886b454 100644 --- a/ppcls/configs/ImageNet/VGG/VGG16.yaml +++ b/ppcls/configs/ImageNet/VGG/VGG16.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "VGG16" + name: VGG16 # loss function config for traing/eval process Loss: @@ -44,80 +44,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: PostProcess: name: Topk topk: 5 - class_id_map_file: 
"ppcls/utils/imagenet1k_label_list.txt" + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt Metric: - Train: + Train: - TopkAcc: topk: [1, 5] - Eval: + Eval: - TopkAcc: topk: [1, 5] diff --git a/ppcls/configs/ImageNet/VGG/VGG19.yaml b/ppcls/configs/ImageNet/VGG/VGG19.yaml index 6353815d31520a357594ea676e2b1fdc1caaf227..025e518b345b799a392ba62d6dce3f051fd14123 100644 --- a/ppcls/configs/ImageNet/VGG/VGG19.yaml +++ b/ppcls/configs/ImageNet/VGG/VGG19.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 1000 save_interval: 1 eval_during_train: True @@ -13,11 +13,11 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" + save_inference_dir: ./inference # model architecture Arch: - name: "VGG19" + name: VGG19 # loss function config for traing/eval process Loss: @@ -44,80 +44,86 @@ Optimizer: DataLoader: Train: dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/train_list.txt" - transform_ops: - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: ImageNetDataset - image_root: "./dataset/ILSVRC2012/" - cls_label_path: "./dataset/ILSVRC2012/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/ILSVRC2012/ + cls_label_path: ./dataset/ILSVRC2012/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - 
ToCHWImage:
   PostProcess:
     name: Topk
     topk: 5
-    class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
 
 Metric:
-  Train:
+  Train:
     - TopkAcc:
         topk: [1, 5]
-  Eval:
+  Eval:
     - TopkAcc:
         topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Xception/Xception41.yaml b/ppcls/configs/ImageNet/Xception/Xception41.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e615a172472a308f1e4f8264c986716945d2f94e
--- /dev/null
+++ b/ppcls/configs/ImageNet/Xception/Xception41.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 299, 299]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: Xception41
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 299
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 320
+        - CropImage:
+            size: 299
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 320
+    - CropImage:
+        size: 299
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Xception/Xception41_deeplab.yaml b/ppcls/configs/ImageNet/Xception/Xception41_deeplab.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..c1c1f67450f5b283b13b03728ae632d0099cc7e3
--- /dev/null
+++ b/ppcls/configs/ImageNet/Xception/Xception41_deeplab.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 299, 299]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: Xception41_deeplab
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 299
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 320
+        - CropImage:
+            size: 299
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 320
+    - CropImage:
+        size: 299
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Xception/Xception65.yaml b/ppcls/configs/ImageNet/Xception/Xception65.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..e9cd90c655ce6514a470a0ee7a37382a3e958522
--- /dev/null
+++ b/ppcls/configs/ImageNet/Xception/Xception65.yaml
@@ -0,0 +1,133 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 299, 299]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: Xception65
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 299
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+      batch_transform_ops:
+        - MixupOperator:
+            alpha: 0.2
+
+    sampler:
+      name: DistributedBatchSampler
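+      # batch_size below is per GPU card; because this config lists
+      # MixupOperator under batch_transform_ops, each sampled batch is mixed
+      # inside the dataloader's collate_fn (see mix_collate_fn in
+      # ppcls/data/__init__.py later in this patch)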
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 320
+        - CropImage:
+            size: 299
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 320
+    - CropImage:
+        size: 299
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Xception/Xception65_deeplab.yaml b/ppcls/configs/ImageNet/Xception/Xception65_deeplab.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..11520686ce73e33772277b63b307e0821468d6b8
--- /dev/null
+++ b/ppcls/configs/ImageNet/Xception/Xception65_deeplab.yaml
@@ -0,0 +1,130 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 120
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 299, 299]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: Xception65_deeplab
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.045
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 299
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 320
+        - CropImage:
+            size: 299
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 320
+    - CropImage:
+        size: 299
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/ImageNet/Xception/Xception71.yaml b/ppcls/configs/ImageNet/Xception/Xception71.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..833090d8bc209d4bfe02deee9ae9a9b26743ae8a
--- /dev/null
+++ b/ppcls/configs/ImageNet/Xception/Xception71.yaml
@@ -0,0 +1,133 @@
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  class_num: 1000
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 200
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 299, 299]
+  save_inference_dir: ./inference
+
+# model architecture
+Arch:
+  name: Xception71
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+        epsilon: 0.1
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.0225
+  regularizer:
+    name: 'L2'
+    coeff: 0.0001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 299
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+      batch_transform_ops:
+        - MixupOperator:
+            alpha: 0.2
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 32
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    # TODO: modify to the latest trainer
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ILSVRC2012/
+      cls_label_path: ./dataset/ILSVRC2012/val_list.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - ResizeImage:
+            resize_short: 320
+        - CropImage:
+            size: 299
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 64
+      drop_last: False
+      shuffle: False
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+Infer:
+  infer_imgs: docs/images/whl/demo.jpg
+  batch_size: 10
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        resize_short: 320
+    - CropImage:
+        size: 299
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 5
+    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
+
+Metric:
+  Train:
+    - TopkAcc:
+        topk: [1, 5]
+  Eval:
+    - TopkAcc:
+        topk: [1, 5]
diff --git a/ppcls/configs/Logo/ResNet50_ReID.yaml b/ppcls/configs/Logo/ResNet50_ReID.yaml
index 5b1299114fccbc786257d2627465286eecad7cc2..6e6a825df4c517265f350998792d2ff3e666b877 100644
--- a/ppcls/configs/Logo/ResNet50_ReID.yaml
+++ b/ppcls/configs/Logo/ResNet50_ReID.yaml
@@ -65,6 +65,9 @@ DataLoader:
       image_root: "dataset/LogoDet-3K-crop/train/"
       cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
       transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
         - ResizeImage:
            size: 224
         - RandFlipImage:
            flip_code: 1
@@ -91,25 +94,28 @@ 
DataLoader: Query: # TOTO: modify to the latest trainer dataset: - name: LogoDataset - image_root: "dataset/LogoDet-3K-crop/val/" - cls_label_path: "LogoDet-3K-crop/LogoDet-3K+query.txt" - transform_ops: - - ResizeImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: LogoDataset + image_root: "dataset/LogoDet-3K-crop/val/" + cls_label_path: "LogoDet-3K-crop/LogoDet-3K+query.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: name: DistributedBatchSampler batch_size: 128 drop_last: False shuffle: False loader: - num_workers: 10 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Gallery: # TOTO: modify to the latest trainer @@ -118,6 +124,9 @@ DataLoader: image_root: "dataset/LogoDet-3K-crop/train/" cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -131,8 +140,8 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 10 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Metric: Eval: @@ -145,14 +154,14 @@ Infer: infer_imgs: "docs/images/whl/demo.jpg" batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: diff --git a/ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml b/ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml index 795178300ddccf8005a1ea7de6ee1c91f95ef200..263e266d5f9c3697cb6092ce2e4f0d6e0438d9d2 100644 --- a/ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml +++ b/ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml @@ -2,8 +2,8 @@ Global: checkpoints: null pretrained_model: null - output_dir: "./output/" - device: "gpu" + output_dir: ./output/ + device: gpu class_num: 50030 save_interval: 1 eval_during_train: True @@ -13,23 +13,23 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" - eval_mode: "classification" + save_inference_dir: ./inference + eval_mode: classification # model architecture Arch: - name: "RecModel" + name: RecModel Backbone: - name: "ResNet50_vd" - pretrained: False + name: ResNet50_vd + pretrained: True BackboneStopLayer: - name: "flatten_0" + name: flatten_0 Neck: - name: "FC" + name: FC embedding_size: 2048 class_num: 512 Head: - name: "FC" + name: FC embedding_size: 512 class_num: 50030 @@ -56,52 +56,58 @@ Optimizer: DataLoader: Train: dataset: - name: "ImageNetDataset" - image_root: "./dataset/Aliproduct/" - cls_label_path: "./dataset/Aliproduct/train_list.txt" - transform_ops: - - ResizeImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/Aliproduct/ + cls_label_path: ./dataset/Aliproduct/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - 
ResizeImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer dataset: - name: "ImageNetDataset" - image_root: "./dataset/Aliproduct/" - cls_label_path: "./dataset/Aliproduct/val_list.txt" - transform_ops: - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' + name: ImageNetDataset + image_root: ./dataset/Aliproduct/ + cls_label_path: ./dataset/Aliproduct/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True Metric: Train: - TopkAcc: @@ -111,17 +117,17 @@ Metric: topk: [1, 5] Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: diff --git a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml index 2d264922cbd2a684759ed9b694c21586f4af4271..d6c4da863edab3c08303ece55ca65561ff3b4f90 100644 --- a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml +++ b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml @@ -1,9 +1,11 @@ # global configs Global: checkpoints: null - pretrained_model: null - output_dir: "./output/" - device: "gpu" +# please download pretrained model via this link: +# https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams + pretrained_model: product_ResNet50_vd_Aliproduct_v1.0_pretrained + output_dir: ./output/ + device: gpu class_num: 3997 save_interval: 10 eval_during_train: True @@ -13,29 +15,30 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" - eval_mode: "retrieval" + save_inference_dir: ./inference + eval_mode: retrieval # model architecture Arch: - name: "RecModel" + name: RecModel + infer_output_key: features + infer_add_softmax: False + Backbone: - name: "ResNet50_vd" + name: ResNet50_vd pretrained: False BackboneStopLayer: - name: "flatten_0" + name: flatten_0 Neck: - name: "FC" + name: FC embedding_size: 2048 class_num: 512 Head: - name: "ArcMargin" + name: ArcMargin embedding_size: 512 class_num: 3997 margin: 0.15 scale: 30 - infer_output_key: "features" - infer_add_softmax: False # 
loss function config for traing/eval process Loss: @@ -67,43 +70,48 @@ Optimizer: DataLoader: Train: dataset: - name: "ImageNetDataset" - image_root: "./dataset/Inshop/" - cls_label_path: "./dataset/Inshop/train_list.txt" - transform_ops: - - ResizeImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - RandomErasing: - EPSILON: 0.5 - sl: 0.02 - sh: 0.4 - r1: 0.3 - mean: [0., 0., 0.] - + name: ImageNetDataset + image_root: ./dataset/Inshop/ + cls_label_path: ./dataset/Inshop/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 0.4 + r1: 0.3 + mean: [0., 0., 0.] sampler: - name: DistributedRandomIdentitySampler - batch_size: 64 - num_instances: 2 - drop_last: False - shuffle: True + name: DistributedRandomIdentitySampler + batch_size: 64 + num_instances: 2 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 4 + use_shared_memory: True + Eval: Query: - # TOTO: modify to the latest trainer dataset: - name: "ImageNetDataset" - image_root: "./dataset/Inshop/" - cls_label_path: "./dataset/Inshop/query_list.txt" + name: ImageNetDataset + image_root: ./dataset/Inshop/ + cls_label_path: ./dataset/Inshop/query_list.txt transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -117,20 +125,22 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 + num_workers: 4 use_shared_memory: True Gallery: - # TOTO: modify to the latest trainer dataset: - name: "ImageNetDataset" - image_root: "./dataset/Inshop/" - cls_label_path: "./dataset/Inshop/gallery_list.txt" + name: ImageNetDataset + image_root: ./dataset/Inshop/ + cls_label_path: ./dataset/Inshop/gallery_list.txt transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: - scale: 0.00392157 + scale: 1.0/255.0 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] order: '' @@ -140,7 +150,7 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 + num_workers: 4 use_shared_memory: True Metric: @@ -149,17 +159,17 @@ Metric: topk: [1, 5] Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: diff --git a/ppcls/configs/Products/ResNet50_vd_SOP.yaml b/ppcls/configs/Products/ResNet50_vd_SOP.yaml index a3bddf4a801f81fadcf7f80e5342bca6b42cd02e..9c078a41dafd795c0204aae132a3c66e87bd6d6e 100644 --- a/ppcls/configs/Products/ResNet50_vd_SOP.yaml +++ b/ppcls/configs/Products/ResNet50_vd_SOP.yaml @@ -1,9 +1,11 @@ # global configs Global: checkpoints: null - pretrained_model: null - output_dir: "./output/" - device: "gpu" +# please download pretrained model via this link: +# 
https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams + pretrained_model: product_ResNet50_vd_Aliproduct_v1.0_pretrained + output_dir: ./output/ + device: gpu class_num: 11319 save_interval: 10 eval_during_train: True @@ -13,28 +15,28 @@ Global: use_visualdl: False # used for static mode and model export image_shape: [3, 224, 224] - save_inference_dir: "./inference" - eval_mode: "retrieval" + save_inference_dir: ./inference + eval_mode: retrieval # model architecture Arch: - name: "RecModel" + name: RecModel Backbone: - name: "ResNet50_vd" + name: ResNet50_vd pretrained: False BackboneStopLayer: - name: "flatten_0" + name: flatten_0 Neck: - name: "FC" + name: FC embedding_size: 2048 class_num: 512 Head: - name: "ArcMargin" + name: ArcMargin embedding_size: 512 class_num: 11319 margin: 0.15 scale: 30 - infer_output_key: "features" + infer_output_key: features infer_add_softmax: False # loss function config for traing/eval process @@ -67,43 +69,48 @@ Optimizer: DataLoader: Train: dataset: - name: "ImageNetDataset" - image_root: "./dataset/Stanford_Online_Products/" - cls_label_path: "./dataset/Stanford_Online_Products/train_list.txt" - transform_ops: - - ResizeImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 0.00392157 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - RandomErasing: - EPSILON: 0.5 - sl: 0.02 - sh: 0.4 - r1: 0.3 - mean: [0., 0., 0.] + name: ImageNetDataset + image_root: ./dataset/Stanford_Online_Products/ + cls_label_path: ./dataset/Stanford_Online_Products/train_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 0.4 + r1: 0.3 + mean: [0., 0., 0.] 
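+        # the RandomErasing values above are the defaults from Zhong et al.:
+        # erase a random patch with probability 0.5 (EPSILON) covering
+        # 2%-40% of the image area (sl-sh), aspect ratio bounded by r1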
sampler: - name: DistributedRandomIdentitySampler - batch_size: 64 - num_instances: 2 - drop_last: False - shuffle: True + name: DistributedRandomIdentitySampler + batch_size: 64 + num_instances: 2 + drop_last: False + shuffle: True loader: - num_workers: 6 - use_shared_memory: True + num_workers: 6 + use_shared_memory: True Eval: Query: - # TOTO: modify to the latest trainer dataset: - name: "ImageNetDataset" - image_root: "./dataset/Stanford_Online_Products/" - cls_label_path: "./dataset/Stanford_Online_Products/test_list.txt" + name: ImageNetDataset + image_root: ./dataset/Stanford_Online_Products/ + cls_label_path: ./dataset/Stanford_Online_Products/test_list.txt transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -117,16 +124,18 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 + num_workers: 4 use_shared_memory: True Gallery: - # TOTO: modify to the latest trainer dataset: - name: "ImageNetDataset" - image_root: "./dataset/Stanford_Online_Products/" - cls_label_path: "./dataset/Stanford_Online_Products/test_list.txt" + name: ImageNetDataset + image_root: ./dataset/Stanford_Online_Products/ + cls_label_path: ./dataset/Stanford_Online_Products/test_list.txt transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -140,7 +149,7 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 + num_workers: 4 use_shared_memory: True Metric: @@ -149,17 +158,17 @@ Metric: topk: [1, 5] Infer: - infer_imgs: "docs/images/whl/demo.jpg" + infer_imgs: docs/images/whl/demo.jpg batch_size: 10 transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - order: '' - - ToCHWImage: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: diff --git a/ppcls/configs/Vehicle/ResNet50.yaml b/ppcls/configs/Vehicle/ResNet50.yaml index 4424fcd3c35eb25d0009b5ffbc1fef7747ca1ccf..e874179885cd8aa34f05ffb8088022865737fd4e 100644 --- a/ppcls/configs/Vehicle/ResNet50.yaml +++ b/ppcls/configs/Vehicle/ResNet50.yaml @@ -73,6 +73,9 @@ DataLoader: bbox_crop: True cls_label_path: "./dataset/CompCars/train_test_split/classification/train_label.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - RandFlipImage: @@ -98,8 +101,8 @@ DataLoader: drop_last: False shuffle: True loader: - num_workers: 6 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Eval: # TOTO: modify to the latest trainer @@ -110,6 +113,9 @@ DataLoader: cls_label_path: "./dataset/CompCars/train_test_split/classification/test_label.txt" bbox_crop: True transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -123,8 +129,8 @@ DataLoader: drop_last: False shuffle: False loader: - num_workers: 6 - use_shared_memory: False + num_workers: 8 + use_shared_memory: True Infer: infer_imgs: "docs/images/whl/demo.jpg" diff --git a/ppcls/configs/Vehicle/ResNet50_ReID.yaml b/ppcls/configs/Vehicle/ResNet50_ReID.yaml index d1a919c94470ea6c5a678ff0a6238ef9f55635ad..ce5f31c47786431a200f9c394dcdb74f98d95a59 100644 --- a/ppcls/configs/Vehicle/ResNet50_ReID.yaml +++ 
b/ppcls/configs/Vehicle/ResNet50_ReID.yaml @@ -71,6 +71,9 @@ DataLoader: image_root: "./dataset/VeRI-Wild/images/" cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - RandFlipImage: @@ -106,6 +109,9 @@ DataLoader: image_root: "./dataset/VeRI-Wild/images" cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id_query.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: @@ -129,6 +135,9 @@ DataLoader: image_root: "./dataset/VeRI-Wild/images" cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id.txt" transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False - ResizeImage: size: 224 - NormalizeImage: diff --git a/ppcls/data/__init__.py b/ppcls/data/__init__.py index eaa3aee67c787d548d7f894b8eff2358b43ec10e..9e896e0e7fdfc7ab3c6498186fef86426af8b21a 100644 --- a/ppcls/data/__init__.py +++ b/ppcls/data/__init__.py @@ -19,7 +19,6 @@ from paddle.io import DistributedBatchSampler, BatchSampler, DataLoader from ppcls.utils import logger from ppcls.data import dataloader -from ppcls.data import imaug # dataset from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset @@ -28,15 +27,35 @@ from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild from ppcls.data.dataloader.logo_dataset import LogoDataset from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset - # sampler from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler +from ppcls.data import preprocess from ppcls.data.preprocess import transform +def create_operators(params): + """ + create operators based on the config + + Args: + params(list): a dict list, used to create some operators + """ + assert isinstance(params, list), ('operator config should be a list') + ops = [] + for operator in params: + assert isinstance(operator, + dict) and len(operator) == 1, "yaml format error" + op_name = list(operator)[0] + param = {} if operator[op_name] is None else operator[op_name] + op = getattr(preprocess, op_name)(**param) + ops.append(op) + + return ops + + def build_dataloader(config, mode, device, seed=None): assert mode in ['Train', 'Eval', 'Test', 'Gallery', 'Query' - ], "Mode should be Train, Eval, Test, Gallery or Query" + ], "Mode should be Train, Eval, Test, Gallery, Query" # build dataset config_dataset = config[mode]['dataset'] config_dataset = copy.deepcopy(config_dataset) @@ -48,7 +67,7 @@ def build_dataloader(config, mode, device, seed=None): dataset = eval(dataset_name)(**config_dataset) - logger.info("build dataset({}) success...".format(dataset)) + logger.debug("build dataset({}) success...".format(dataset)) # build sampler config_sampler = config[mode]['sampler'] @@ -61,7 +80,7 @@ def build_dataloader(config, mode, device, seed=None): sampler_name = config_sampler.pop("name") batch_sampler = eval(sampler_name)(dataset, **config_sampler) - logger.info("build batch_sampler({}) success...".format(batch_sampler)) + logger.debug("build batch_sampler({}) success...".format(batch_sampler)) # build batch operator def mix_collate_fn(batch): @@ -108,17 +127,5 @@ def build_dataloader(config, mode, device, seed=None): batch_sampler=batch_sampler, collate_fn=batch_collate_fn) - logger.info("build data_loader({}) success...".format(data_loader)) - + logger.debug("build data_loader({}) 
success...".format(data_loader)) return data_loader - - -''' -# TODO: fix the format -def build_dataloader(config, mode, device, seed=None): - from . import reader - from .reader import Reader - dataloader = Reader(config, mode=mode, places=device)() - return dataloader - -''' diff --git a/ppcls/data/dataloader/common_dataset.py b/ppcls/data/dataloader/common_dataset.py index a99cc23c22cc2e93b153e44e9db7ce2718435179..b7b03d8b9e06aa7aa190fb325c2221db3b666c5c 100644 --- a/ppcls/data/dataloader/common_dataset.py +++ b/ppcls/data/dataloader/common_dataset.py @@ -63,8 +63,8 @@ class CommonDataset(Dataset): def __getitem__(self, idx): try: - img = cv2.imread(self.images[idx]) - img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + with open(self.images[idx], 'rb') as f: + img = f.read() if self._transform_ops: img = transform(img, self._transform_ops) img = img.transpose((2, 0, 1)) diff --git a/ppcls/data/dataloader/icartoon_dataset.py b/ppcls/data/dataloader/icartoon_dataset.py index 32f6038f92667aee0c773ede58bd101218185b96..212341481b8bb10c98578cfb701f6a3c5a9709cd 100644 --- a/ppcls/data/dataloader/icartoon_dataset.py +++ b/ppcls/data/dataloader/icartoon_dataset.py @@ -29,12 +29,8 @@ class ICartoonDataset(CommonDataset): with open(self._cls_path) as fd: lines = fd.readlines() - if seed is not None: - np.random.RandomState(seed).shuffle(lines) - else: - np.random.shuffle(lines) for l in lines: l = l.strip().split("\t") - self.images.append(os.path.join(self._img_root, l[0][2:])) + self.images.append(os.path.join(self._img_root, l[0])) self.labels.append(int(l[1])) assert os.path.exists(self.images[-1]) diff --git a/ppcls/data/dataloader/imagenet_dataset.py b/ppcls/data/dataloader/imagenet_dataset.py index 08846ba87085bd8723c182c20008a4bb75f8dd74..e084bb7419417a70d437c3b163c3d58b4648ca01 100644 --- a/ppcls/data/dataloader/imagenet_dataset.py +++ b/ppcls/data/dataloader/imagenet_dataset.py @@ -31,8 +31,6 @@ class ImageNetDataset(CommonDataset): lines = fd.readlines() if seed is not None: np.random.RandomState(seed).shuffle(lines) - else: - np.random.shuffle(lines) for l in lines: l = l.strip().split(" ") self.images.append(os.path.join(self._img_root, l[0])) diff --git a/ppcls/data/imaug/__init__.py b/ppcls/data/imaug/__init__.py deleted file mode 100644 index 6860382bcc78bdf2a5e16aee9806321ed105abe1..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/__init__.py +++ /dev/null @@ -1,94 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from .autoaugment import ImageNetPolicy as RawImageNetPolicy -from .randaugment import RandAugment as RawRandAugment -from .cutout import Cutout - -from .hide_and_seek import HideAndSeek -from .random_erasing import RandomErasing -from .grid import GridMask - -from .operators import DecodeImage -from .operators import ResizeImage -from .operators import CropImage -from .operators import RandCropImage -from .operators import RandFlipImage -from .operators import NormalizeImage -from .operators import ToCHWImage - -from .batch_operators import MixupOperator -from .batch_operators import CutmixOperator -from .batch_operators import FmixOperator - -import six -import numpy as np -from PIL import Image - - -def transform(data, ops=[]): - """ transform """ - for op in ops: - data = op(data) - return data - - -class AutoAugment(RawImageNetPolicy): - """ ImageNetPolicy wrapper to auto fit different img types """ - - def __init__(self, *args, **kwargs): - if six.PY2: - super(AutoAugment, self).__init__(*args, **kwargs) - else: - super().__init__(*args, **kwargs) - - def __call__(self, img): - if not isinstance(img, Image.Image): - img = np.ascontiguousarray(img) - img = Image.fromarray(img) - - if six.PY2: - img = super(AutoAugment, self).__call__(img) - else: - img = super().__call__(img) - - if isinstance(img, Image.Image): - img = np.asarray(img) - - return img - - -class RandAugment(RawRandAugment): - """ RandAugment wrapper to auto fit different img types """ - - def __init__(self, *args, **kwargs): - if six.PY2: - super(RandAugment, self).__init__(*args, **kwargs) - else: - super().__init__(*args, **kwargs) - - def __call__(self, img): - if not isinstance(img, Image.Image): - img = np.ascontiguousarray(img) - img = Image.fromarray(img) - - if six.PY2: - img = super(RandAugment, self).__call__(img) - else: - img = super().__call__(img) - - if isinstance(img, Image.Image): - img = np.asarray(img) - - return img diff --git a/ppcls/data/imaug/autoaugment.py b/ppcls/data/imaug/autoaugment.py deleted file mode 100644 index 6065697e2a61f18ba2ae5ef05f651f0ae223ddaf..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/autoaugment.py +++ /dev/null @@ -1,264 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# This code is based on https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py - -from PIL import Image, ImageEnhance, ImageOps -import numpy as np -import random - - -class ImageNetPolicy(object): - """ Randomly choose one of the best 24 Sub-policies on ImageNet. 
- - Example: - >>> policy = ImageNetPolicy() - >>> transformed = policy(image) - - Example as a PyTorch Transform: - >>> transform=transforms.Compose([ - >>> transforms.Resize(256), - >>> ImageNetPolicy(), - >>> transforms.ToTensor()]) - """ - - def __init__(self, fillcolor=(128, 128, 128)): - self.policies = [ - SubPolicy(0.4, "posterize", 8, 0.6, "rotate", 9, fillcolor), - SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor), - SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor), - SubPolicy(0.6, "posterize", 7, 0.6, "posterize", 6, fillcolor), - SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor), - SubPolicy(0.4, "equalize", 4, 0.8, "rotate", 8, fillcolor), - SubPolicy(0.6, "solarize", 3, 0.6, "equalize", 7, fillcolor), - SubPolicy(0.8, "posterize", 5, 1.0, "equalize", 2, fillcolor), - SubPolicy(0.2, "rotate", 3, 0.6, "solarize", 8, fillcolor), - SubPolicy(0.6, "equalize", 8, 0.4, "posterize", 6, fillcolor), - SubPolicy(0.8, "rotate", 8, 0.4, "color", 0, fillcolor), - SubPolicy(0.4, "rotate", 9, 0.6, "equalize", 2, fillcolor), - SubPolicy(0.0, "equalize", 7, 0.8, "equalize", 8, fillcolor), - SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor), - SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor), - SubPolicy(0.8, "rotate", 8, 1.0, "color", 2, fillcolor), - SubPolicy(0.8, "color", 8, 0.8, "solarize", 7, fillcolor), - SubPolicy(0.4, "sharpness", 7, 0.6, "invert", 8, fillcolor), - SubPolicy(0.6, "shearX", 5, 1.0, "equalize", 9, fillcolor), - SubPolicy(0.4, "color", 0, 0.6, "equalize", 3, fillcolor), - SubPolicy(0.4, "equalize", 7, 0.2, "solarize", 4, fillcolor), - SubPolicy(0.6, "solarize", 5, 0.6, "autocontrast", 5, fillcolor), - SubPolicy(0.6, "invert", 4, 1.0, "equalize", 8, fillcolor), - SubPolicy(0.6, "color", 4, 1.0, "contrast", 8, fillcolor), - SubPolicy(0.8, "equalize", 8, 0.6, "equalize", 3, fillcolor) - ] - - def __call__(self, img, policy_idx=None): - if policy_idx is None or not isinstance(policy_idx, int): - policy_idx = random.randint(0, len(self.policies) - 1) - else: - policy_idx = policy_idx % len(self.policies) - return self.policies[policy_idx](img) - - def __repr__(self): - return "AutoAugment ImageNet Policy" - - -class CIFAR10Policy(object): - """ Randomly choose one of the best 25 Sub-policies on CIFAR10. 
- - Example: - >>> policy = CIFAR10Policy() - >>> transformed = policy(image) - - Example as a PyTorch Transform: - >>> transform=transforms.Compose([ - >>> transforms.Resize(256), - >>> CIFAR10Policy(), - >>> transforms.ToTensor()]) - """ - - def __init__(self, fillcolor=(128, 128, 128)): - self.policies = [ - SubPolicy(0.1, "invert", 7, 0.2, "contrast", 6, fillcolor), - SubPolicy(0.7, "rotate", 2, 0.3, "translateX", 9, fillcolor), - SubPolicy(0.8, "sharpness", 1, 0.9, "sharpness", 3, fillcolor), - SubPolicy(0.5, "shearY", 8, 0.7, "translateY", 9, fillcolor), - SubPolicy(0.5, "autocontrast", 8, 0.9, "equalize", 2, fillcolor), - SubPolicy(0.2, "shearY", 7, 0.3, "posterize", 7, fillcolor), - SubPolicy(0.4, "color", 3, 0.6, "brightness", 7, fillcolor), - SubPolicy(0.3, "sharpness", 9, 0.7, "brightness", 9, fillcolor), - SubPolicy(0.6, "equalize", 5, 0.5, "equalize", 1, fillcolor), - SubPolicy(0.6, "contrast", 7, 0.6, "sharpness", 5, fillcolor), - SubPolicy(0.7, "color", 7, 0.5, "translateX", 8, fillcolor), - SubPolicy(0.3, "equalize", 7, 0.4, "autocontrast", 8, fillcolor), - SubPolicy(0.4, "translateY", 3, 0.2, "sharpness", 6, fillcolor), - SubPolicy(0.9, "brightness", 6, 0.2, "color", 8, fillcolor), - SubPolicy(0.5, "solarize", 2, 0.0, "invert", 3, fillcolor), - SubPolicy(0.2, "equalize", 0, 0.6, "autocontrast", 0, fillcolor), - SubPolicy(0.2, "equalize", 8, 0.8, "equalize", 4, fillcolor), - SubPolicy(0.9, "color", 9, 0.6, "equalize", 6, fillcolor), - SubPolicy(0.8, "autocontrast", 4, 0.2, "solarize", 8, fillcolor), - SubPolicy(0.1, "brightness", 3, 0.7, "color", 0, fillcolor), - SubPolicy(0.4, "solarize", 5, 0.9, "autocontrast", 3, fillcolor), - SubPolicy(0.9, "translateY", 9, 0.7, "translateY", 9, fillcolor), - SubPolicy(0.9, "autocontrast", 2, 0.8, "solarize", 3, fillcolor), - SubPolicy(0.8, "equalize", 8, 0.1, "invert", 3, fillcolor), - SubPolicy(0.7, "translateY", 9, 0.9, "autocontrast", 1, fillcolor) - ] - - def __call__(self, img, policy_idx=None): - if policy_idx is None or not isinstance(policy_idx, int): - policy_idx = random.randint(0, len(self.policies) - 1) - else: - policy_idx = policy_idx % len(self.policies) - return self.policies[policy_idx](img) - - def __repr__(self): - return "AutoAugment CIFAR10 Policy" - - -class SVHNPolicy(object): - """ Randomly choose one of the best 25 Sub-policies on SVHN. 
- - Example: - >>> policy = SVHNPolicy() - >>> transformed = policy(image) - - Example as a PyTorch Transform: - >>> transform=transforms.Compose([ - >>> transforms.Resize(256), - >>> SVHNPolicy(), - >>> transforms.ToTensor()]) - """ - - def __init__(self, fillcolor=(128, 128, 128)): - self.policies = [ - SubPolicy(0.9, "shearX", 4, 0.2, "invert", 3, fillcolor), - SubPolicy(0.9, "shearY", 8, 0.7, "invert", 5, fillcolor), - SubPolicy(0.6, "equalize", 5, 0.6, "solarize", 6, fillcolor), - SubPolicy(0.9, "invert", 3, 0.6, "equalize", 3, fillcolor), - SubPolicy(0.6, "equalize", 1, 0.9, "rotate", 3, fillcolor), - SubPolicy(0.9, "shearX", 4, 0.8, "autocontrast", 3, fillcolor), - SubPolicy(0.9, "shearY", 8, 0.4, "invert", 5, fillcolor), - SubPolicy(0.9, "shearY", 5, 0.2, "solarize", 6, fillcolor), - SubPolicy(0.9, "invert", 6, 0.8, "autocontrast", 1, fillcolor), - SubPolicy(0.6, "equalize", 3, 0.9, "rotate", 3, fillcolor), - SubPolicy(0.9, "shearX", 4, 0.3, "solarize", 3, fillcolor), - SubPolicy(0.8, "shearY", 8, 0.7, "invert", 4, fillcolor), - SubPolicy(0.9, "equalize", 5, 0.6, "translateY", 6, fillcolor), - SubPolicy(0.9, "invert", 4, 0.6, "equalize", 7, fillcolor), - SubPolicy(0.3, "contrast", 3, 0.8, "rotate", 4, fillcolor), - SubPolicy(0.8, "invert", 5, 0.0, "translateY", 2, fillcolor), - SubPolicy(0.7, "shearY", 6, 0.4, "solarize", 8, fillcolor), - SubPolicy(0.6, "invert", 4, 0.8, "rotate", 4, fillcolor), - SubPolicy( - 0.3, "shearY", 7, 0.9, "translateX", 3, fillcolor), SubPolicy( - 0.1, "shearX", 6, 0.6, "invert", 5, fillcolor), SubPolicy( - 0.7, "solarize", 2, 0.6, "translateY", 7, - fillcolor), SubPolicy(0.8, "shearY", 4, 0.8, "invert", - 8, fillcolor), SubPolicy( - 0.7, "shearX", 9, 0.8, - "translateY", 3, - fillcolor), SubPolicy( - 0.8, "shearY", 5, 0.7, - "autocontrast", 3, - fillcolor), - SubPolicy(0.7, "shearX", 2, 0.1, "invert", 5, fillcolor) - ] - - def __call__(self, img, policy_idx=None): - if policy_idx is None or not isinstance(policy_idx, int): - policy_idx = random.randint(0, len(self.policies) - 1) - else: - policy_idx = policy_idx % len(self.policies) - return self.policies[policy_idx](img) - - def __repr__(self): - return "AutoAugment SVHN Policy" - - -class SubPolicy(object): - def __init__(self, - p1, - operation1, - magnitude_idx1, - p2, - operation2, - magnitude_idx2, - fillcolor=(128, 128, 128)): - ranges = { - "shearX": np.linspace(0, 0.3, 10), - "shearY": np.linspace(0, 0.3, 10), - "translateX": np.linspace(0, 150 / 331, 10), - "translateY": np.linspace(0, 150 / 331, 10), - "rotate": np.linspace(0, 30, 10), - "color": np.linspace(0.0, 0.9, 10), - "posterize": np.round(np.linspace(8, 4, 10), 0).astype(np.int), - "solarize": np.linspace(256, 0, 10), - "contrast": np.linspace(0.0, 0.9, 10), - "sharpness": np.linspace(0.0, 0.9, 10), - "brightness": np.linspace(0.0, 0.9, 10), - "autocontrast": [0] * 10, - "equalize": [0] * 10, - "invert": [0] * 10 - } - - # from https://stackoverflow.com/questions/5252170/specify-image-filling-color-when-rotating-in-python-with-pil-and-setting-expand - def rotate_with_fill(img, magnitude): - rot = img.convert("RGBA").rotate(magnitude) - return Image.composite(rot, - Image.new("RGBA", rot.size, (128, ) * 4), - rot).convert(img.mode) - - func = { - "shearX": lambda img, magnitude: img.transform( - img.size, Image.AFFINE, (1, magnitude * random.choice([-1, 1]), 0, 0, 1, 0), - Image.BICUBIC, fillcolor=fillcolor), - "shearY": lambda img, magnitude: img.transform( - img.size, Image.AFFINE, (1, 0, 0, magnitude * random.choice([-1, 1]), 1, 0), 
- Image.BICUBIC, fillcolor=fillcolor), - "translateX": lambda img, magnitude: img.transform( - img.size, Image.AFFINE, (1, 0, magnitude * img.size[0] * random.choice([-1, 1]), 0, 1, 0), - fillcolor=fillcolor), - "translateY": lambda img, magnitude: img.transform( - img.size, Image.AFFINE, (1, 0, 0, 0, 1, magnitude * img.size[1] * random.choice([-1, 1])), - fillcolor=fillcolor), - "rotate": lambda img, magnitude: rotate_with_fill(img, magnitude), - # "rotate": lambda img, magnitude: img.rotate(magnitude * random.choice([-1, 1])), - "color": lambda img, magnitude: ImageEnhance.Color(img).enhance(1 + magnitude * random.choice([-1, 1])), - "posterize": lambda img, magnitude: ImageOps.posterize(img, magnitude), - "solarize": lambda img, magnitude: ImageOps.solarize(img, magnitude), - "contrast": lambda img, magnitude: ImageEnhance.Contrast(img).enhance( - 1 + magnitude * random.choice([-1, 1])), - "sharpness": lambda img, magnitude: ImageEnhance.Sharpness(img).enhance( - 1 + magnitude * random.choice([-1, 1])), - "brightness": lambda img, magnitude: ImageEnhance.Brightness(img).enhance( - 1 + magnitude * random.choice([-1, 1])), - "autocontrast": lambda img, magnitude: ImageOps.autocontrast(img), - "equalize": lambda img, magnitude: ImageOps.equalize(img), - "invert": lambda img, magnitude: ImageOps.invert(img) - } - - self.p1 = p1 - self.operation1 = func[operation1] - self.magnitude1 = ranges[operation1][magnitude_idx1] - self.p2 = p2 - self.operation2 = func[operation2] - self.magnitude2 = ranges[operation2][magnitude_idx2] - - def __call__(self, img): - if random.random() < self.p1: - img = self.operation1(img, self.magnitude1) - if random.random() < self.p2: - img = self.operation2(img, self.magnitude2) - return img diff --git a/ppcls/data/imaug/batch_operators.py b/ppcls/data/imaug/batch_operators.py deleted file mode 100644 index e12b7b4d2bfbd6f8409323e9dd259eb1d4ec33ab..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/batch_operators.py +++ /dev/null @@ -1,117 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from __future__ import unicode_literals - -import numpy as np - -from .fmix import sample_mask - - -class BatchOperator(object): - """ BatchOperator """ - - def __init__(self, *args, **kwargs): - pass - - def _unpack(self, batch): - """ _unpack """ - assert isinstance(batch, list), \ - 'batch should be a list filled with tuples (img, label)' - bs = len(batch) - assert bs > 0, 'size of the batch data should > 0' - imgs, labels = list(zip(*batch)) - return np.array(imgs), np.array(labels), bs - - def __call__(self, batch): - return batch - - -class MixupOperator(BatchOperator): - """ Mixup operator """ - - def __init__(self, alpha=0.2): - assert alpha > 0., \ - 'parameter alpha[%f] should > 0.0' % (alpha) - self._alpha = alpha - - def __call__(self, batch): - imgs, labels, bs = self._unpack(batch) - idx = np.random.permutation(bs) - lam = np.random.beta(self._alpha, self._alpha) - lams = np.array([lam] * bs, dtype=np.float32) - imgs = lam * imgs + (1 - lam) * imgs[idx] - return list(zip(imgs, labels, labels[idx], lams)) - - -class CutmixOperator(BatchOperator): - """ Cutmix operator """ - - def __init__(self, alpha=0.2): - assert alpha > 0., \ - 'parameter alpha[%f] should > 0.0' % (alpha) - self._alpha = alpha - - def _rand_bbox(self, size, lam): - """ _rand_bbox """ - w = size[2] - h = size[3] - cut_rat = np.sqrt(1. - lam) - cut_w = np.int(w * cut_rat) - cut_h = np.int(h * cut_rat) - - # uniform - cx = np.random.randint(w) - cy = np.random.randint(h) - - bbx1 = np.clip(cx - cut_w // 2, 0, w) - bby1 = np.clip(cy - cut_h // 2, 0, h) - bbx2 = np.clip(cx + cut_w // 2, 0, w) - bby2 = np.clip(cy + cut_h // 2, 0, h) - - return bbx1, bby1, bbx2, bby2 - - def __call__(self, batch): - imgs, labels, bs = self._unpack(batch) - idx = np.random.permutation(bs) - lam = np.random.beta(self._alpha, self._alpha) - - bbx1, bby1, bbx2, bby2 = self._rand_bbox(imgs.shape, lam) - imgs[:, :, bbx1:bbx2, bby1:bby2] = imgs[idx, :, bbx1:bbx2, bby1:bby2] - lam = 1 - (float(bbx2 - bbx1) * (bby2 - bby1) / - (imgs.shape[-2] * imgs.shape[-1])) - lams = np.array([lam] * bs, dtype=np.float32) - return list(zip(imgs, labels, labels[idx], lams)) - - -class FmixOperator(BatchOperator): - """ Fmix operator """ - - def __init__(self, alpha=1, decay_power=3, max_soft=0., reformulate=False): - self._alpha = alpha - self._decay_power = decay_power - self._max_soft = max_soft - self._reformulate = reformulate - - def __call__(self, batch): - imgs, labels, bs = self._unpack(batch) - idx = np.random.permutation(bs) - size = (imgs.shape[2], imgs.shape[3]) - lam, mask = sample_mask(self._alpha, self._decay_power, \ - size, self._max_soft, self._reformulate) - imgs = mask * imgs + (1 - mask) * imgs[idx] - return list(zip(imgs, labels, labels[idx], [lam] * bs)) diff --git a/ppcls/data/imaug/cutout.py b/ppcls/data/imaug/cutout.py deleted file mode 100644 index 43d557f8619075976f3f313c1535ed9badb65563..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/cutout.py +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# This code is based on https://github.com/uoguelph-mlrg/Cutout - -import numpy as np -import random - - -class Cutout(object): - def __init__(self, n_holes=1, length=112): - self.n_holes = n_holes - self.length = length - - def __call__(self, img): - """ cutout_image """ - h, w = img.shape[:2] - mask = np.ones((h, w), np.float32) - - for n in range(self.n_holes): - y = np.random.randint(h) - x = np.random.randint(w) - - y1 = np.clip(y - self.length // 2, 0, h) - y2 = np.clip(y + self.length // 2, 0, h) - x1 = np.clip(x - self.length // 2, 0, w) - x2 = np.clip(x + self.length // 2, 0, w) - - img[y1:y2, x1:x2] = 0 - return img diff --git a/ppcls/data/imaug/fmix.py b/ppcls/data/imaug/fmix.py deleted file mode 100644 index fb9382115c1f3eb32ac2fd3e90699899006a0469..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/fmix.py +++ /dev/null @@ -1,217 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import math -import random - -import numpy as np -from scipy.stats import beta - - -def fftfreqnd(h, w=None, z=None): - """ Get bin values for discrete fourier transform of size (h, w, z) - - :param h: Required, first dimension size - :param w: Optional, second dimension size - :param z: Optional, third dimension size - """ - fz = fx = 0 - fy = np.fft.fftfreq(h) - - if w is not None: - fy = np.expand_dims(fy, -1) - - if w % 2 == 1: - fx = np.fft.fftfreq(w)[:w // 2 + 2] - else: - fx = np.fft.fftfreq(w)[:w // 2 + 1] - - if z is not None: - fy = np.expand_dims(fy, -1) - if z % 2 == 1: - fz = np.fft.fftfreq(z)[:, None] - else: - fz = np.fft.fftfreq(z)[:, None] - - return np.sqrt(fx * fx + fy * fy + fz * fz) - - -def get_spectrum(freqs, decay_power, ch, h, w=0, z=0): - """ Samples a fourier image with given size and frequencies decayed by decay power - - :param freqs: Bin values for the discrete fourier transform - :param decay_power: Decay power for frequency decay prop 1/f**d - :param ch: Number of channels for the resulting mask - :param h: Required, first dimension size - :param w: Optional, second dimension size - :param z: Optional, third dimension size - """ - scale = np.ones(1) / (np.maximum(freqs, np.array([1. 
/ max(w, h, z)])) - **decay_power) - - param_size = [ch] + list(freqs.shape) + [2] - param = np.random.randn(*param_size) - - scale = np.expand_dims(scale, -1)[None, :] - - return scale * param - - -def make_low_freq_image(decay, shape, ch=1): - """ Sample a low frequency image from fourier space - - :param decay_power: Decay power for frequency decay prop 1/f**d - :param shape: Shape of desired mask, list up to 3 dims - :param ch: Number of channels for desired mask - """ - freqs = fftfreqnd(*shape) - spectrum = get_spectrum(freqs, decay, ch, - *shape) #.reshape((1, *shape[:-1], -1)) - spectrum = spectrum[:, 0] + 1j * spectrum[:, 1] - mask = np.real(np.fft.irfftn(spectrum, shape)) - - if len(shape) == 1: - mask = mask[:1, :shape[0]] - if len(shape) == 2: - mask = mask[:1, :shape[0], :shape[1]] - if len(shape) == 3: - mask = mask[:1, :shape[0], :shape[1], :shape[2]] - - mask = mask - mask = (mask - mask.min()) - mask = mask / mask.max() - return mask - - -def sample_lam(alpha, reformulate=False): - """ Sample a lambda from symmetric beta distribution with given alpha - - :param alpha: Alpha value for beta distribution - :param reformulate: If True, uses the reformulation of [1]. - """ - if reformulate: - lam = beta.rvs(alpha + 1, alpha) - else: - lam = beta.rvs(alpha, alpha) - - return lam - - -def binarise_mask(mask, lam, in_shape, max_soft=0.0): - """ Binarises a given low frequency image such that it has mean lambda. - - :param mask: Low frequency image, usually the result of `make_low_freq_image` - :param lam: Mean value of final mask - :param in_shape: Shape of inputs - :param max_soft: Softening value between 0 and 0.5 which smooths hard edges in the mask. - :return: - """ - idx = mask.reshape(-1).argsort()[::-1] - mask = mask.reshape(-1) - num = math.ceil(lam * mask.size) if random.random() > 0.5 else math.floor( - lam * mask.size) - - eff_soft = max_soft - if max_soft > lam or max_soft > (1 - lam): - eff_soft = min(lam, 1 - lam) - - soft = int(mask.size * eff_soft) - num_low = int(num - soft) - num_high = int(num + soft) - - mask[idx[:num_high]] = 1 - mask[idx[num_low:]] = 0 - mask[idx[num_low:num_high]] = np.linspace(1, 0, (num_high - num_low)) - - mask = mask.reshape((1, 1, in_shape[0], in_shape[1])) - return mask - - -def sample_mask(alpha, decay_power, shape, max_soft=0.0, reformulate=False): - """ Samples a mean lambda from beta distribution parametrised by alpha, creates a low frequency image and binarises - it based on this lambda - - :param alpha: Alpha value for beta distribution from which to sample mean of mask - :param decay_power: Decay power for frequency decay prop 1/f**d - :param shape: Shape of desired mask, list up to 3 dims - :param max_soft: Softening value between 0 and 0.5 which smooths hard edges in the mask. - :param reformulate: If True, uses the reformulation of [1]. 
- """ - if isinstance(shape, int): - shape = (shape, ) - - # Choose lambda - lam = sample_lam(alpha, reformulate) - - # Make mask, get mean / std - mask = make_low_freq_image(decay_power, shape) - mask = binarise_mask(mask, lam, shape, max_soft) - - return float(lam), mask - - -def sample_and_apply(x, - alpha, - decay_power, - shape, - max_soft=0.0, - reformulate=False): - """ - - :param x: Image batch on which to apply fmix of shape [b, c, shape*] - :param alpha: Alpha value for beta distribution from which to sample mean of mask - :param decay_power: Decay power for frequency decay prop 1/f**d - :param shape: Shape of desired mask, list up to 3 dims - :param max_soft: Softening value between 0 and 0.5 which smooths hard edges in the mask. - :param reformulate: If True, uses the reformulation of [1]. - :return: mixed input, permutation indices, lambda value of mix, - """ - lam, mask = sample_mask(alpha, decay_power, shape, max_soft, reformulate) - index = np.random.permutation(x.shape[0]) - - x1, x2 = x * mask, x[index] * (1 - mask) - return x1 + x2, index, lam - - -class FMixBase: - """ FMix augmentation - - Args: - decay_power (float): Decay power for frequency decay prop 1/f**d - alpha (float): Alpha value for beta distribution from which to sample mean of mask - size ([int] | [int, int] | [int, int, int]): Shape of desired mask, list up to 3 dims - max_soft (float): Softening value between 0 and 0.5 which smooths hard edges in the mask. - reformulate (bool): If True, uses the reformulation of [1]. - """ - - def __init__(self, - decay_power=3, - alpha=1, - size=(32, 32), - max_soft=0.0, - reformulate=False): - super().__init__() - self.decay_power = decay_power - self.reformulate = reformulate - self.size = size - self.alpha = alpha - self.max_soft = max_soft - self.index = None - self.lam = None - - def __call__(self, x): - raise NotImplementedError - - def loss(self, *args, **kwargs): - raise NotImplementedError diff --git a/ppcls/data/imaug/grid.py b/ppcls/data/imaug/grid.py deleted file mode 100644 index 93e0c58ac756bb43bcfda388240c50786487973b..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/grid.py +++ /dev/null @@ -1,89 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# This code is based on https://github.com/akuxcw/GridMask - -import numpy as np -from PIL import Image -import pdb - -# curr -CURR_EPOCH = 0 -# epoch for the prob to be the upper limit -NUM_EPOCHS = 240 - - -class GridMask(object): - def __init__(self, d1=96, d2=224, rotate=1, ratio=0.5, mode=0, prob=1.): - self.d1 = d1 - self.d2 = d2 - self.rotate = rotate - self.ratio = ratio - self.mode = mode - self.st_prob = prob - self.prob = prob - self.last_prob = -1 - - def set_prob(self): - global CURR_EPOCH - global NUM_EPOCHS - self.prob = self.st_prob * min(1, 1.0 * CURR_EPOCH / NUM_EPOCHS) - - def __call__(self, img): - self.set_prob() - if abs(self.last_prob - self.prob) > 1e-10: - global CURR_EPOCH - global NUM_EPOCHS - print( - "self.prob is updated, self.prob={}, CURR_EPOCH: {}, NUM_EPOCHS: {}". - format(self.prob, CURR_EPOCH, NUM_EPOCHS)) - self.last_prob = self.prob - # print("CURR_EPOCH: {}, NUM_EPOCHS: {}, self.prob is set as: {}".format(CURR_EPOCH, NUM_EPOCHS, self.prob) ) - if np.random.rand() > self.prob: - return img - _, h, w = img.shape - hh = int(1.5 * h) - ww = int(1.5 * w) - d = np.random.randint(self.d1, self.d2) - #d = self.d - self.l = int(d * self.ratio + 0.5) - mask = np.ones((hh, ww), np.float32) - st_h = np.random.randint(d) - st_w = np.random.randint(d) - for i in range(-1, hh // d + 1): - s = d * i + st_h - t = s + self.l - s = max(min(s, hh), 0) - t = max(min(t, hh), 0) - mask[s:t, :] *= 0 - for i in range(-1, ww // d + 1): - s = d * i + st_w - t = s + self.l - s = max(min(s, ww), 0) - t = max(min(t, ww), 0) - mask[:, s:t] *= 0 - r = np.random.randint(self.rotate) - mask = Image.fromarray(np.uint8(mask)) - mask = mask.rotate(r) - mask = np.asarray(mask) - mask = mask[(hh - h) // 2:(hh - h) // 2 + h, (ww - w) // 2:(ww - w) // - 2 + w] - - if self.mode == 1: - mask = 1 - mask - - mask = np.expand_dims(mask, axis=0) - img = (img * mask).astype(img.dtype) - - return img diff --git a/ppcls/data/imaug/hide_and_seek.py b/ppcls/data/imaug/hide_and_seek.py deleted file mode 100644 index 5d4a8f97d4a8fb8f5d5972ca7463cd536b69924a..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/hide_and_seek.py +++ /dev/null @@ -1,44 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
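
`GridMask` above expects a CHW array: it tiles an oversized binary grid, rotates it by a random angle, centre-crops back to the image size, and multiplies it into the image, while the keep probability ramps linearly from 0 to `prob` as the global `CURR_EPOCH` approaches `NUM_EPOCHS`. A hedged usage sketch, assuming the class above (note that at epoch 0 the transform is effectively a pass-through):

```python
import numpy as np
# assumes the GridMask class defined above, e.g.
# from ppcls.data.imaug.grid import GridMask

op = GridMask(d1=96, d2=224, rotate=1, ratio=0.5, mode=0, prob=0.7)
img = np.random.randint(0, 256, size=(3, 224, 224)).astype("uint8")  # CHW layout
out = op(img)  # grid stripes are zeroed only once the warmed-up probability fires
assert out.shape == img.shape and out.dtype == img.dtype
```
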
- -# This code is based on https://github.com/kkanshul/Hide-and-Seek - -import numpy as np -import random - - -class HideAndSeek(object): - def __init__(self): - # possible grid size, 0 means no hiding - self.grid_sizes = [0, 16, 32, 44, 56] - # hiding probability - self.hide_prob = 0.5 - - def __call__(self, img): - # randomly choose one grid size - grid_size = np.random.choice(self.grid_sizes) - - _, h, w = img.shape - - # hide the patches - if grid_size == 0: - return img - for x in range(0, w, grid_size): - for y in range(0, h, grid_size): - x_end = min(w, x + grid_size) - y_end = min(h, y + grid_size) - if (random.random() <= self.hide_prob): - img[:, x:x_end, y:y_end] = 0 - - return img diff --git a/ppcls/data/imaug/operators.py b/ppcls/data/imaug/operators.py deleted file mode 100644 index f4e2a27a2aa959dffcf5eff7462c642eedb9efc9..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/operators.py +++ /dev/null @@ -1,244 +0,0 @@ -""" -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from __future__ import unicode_literals - -import six -import math -import random -import cv2 -import numpy as np - -from .autoaugment import ImageNetPolicy - - -class OperatorParamError(ValueError): - """ OperatorParamError - """ - pass - - -class DecodeImage(object): - """ decode image """ - - def __init__(self, to_rgb=True, to_np=False, channel_first=False): - self.to_rgb = to_rgb - self.to_np = to_np # to numpy - self.channel_first = channel_first # only enabled when to_np is True - - def __call__(self, img): - if six.PY2: - assert type(img) is str and len( - img) > 0, "invalid input 'img' in DecodeImage" - else: - assert type(img) is bytes and len( - img) > 0, "invalid input 'img' in DecodeImage" - data = np.frombuffer(img, dtype='uint8') - img = cv2.imdecode(data, 1) - if self.to_rgb: - assert img.shape[2] == 3, 'invalid shape of image[%s]' % ( - img.shape) - img = img[:, :, ::-1] - - if self.channel_first: - img = img.transpose((2, 0, 1)) - - return img - - -class ResizeImage(object): - """ resize image """ - - def __init__(self, size=None, resize_short=None, interpolation=-1): - self.interpolation = interpolation if interpolation >= 0 else None - if resize_short is not None and resize_short > 0: - self.resize_short = resize_short - self.w = None - self.h = None - elif size is not None: - self.resize_short = None - self.w = size if type(size) is int else size[0] - self.h = size if type(size) is int else size[1] - else: - raise OperatorParamError("invalid params for ReisizeImage for '\ - 'both 'size' and 'resize_short' are None") - - def __call__(self, img): - img_h, img_w = img.shape[:2] - if self.resize_short is not None: - percent = float(self.resize_short) / min(img_w, img_h) - w = int(round(img_w * percent)) - h = int(round(img_h * percent)) - else: - w = self.w - h = self.h - if self.interpolation is None: - 
return cv2.resize(img, (w, h)) - else: - return cv2.resize(img, (w, h), interpolation=self.interpolation) - - -class CropImage(object): - """ crop image """ - - def __init__(self, size): - if type(size) is int: - self.size = (size, size) - else: - self.size = size # (h, w) - - def __call__(self, img): - w, h = self.size - img_h, img_w = img.shape[:2] - w_start = (img_w - w) // 2 - h_start = (img_h - h) // 2 - - w_end = w_start + w - h_end = h_start + h - return img[h_start:h_end, w_start:w_end, :] - - -class RandCropImage(object): - """ random crop image """ - - def __init__(self, size, scale=None, ratio=None, interpolation=-1): - - self.interpolation = interpolation if interpolation >= 0 else None - if type(size) is int: - self.size = (size, size) # (h, w) - else: - self.size = size - - self.scale = [0.08, 1.0] if scale is None else scale - self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio - - def __call__(self, img): - size = self.size - scale = self.scale - ratio = self.ratio - - aspect_ratio = math.sqrt(random.uniform(*ratio)) - w = 1. * aspect_ratio - h = 1. / aspect_ratio - - img_h, img_w = img.shape[:2] - - bound = min((float(img_w) / img_h) / (w**2), - (float(img_h) / img_w) / (h**2)) - scale_max = min(scale[1], bound) - scale_min = min(scale[0], bound) - - target_area = img_w * img_h * random.uniform(scale_min, scale_max) - target_size = math.sqrt(target_area) - w = int(target_size * w) - h = int(target_size * h) - - i = random.randint(0, img_w - w) - j = random.randint(0, img_h - h) - - img = img[j:j + h, i:i + w, :] - if self.interpolation is None: - return cv2.resize(img, size) - else: - return cv2.resize(img, size, interpolation=self.interpolation) - - -class RandFlipImage(object): - """ random flip image - flip_code: - 1: Flipped Horizontally - 0: Flipped Vertically - -1: Flipped Horizontally & Vertically - """ - - def __init__(self, flip_code=1): - assert flip_code in [-1, 0, 1 - ], "flip_code should be a value in [-1, 0, 1]" - self.flip_code = flip_code - - def __call__(self, img): - if random.randint(0, 1) == 1: - return cv2.flip(img, self.flip_code) - else: - return img - - -class AutoAugment(object): - def __init__(self): - self.policy = ImageNetPolicy() - - def __call__(self, img): - from PIL import Image - img = np.ascontiguousarray(img) - img = Image.fromarray(img) - img = self.policy(img) - img = np.asarray(img) - - -class NormalizeImage(object): - """ normalize image such as substract mean, divide std - """ - - def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3): - if isinstance(scale, str): - scale = eval(scale) - assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4." 
- self.channel_num = channel_num - self.output_dtype = 'float16' if output_fp16 else 'float32' - self.scale = np.float32(scale if scale is not None else 1.0 / 255.0) - self.order = order - mean = mean if mean is not None else [0.485, 0.456, 0.406] - std = std if std is not None else [0.229, 0.224, 0.225] - - shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3) - self.mean = np.array(mean).reshape(shape).astype('float32') - self.std = np.array(std).reshape(shape).astype('float32') - - def __call__(self, img): - from PIL import Image - if isinstance(img, Image.Image): - img = np.array(img) - - assert isinstance(img, - np.ndarray), "invalid input 'img' in NormalizeImage" - - img = (img.astype('float32') * self.scale - self.mean) / self.std - - if self.channel_num == 4: - img_h = img.shape[1] if self.order == 'chw' else img.shape[0] - img_w = img.shape[2] if self.order == 'chw' else img.shape[1] - pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1)) - img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' - else np.concatenate((img, pad_zeros), axis=2)) - return img.astype(self.output_dtype) - - -class ToCHWImage(object): - """ convert hwc image to chw image - """ - - def __init__(self): - pass - - def __call__(self, img): - from PIL import Image - if isinstance(img, Image.Image): - img = np.array(img) - - return img.transpose((2, 0, 1)) diff --git a/ppcls/data/imaug/randaugment.py b/ppcls/data/imaug/randaugment.py deleted file mode 100644 index cb3e9695c08a1ba7a39508a58012c385b4a3a5df..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/randaugment.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
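
Chained in order, the operators above reproduce the usual evaluation-time preprocessing: decode JPEG bytes to an RGB HWC array, resize the short side, centre-crop, normalise with ImageNet statistics, and transpose to CHW. A minimal sketch, assuming the classes above; the image path is illustrative, and `order=''` shapes the mean/std for the HWC layout `NormalizeImage` receives before `ToCHWImage`:

```python
# assumes the operators defined above, e.g.
# from ppcls.data.imaug.operators import (DecodeImage, ResizeImage,
#                                         CropImage, NormalizeImage, ToCHWImage)

ops = [
    DecodeImage(to_rgb=True),                     # bytes -> HWC uint8, RGB
    ResizeImage(resize_short=256),                # keep aspect ratio, short side 256
    CropImage(size=224),                          # centre crop to 224 x 224
    NormalizeImage(scale=1.0 / 255.0, order=''),  # (x * scale - mean) / std, HWC stats
    ToCHWImage(),                                 # HWC -> CHW for the network
]

with open("demo.jpg", "rb") as f:                 # illustrative path
    data = f.read()
for op in ops:
    data = op(data)
# data is now a float32 array of shape (3, 224, 224)
```
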
- -# This code is based on https://github.com/heartInsert/randaugment - -from PIL import Image, ImageEnhance, ImageOps -import numpy as np -import random - - -class RandAugment(object): - def __init__(self, num_layers=2, magnitude=5, fillcolor=(128, 128, 128)): - self.num_layers = num_layers - self.magnitude = magnitude - self.max_level = 10 - - abso_level = self.magnitude / self.max_level - self.level_map = { - "shearX": 0.3 * abso_level, - "shearY": 0.3 * abso_level, - "translateX": 150.0 / 331 * abso_level, - "translateY": 150.0 / 331 * abso_level, - "rotate": 30 * abso_level, - "color": 0.9 * abso_level, - "posterize": int(4.0 * abso_level), - "solarize": 256.0 * abso_level, - "contrast": 0.9 * abso_level, - "sharpness": 0.9 * abso_level, - "brightness": 0.9 * abso_level, - "autocontrast": 0, - "equalize": 0, - "invert": 0 - } - - # from https://stackoverflow.com/questions/5252170/ - # specify-image-filling-color-when-rotating-in-python-with-pil-and-setting-expand - def rotate_with_fill(img, magnitude): - rot = img.convert("RGBA").rotate(magnitude) - return Image.composite(rot, - Image.new("RGBA", rot.size, (128, ) * 4), - rot).convert(img.mode) - - rnd_ch_op = random.choice - - self.func = { - "shearX": lambda img, magnitude: img.transform( - img.size, - Image.AFFINE, - (1, magnitude * rnd_ch_op([-1, 1]), 0, 0, 1, 0), - Image.BICUBIC, - fillcolor=fillcolor), - "shearY": lambda img, magnitude: img.transform( - img.size, - Image.AFFINE, - (1, 0, 0, magnitude * rnd_ch_op([-1, 1]), 1, 0), - Image.BICUBIC, - fillcolor=fillcolor), - "translateX": lambda img, magnitude: img.transform( - img.size, - Image.AFFINE, - (1, 0, magnitude * img.size[0] * rnd_ch_op([-1, 1]), 0, 1, 0), - fillcolor=fillcolor), - "translateY": lambda img, magnitude: img.transform( - img.size, - Image.AFFINE, - (1, 0, 0, 0, 1, magnitude * img.size[1] * rnd_ch_op([-1, 1])), - fillcolor=fillcolor), - "rotate": lambda img, magnitude: rotate_with_fill(img, magnitude), - "color": lambda img, magnitude: ImageEnhance.Color(img).enhance( - 1 + magnitude * rnd_ch_op([-1, 1])), - "posterize": lambda img, magnitude: - ImageOps.posterize(img, magnitude), - "solarize": lambda img, magnitude: - ImageOps.solarize(img, magnitude), - "contrast": lambda img, magnitude: - ImageEnhance.Contrast(img).enhance( - 1 + magnitude * rnd_ch_op([-1, 1])), - "sharpness": lambda img, magnitude: - ImageEnhance.Sharpness(img).enhance( - 1 + magnitude * rnd_ch_op([-1, 1])), - "brightness": lambda img, magnitude: - ImageEnhance.Brightness(img).enhance( - 1 + magnitude * rnd_ch_op([-1, 1])), - "autocontrast": lambda img, magnitude: - ImageOps.autocontrast(img), - "equalize": lambda img, magnitude: ImageOps.equalize(img), - "invert": lambda img, magnitude: ImageOps.invert(img) - } - - def __call__(self, img): - avaiable_op_names = list(self.level_map.keys()) - for layer_num in range(self.num_layers): - op_name = np.random.choice(avaiable_op_names) - img = self.func[op_name](img, self.level_map[op_name]) - return img diff --git a/ppcls/data/imaug/random_erasing.py b/ppcls/data/imaug/random_erasing.py deleted file mode 100644 index 76e4abf39b399cc91737e8ee2f850f40a2695b06..0000000000000000000000000000000000000000 --- a/ppcls/data/imaug/random_erasing.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -#This code is based on https://github.com/zhunzhong07/Random-Erasing - -import math -import random - -import numpy as np - - -class RandomErasing(object): - def __init__(self, EPSILON=0.5, sl=0.02, sh=0.4, r1=0.3, - mean=[0., 0., 0.]): - self.EPSILON = EPSILON - self.mean = mean - self.sl = sl - self.sh = sh - self.r1 = r1 - - def __call__(self, img): - if random.uniform(0, 1) > self.EPSILON: - return img - - for attempt in range(100): - area = img.shape[1] * img.shape[2] - - target_area = random.uniform(self.sl, self.sh) * area - aspect_ratio = random.uniform(self.r1, 1 / self.r1) - - h = int(round(math.sqrt(target_area * aspect_ratio))) - w = int(round(math.sqrt(target_area / aspect_ratio))) - - if w < img.shape[2] and h < img.shape[1]: - x1 = random.randint(0, img.shape[1] - h) - y1 = random.randint(0, img.shape[2] - w) - if img.shape[0] == 3: - img[0, x1:x1 + h, y1:y1 + w] = self.mean[0] - img[1, x1:x1 + h, y1:y1 + w] = self.mean[1] - img[2, x1:x1 + h, y1:y1 + w] = self.mean[2] - else: - img[0, x1:x1 + h, y1:y1 + w] = self.mean[1] - return img - return img diff --git a/ppcls/data/postprocess/__init__.py b/ppcls/data/postprocess/__init__.py index 6ef0ea819a638da942795d0c4f6f9f5c5a6bba3b..801e7f101cec0d2781c232074f1543821d2aa2d1 100644 --- a/ppcls/data/postprocess/__init__.py +++ b/ppcls/data/postprocess/__init__.py @@ -25,3 +25,17 @@ def build_postprocess(config): mod = importlib.import_module(__name__) postprocess_func = getattr(mod, model_name)(**config) return postprocess_func + + +class DistillationPostProcess(object): + def __init__(self, model_name="Student", key=None, func="Topk", **kargs): + super().__init__() + self.func = eval(func)(**kargs) + self.model_name = model_name + self.key = key + + def __call__(self, x, file_names=None): + x = x[self.model_name] + if self.key is not None: + x = x[self.key] + return self.func(x, file_names=file_names) diff --git a/ppcls/data/reader.py b/ppcls/data/reader.py deleted file mode 100755 index e30a79f7fc92b11be14a02742041e481cd0b0294..0000000000000000000000000000000000000000 --- a/ppcls/data/reader.py +++ /dev/null @@ -1,319 +0,0 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np -import random -import imghdr -import os -import signal - -from paddle.io import Dataset, DataLoader, DistributedBatchSampler - -from . 
import imaug -from .imaug import transform -from ppcls.utils import logger - -trainers_num = int(os.environ.get('PADDLE_TRAINERS_NUM', 1)) -trainer_id = int(os.environ.get("PADDLE_TRAINER_ID", 0)) - - -class ModeException(Exception): - """ - ModeException - """ - - def __init__(self, message='', mode=''): - message += "\nOnly the following 3 modes are supported: " \ - "train, valid, test. Given mode is {}".format(mode) - super(ModeException, self).__init__(message) - - -class SampleNumException(Exception): - """ - SampleNumException - """ - - def __init__(self, message='', sample_num=0, batch_size=1): - message += "\nError: The number of the whole data ({}) " \ - "is smaller than the batch_size ({}), and drop_last " \ - "is turnning on, so nothing will feed in program, " \ - "Terminated now. Please reset batch_size to a smaller " \ - "number or feed more data!".format(sample_num, batch_size) - super(SampleNumException, self).__init__(message) - - -class ShuffleSeedException(Exception): - """ - ShuffleSeedException - """ - - def __init__(self, message=''): - message += "\nIf trainers_num > 1, the shuffle_seed must be set, " \ - "because the order of batch data generated by reader " \ - "must be the same in the respective processes." - super(ShuffleSeedException, self).__init__(message) - - -def check_params(params): - """ - check params to avoid unexpect errors - - Args: - params(dict): - """ - if 'shuffle_seed' not in params: - params['shuffle_seed'] = None - - if trainers_num > 1 and params['shuffle_seed'] is None: - raise ShuffleSeedException() - - data_dir = params.get('data_dir', '') - assert os.path.isdir(data_dir), \ - "{} doesn't exist, please check datadir path".format(data_dir) - - if params['mode'] != 'test': - file_list = params.get('file_list', '') - assert os.path.isfile(file_list), \ - "{} doesn't exist, please check file list path".format(file_list) - - -def create_file_list(params): - """ - if mode is test, create the file list - - Args: - params(dict): - """ - data_dir = params.get('data_dir', '') - params['file_list'] = ".tmp.txt" - imgtype_list = {'jpg', 'bmp', 'png', 'jpeg', 'rgb', 'tif', 'tiff'} - with open(params['file_list'], "w") as fout: - tmp_file_list = os.listdir(data_dir) - for file_name in tmp_file_list: - file_path = os.path.join(data_dir, file_name) - if imghdr.what(file_path) not in imgtype_list: - continue - fout.write(file_name + " 0" + "\n") - - -def shuffle_lines(full_lines, seed=None): - """ - random shuffle lines - Args: - full_lines(list): - seed(int): random seed - """ - if seed is not None: - np.random.RandomState(seed).shuffle(full_lines) - else: - np.random.shuffle(full_lines) - - return full_lines - - -def get_file_list(params): - """ - read label list from file and shuffle the list - - Args: - params(dict): - """ - if params['mode'] == 'test': - create_file_list(params) - - with open(params['file_list']) as flist: - full_lines = [line.strip() for line in flist] - - if params["mode"] == "train": - full_lines = shuffle_lines(full_lines, seed=params['shuffle_seed']) - - return full_lines - - -def create_operators(params): - """ - create operators based on the config - - Args: - params(list): a dict list, used to create some operators - """ - assert isinstance(params, list), ('operator config should be a list') - ops = [] - for operator in params: - assert isinstance(operator, - dict) and len(operator) == 1, "yaml format error" - op_name = list(operator)[0] - param = {} if operator[op_name] is None else operator[op_name] - op = getattr(imaug, 
op_name)(**param) - ops.append(op) - - return ops - - -def term_mp(sig_num, frame): - """ kill all child processes - """ - pid = os.getpid() - pgid = os.getpgid(os.getpid()) - logger.info("main proc {} exit, kill process group " - "{}".format(pid, pgid)) - os.killpg(pgid, signal.SIGKILL) - return - - -class CommonDataset(Dataset): - def __init__(self, params): - self.params = params - self.mode = params.get("mode", "train") - self.full_lines = get_file_list(params) - self.delimiter = params.get('delimiter', ' ') - self.ops = create_operators(params['transforms']) - self.num_samples = len(self.full_lines) - return - - def __getitem__(self, idx): - try: - line = self.full_lines[idx] - img_path, label = line.split(self.delimiter) - img_path = os.path.join(self.params['data_dir'], img_path) - with open(img_path, 'rb') as f: - img = f.read() - return (transform(img, self.ops), int(label)) - except Exception as e: - logger.error("data read faild: {}, exception info: {}".format(line, - e)) - return self.__getitem__(random.randint(0, len(self))) - - def __len__(self): - return self.num_samples - - -class MultiLabelDataset(Dataset): - """ - Define dataset class for multilabel image classification - """ - - def __init__(self, params): - self.params = params - self.mode = params.get("mode", "train") - self.full_lines = get_file_list(params) - self.delimiter = params.get("delimiter", "\t") - self.ops = create_operators(params["transforms"]) - self.num_samples = len(self.full_lines) - return - - def __getitem__(self, idx): - try: - line = self.full_lines[idx] - img_path, label_str = line.split(self.delimiter) - img_path = os.path.join(self.params["data_dir"], img_path) - with open(img_path, "rb") as f: - img = f.read() - - labels = label_str.split(',') - labels = [int(i) for i in labels] - - return (transform(img, self.ops), - np.array(labels).astype("float32")) - except Exception as e: - logger.error("data read failed: {}, exception info: {}".format( - line, e)) - return self.__getitem__(random.randint(0, len(self))) - - def __len__(self): - return self.num_samples - - -class Reader: - """ - Create a reader for trainning/validate/test - - Args: - config(dict): arguments - mode(str): train or val or test - seed(int): random seed used to generate same sequence in each trainer - - Returns: - the specific reader - """ - - def __init__(self, config, mode='train', places=None): - try: - self.params = config[mode.capitalize()] - except KeyError: - raise ModeException(mode=mode) - - use_mix = config.get('use_mix') - self.params['mode'] = mode - self.shuffle = mode == "train" - self.is_train = mode == "train" - - self.collate_fn = None - self.batch_ops = [] - if use_mix and mode == "train": - self.batch_ops = create_operators(self.params['mix']) - self.collate_fn = self.mix_collate_fn - - self.places = places - self.use_xpu = config.get("use_xpu", False) - self.multilabel = config.get("multilabel", False) - - def mix_collate_fn(self, batch): - batch = transform(batch, self.batch_ops) - # batch each field - slots = [] - for items in batch: - for i, item in enumerate(items): - if len(slots) < len(items): - slots.append([item]) - else: - slots[i].append(item) - - return [np.stack(slot, axis=0) for slot in slots] - - def __call__(self): - batch_size = int(self.params['batch_size']) // trainers_num - - if self.multilabel: - dataset = MultiLabelDataset(self.params) - else: - dataset = CommonDataset(self.params) - if (self.params['mode'] != "train") and self.use_xpu: - loader = DataLoader( - dataset, - 
places=self.places, - batch_size=batch_size, - drop_last=False, - return_list=True, - shuffle=False, - num_workers=self.params["num_workers"]) - else: - is_train = self.is_train - batch_sampler = DistributedBatchSampler( - dataset, - batch_size=batch_size, - shuffle=self.shuffle and is_train, - drop_last=is_train) - loader = DataLoader( - dataset, - batch_sampler=batch_sampler, - collate_fn=self.collate_fn if is_train else None, - places=self.places, - return_list=True, - num_workers=self.params["num_workers"]) - return loader - - -signal.signal(signal.SIGINT, term_mp) -signal.signal(signal.SIGTERM, term_mp) diff --git a/ppcls/engine/trainer.py b/ppcls/engine/trainer.py index 513fba1e091442b188216f609bf61f7f1fe295b4..4234d62940ed5f74272d0954cbed8c7dfbdbb171 100644 --- a/ppcls/engine/trainer.py +++ b/ppcls/engine/trainer.py @@ -30,6 +30,8 @@ import paddle.distributed as dist from ppcls.utils.check import check_gpu from ppcls.utils.misc import AverageMeter from ppcls.utils import logger +from ppcls.utils.logger import init_logger +from ppcls.utils.config import print_config from ppcls.data import build_dataloader from ppcls.arch import build_model from ppcls.loss import build_loss @@ -41,7 +43,7 @@ from ppcls.utils import save_load from ppcls.data.utils.get_image_list import get_image_list from ppcls.data.postprocess import build_postprocess -from ppcls.data.reader import create_operators +from ppcls.data import create_operators class Trainer(object): @@ -49,6 +51,11 @@ class Trainer(object): self.mode = mode self.config = config self.output_dir = self.config['Global']['output_dir'] + + log_file = os.path.join(self.output_dir, self.config["Arch"]["name"], + f"{mode}.log") + init_logger(name='root', log_file=log_file) + print_config(config) # set device assert self.config["Global"]["device"] in ["cpu", "gpu", "xpu"] self.device = paddle.set_device(self.config["Global"]["device"]) @@ -153,8 +160,8 @@ class Trainer(object): time_info[key].reset() time_info["reader_cost"].update(time.time() - tic) batch_size = batch[0].shape[0] - batch[1] = paddle.to_tensor(batch[1].numpy().astype("int64") - .reshape([-1, 1])) + batch[1] = batch[1].reshape([-1, 1]).astype("int64") + global_step += 1 # image input if not self.is_rec: @@ -206,8 +213,9 @@ class Trainer(object): eta_msg = "eta: {:s}".format( str(datetime.timedelta(seconds=int(eta_sec)))) logger.info( - "[Train][Epoch {}][Iter: {}/{}]{}, {}, {}, {}, {}". - format(epoch_id, iter_id, + "[Train][Epoch {}/{}][Iter: {}/{}]{}, {}, {}, {}, {}". 
+ format(epoch_id, self.config["Global"][ + "epochs"], iter_id, len(self.train_dataloader), lr_msg, metric_msg, time_msg, ips_msg, eta_msg)) tic = time.time() @@ -216,14 +224,14 @@ class Trainer(object): "{}: {:.5f}".format(key, output_info[key].avg) for key in output_info ]) - logger.info("[Train][Epoch {}][Avg]{}".format(epoch_id, - metric_msg)) + logger.info("[Train][Epoch {}/{}][Avg]{}".format( + epoch_id, self.config["Global"]["epochs"], metric_msg)) output_info.clear() # eval model and save model if possible if self.config["Global"][ "eval_during_train"] and epoch_id % self.config["Global"][ - "eval_during_train"] == 0: + "eval_interval"] == 0: acc = self.eval(epoch_id) if acc > best_metric["metric"]: best_metric["metric"] = acc @@ -235,6 +243,8 @@ class Trainer(object): self.output_dir, model_name=self.config["Arch"]["name"], prefix="best_model") + logger.info("[Eval][Epoch {}][best metric: {}]".format( + epoch_id, best_metric["metric"])) self.model.train() # save model @@ -245,14 +255,21 @@ class Trainer(object): "epoch": epoch_id}, self.output_dir, model_name=self.config["Arch"]["name"], - prefix="ppcls_epoch_{}".format(epoch_id)) + prefix="epoch_{}".format(epoch_id)) + # save the latest model + save_load.save_model( + self.model, + optimizer, {"metric": acc, + "epoch": epoch_id}, + self.output_dir, + model_name=self.config["Arch"]["name"], + prefix="latest") def build_avg_metrics(self, info_dict): return {key: AverageMeter(key, '7.5f') for key in info_dict} @paddle.no_grad() def eval(self, epoch_id=0): - self.model.eval() if self.eval_loss_func is None: loss_config = self.config.get("Loss", None) @@ -318,7 +335,7 @@ class Trainer(object): time_info["reader_cost"].update(time.time() - tic) batch_size = batch[0].shape[0] batch[0] = paddle.to_tensor(batch[0]).astype("float32") - batch[1] = paddle.to_tensor(batch[1]).reshape([-1, 1]) + batch[1] = batch[1].reshape([-1, 1]).astype("int64") # image input if self.is_rec: out = self.model(batch[0], batch[1]) @@ -392,9 +409,7 @@ class Trainer(object): name='gallery') query_feas, query_img_id, query_query_id = self._cal_feature( name='query') - gallery_img_id = gallery_img_id - # if gallery_unique_id is not None: - # gallery_unique_id = gallery_unique_id + # step2. 
do evaluation sim_block_size = self.config["Global"].get("sim_block_size", 64) sections = [sim_block_size] * (len(query_feas) // sim_block_size) @@ -404,8 +419,7 @@ class Trainer(object): if query_query_id is not None: query_id_blocks = paddle.split( query_query_id, num_or_sections=sections) - image_id_blocks = paddle.split( - query_img_id, num_or_sections=sections) + image_id_blocks = paddle.split(query_img_id, num_or_sections=sections) metric_key = None if self.eval_metric_func is None: @@ -423,20 +437,21 @@ class Trainer(object): image_id_mask = (image_id_block != gallery_img_id.t()) keep_mask = paddle.logical_or(query_id_mask, image_id_mask) - similarity_matrix = similarity_matrix * keep_mask.astype("float32") - - metric_tmp = self.eval_metric_func(similarity_matrix,image_id_blocks[block_idx], gallery_img_id) - + similarity_matrix = similarity_matrix * keep_mask.astype( + "float32") + + metric_tmp = self.eval_metric_func(similarity_matrix, + image_id_blocks[block_idx], + gallery_img_id) + for key in metric_tmp: if key not in metric_dict: - metric_dict[key] = metric_tmp[key] + metric_dict[key] = metric_tmp[key] * block_fea.shape[ + 0] / len(query_feas) else: - metric_dict[key] += metric_tmp[key] - - num_sections = len(fea_blocks) - for key in metric_dict: - metric_dict[key] = metric_dict[key]/num_sections - + metric_dict[key] += metric_tmp[key] * block_fea.shape[ + 0] / len(query_feas) + metric_info_list = [] for key in metric_dict: if metric_key is None: @@ -445,8 +460,7 @@ class Trainer(object): metric_msg = ", ".join(metric_info_list) logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg)) - return metric_dict[metric_key] - + return metric_dict[metric_key] def _cal_feature(self, name='gallery'): all_feas = None @@ -463,10 +477,10 @@ class Trainer(object): for idx, batch in enumerate(dataloader( )): # load is very time-consuming batch = [paddle.to_tensor(x) for x in batch] - batch[1] = batch[1].reshape([-1, 1]) + batch[1] = batch[1].reshape([-1, 1]).astype("int64") if len(batch) == 3: has_unique_id = True - batch[2] = batch[2].reshape([-1, 1]) + batch[2] = batch[2].reshape([-1, 1]).astype("int64") out = self.model(batch[0], batch[1]) batch_feas = out["features"] diff --git a/ppcls/loss/__init__.py b/ppcls/loss/__init__.py index c49a5355bbbd020886c5a4e81febecdd3460b025..fc1be87e317cf34b6daa486c237a2a59b483ebed 100644 --- a/ppcls/loss/__init__.py +++ b/ppcls/loss/__init__.py @@ -13,7 +13,12 @@ from .trihardloss import TriHardLoss from .triplet import TripletLoss, TripletLossV2 from .supconloss import SupConLoss from .pairwisecosface import PairwiseCosface +from .dmlloss import DMLLoss +from .distanceloss import DistanceLoss +from .distillationloss import DistillationCELoss +from .distillationloss import DistillationGTCELoss +from .distillationloss import DistillationDMLLoss class CombinedLoss(nn.Layer): @@ -47,5 +52,5 @@ class CombinedLoss(nn.Layer): def build_loss(config): module_class = CombinedLoss(copy.deepcopy(config)) - logger.info("build loss {} success.".format(module_class)) + logger.debug("build loss {} success.".format(module_class)) return module_class diff --git a/ppcls/loss/celoss.py b/ppcls/loss/celoss.py index 257c41e13dfeac63b1827557f5964745eb765b2b..54c3703009beef11b4a8686620003f6bb948cd58 100644 --- a/ppcls/loss/celoss.py +++ b/ppcls/loss/celoss.py @@ -1,4 +1,4 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,113 +13,40 @@ # limitations under the License. import paddle +import paddle.nn as nn import paddle.nn.functional as F -__all__ = ['CELoss', 'JSDivLoss', 'KLDivLoss'] +class CELoss(nn.Layer): + def __init__(self, epsilon=None): + super().__init__() + if epsilon is not None and (epsilon <= 0 or epsilon >= 1): + epsilon = None + self.epsilon = epsilon -class Loss(object): - """ - Loss - """ - - def __init__(self, class_dim=1000, epsilon=None): - assert class_dim > 1, "class_dim=%d is not larger than 1" % (class_dim) - self._class_dim = class_dim - if epsilon is not None and epsilon >= 0.0 and epsilon <= 1.0: - self._epsilon = epsilon - self._label_smoothing = True #use label smoothing.(Actually, it is softmax label) - else: - self._epsilon = None - self._label_smoothing = False - - #do label_smoothing - def _labelsmoothing(self, target): - if target.shape[-1] != self._class_dim: - one_hot_target = F.one_hot( - target, - self._class_dim) #do ont hot(23,34,46)-> 3 * _class_dim + def _labelsmoothing(self, target, class_num): + if target.shape[-1] != class_num: + one_hot_target = F.one_hot(target, class_num) else: one_hot_target = target - - #do label_smooth - soft_target = F.label_smooth( - one_hot_target, - epsilon=self._epsilon) #(1 - epsilon) * input + eposilon / K. - soft_target = paddle.reshape(soft_target, shape=[-1, self._class_dim]) + soft_target = F.label_smooth(one_hot_target, epsilon=self.epsilon) + soft_target = paddle.reshape(soft_target, shape=[-1, class_num]) return soft_target - def _crossentropy(self, input, target, use_pure_fp16=False): - if self._label_smoothing: - target = self._labelsmoothing(target) - input = -F.log_softmax(input, axis=-1) #softmax and do log - cost = paddle.sum(target * input, axis=-1) #sum - else: - cost = F.cross_entropy(input=input, label=target) - - if use_pure_fp16: - avg_cost = paddle.sum(cost) - else: - avg_cost = paddle.mean(cost) - return avg_cost - - def _kldiv(self, input, target, name=None): - eps = 1.0e-10 - cost = target * paddle.log( - (target + eps) / (input + eps)) * self._class_dim - return cost - - def _jsdiv(self, input, - target): #so the input and target is the fc output; no softmax - input = F.softmax(input) - target = F.softmax(target) - - #two distribution - cost = self._kldiv(input, target) + self._kldiv(target, input) - cost = cost / 2 - avg_cost = paddle.mean(cost) - return avg_cost - - def __call__(self, input, target): - pass - - -class CELoss(Loss): - """ - Cross entropy loss - """ - - def __init__(self, class_dim=1000, epsilon=None): - super(CELoss, self).__init__(class_dim, epsilon) - - def __call__(self, input, target, use_pure_fp16=False): - if type(input) is dict: - logits = input["logits"] + def forward(self, x, label): + if isinstance(x, dict): + x = x["logits"] + if self.epsilon is not None: + class_num = x.shape[-1] + label = self._labelsmoothing(label, class_num) + x = -F.log_softmax(x, axis=-1) + loss = paddle.sum(x * label, axis=-1) else: - logits = input - cost = self._crossentropy(logits, target, use_pure_fp16) - return {"CELoss": cost} - - -class JSDivLoss(Loss): - """ - JSDiv loss - """ - - def __init__(self, class_dim=1000, epsilon=None): - super(JSDivLoss, self).__init__(class_dim, epsilon) - - def __call__(self, input, target): - cost = self._jsdiv(input, target) - return cost - - -class KLDivLoss(paddle.nn.Layer): - def __init__(self): - super(KLDivLoss, 
self).__init__() - - def __call__(self, p, q, is_logit=True): - if is_logit: - p = paddle.nn.functional.softmax(p) - q = paddle.nn.functional.softmax(q) - return -(p * paddle.log(q + 1e-8)).sum(1).mean() + if label.shape[-1] == x.shape[-1]: + label = F.softmax(label, axis=-1) + soft_label = True + else: + soft_label = False + loss = F.cross_entropy(x, label=label, soft_label=soft_label) + loss = loss.mean() + return {"CELoss": loss} diff --git a/ppcls/loss/distanceloss.py b/ppcls/loss/distanceloss.py new file mode 100644 index 0000000000000000000000000000000000000000..0a09f0cb2e0d0edd74ad3f10fb9b03c514ef21cb --- /dev/null +++ b/ppcls/loss/distanceloss.py @@ -0,0 +1,43 @@ +#copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddle.nn import L1Loss +from paddle.nn import MSELoss as L2Loss +from paddle.nn import SmoothL1Loss + + +class DistanceLoss(nn.Layer): + """ + DistanceLoss: + mode: loss mode + """ + + def __init__(self, mode="l2", **kargs): + super().__init__() + assert mode in ["l1", "l2", "smooth_l1"] + if mode == "l1": + self.loss_func = nn.L1Loss(**kargs) + elif mode == "l2": + self.loss_func = nn.MSELoss(**kargs) + elif mode == "smooth_l1": + self.loss_func = nn.SmoothL1Loss(**kargs) + self.mode = mode + + def forward(self, x, y): + loss = self.loss_func(x, y) + return {"loss_{}".format(self.mode): loss} diff --git a/ppcls/loss/distillationloss.py b/ppcls/loss/distillationloss.py new file mode 100644 index 0000000000000000000000000000000000000000..54dc601b6d18403f2490605b18809f0ca6de116a --- /dev/null +++ b/ppcls/loss/distillationloss.py @@ -0,0 +1,141 @@ +#copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +#Licensed under the Apache License, Version 2.0 (the "License"); +#you may not use this file except in compliance with the License. +#You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. 
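
The refactored `CELoss` above returns a dict keyed by loss name and covers three cases: hard labels via `F.cross_entropy`, soft labels whenever the target's last dimension matches the logits, and a hand-rolled smoothed cross entropy when `epsilon` is set. A small sketch of the hard-label and smoothed paths plus `DistanceLoss`, with illustrative shapes:

```python
import paddle
# assumes the classes defined above, e.g.
# from ppcls.loss.celoss import CELoss
# from ppcls.loss.distanceloss import DistanceLoss

logits = paddle.randn([8, 1000])
hard = paddle.randint(0, 1000, [8, 1])          # hard labels, reshaped as the trainer does

print(CELoss()(logits, hard))                   # {'CELoss': ...}
print(CELoss(epsilon=0.1)(logits, hard))        # label-smoothed variant

feat_s, feat_t = paddle.randn([8, 128]), paddle.randn([8, 128])
print(DistanceLoss(mode="l2")(feat_s, feat_t))  # {'loss_l2': ...}
```
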
+ +import paddle +import paddle.nn as nn + +from .celoss import CELoss +from .dmlloss import DMLLoss +from .distanceloss import DistanceLoss + + +class DistillationCELoss(CELoss): + """ + DistillationCELoss + """ + + def __init__(self, + model_name_pairs=[], + epsilon=None, + key=None, + name="loss_ce"): + super().__init__(epsilon=epsilon) + assert isinstance(model_name_pairs, list) + self.key = key + self.model_name_pairs = model_name_pairs + self.name = name + + def forward(self, predicts, batch): + loss_dict = dict() + for idx, pair in enumerate(self.model_name_pairs): + out1 = predicts[pair[0]] + out2 = predicts[pair[1]] + if self.key is not None: + out1 = out1[self.key] + out2 = out2[self.key] + loss = super().forward(out1, out2) + for key in loss: + loss_dict["{}_{}_{}".format(key, pair[0], pair[1])] = loss[key] + return loss_dict + + +class DistillationGTCELoss(CELoss): + """ + DistillationGTCELoss + """ + + def __init__(self, + model_names=[], + epsilon=None, + key=None, + name="loss_gt_ce"): + super().__init__(epsilon=epsilon) + assert isinstance(model_names, list) + self.key = key + self.model_names = model_names + self.name = name + + def forward(self, predicts, batch): + loss_dict = dict() + for idx, name in enumerate(self.model_names): + out = predicts[name] + if self.key is not None: + out = out[self.key] + loss = super().forward(out, batch) + for key in loss: + loss_dict["{}_{}".format(key, name)] = loss[key] + return loss_dict + + +class DistillationDMLLoss(DMLLoss): + """ + """ + + def __init__(self, + model_name_pairs=[], + act=None, + key=None, + name="loss_dml"): + super().__init__(act=act) + assert isinstance(model_name_pairs, list) + self.key = key + self.model_name_pairs = model_name_pairs + self.name = name + + def forward(self, predicts, batch): + loss_dict = dict() + for idx, pair in enumerate(self.model_name_pairs): + out1 = predicts[pair[0]] + out2 = predicts[pair[1]] + if self.key is not None: + out1 = out1[self.key] + out2 = out2[self.key] + loss = super().forward(out1, out2) + if isinstance(loss, dict): + for key in loss: + loss_dict["{}_{}_{}_{}".format(key, pair[0], pair[1], + idx)] = loss[key] + else: + loss_dict["{}_{}".format(self.name, idx)] = loss + return loss_dict + + +class DistillationDistanceLoss(DistanceLoss): + """ + """ + + def __init__(self, + mode="l2", + model_name_pairs=[], + key=None, + name="loss_", + **kargs): + super().__init__(mode=mode, **kargs) + assert isinstance(model_name_pairs, list) + self.key = key + self.model_name_pairs = model_name_pairs + self.name = name + "_l2" + + def forward(self, predicts, batch): + loss_dict = dict() + for idx, pair in enumerate(self.model_name_pairs): + out1 = predicts[pair[0]] + out2 = predicts[pair[1]] + if self.key is not None: + out1 = out1[self.key] + out2 = out2[self.key] + loss = super().forward(out1, out2) + for key in loss: + loss_dict["{}_{}_{}".format(self.name, key, idx)] = loss[key] + return loss_dict diff --git a/ppcls/loss/dmlloss.py b/ppcls/loss/dmlloss.py new file mode 100644 index 0000000000000000000000000000000000000000..d8bb833d5a4dfd7c33ba1f01b9f5b775b87a1d82 --- /dev/null +++ b/ppcls/loss/dmlloss.py @@ -0,0 +1,46 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +class DMLLoss(nn.Layer): + """ + DMLLoss + """ + + def __init__(self, act="softmax"): + super().__init__() + if act is not None: + assert act in ["softmax", "sigmoid"] + if act == "softmax": + self.act = nn.Softmax(axis=-1) + elif act == "sigmoid": + self.act = nn.Sigmoid() + else: + self.act = None + + def forward(self, out1, out2): + if self.act is not None: + out1 = self.act(out1) + out2 = self.act(out2) + + log_out1 = paddle.log(out1) + log_out2 = paddle.log(out2) + loss = (F.kl_div( + log_out1, out2, reduction='batchmean') + F.kl_div( + log_out2, out1, reduction='batchmean')) / 2.0 + return {"DMLLoss": loss} diff --git a/ppcls/metric/__init__.py b/ppcls/metric/__init__.py index 696e8f8580f089f53dc9808fd479535f25188ea5..36161b69166f0de0ceec30c72b8e1b04d447d27f 100644 --- a/ppcls/metric/__init__.py +++ b/ppcls/metric/__init__.py @@ -16,7 +16,8 @@ from paddle import nn import copy from collections import OrderedDict -from .metrics import TopkAcc, mAP, mINP, Recallk, RetriMetric +from .metrics import TopkAcc, mAP, mINP, Recallk +from .metrics import DistillationTopkAcc class CombinedMetrics(nn.Layer): def __init__(self, config_list): @@ -24,29 +25,22 @@ class CombinedMetrics(nn.Layer): self.metric_func_list = [] assert isinstance(config_list, list), ( 'operator config should be a list') - - self.retri_config = dict() # retrieval metrics config for config in config_list: assert isinstance(config, dict) and len(config) == 1, "yaml format error" metric_name = list(config)[0] - if metric_name in ["Recallk", "mAP", "mINP"]: - self.retri_config[metric_name] = config[metric_name] - continue metric_params = config[metric_name] - self.metric_func_list.append(eval(metric_name)(**metric_params)) - - if self.retri_config: - self.metric_func_list.append(RetriMetric(self.retri_config)) + if metric_params is not None: + self.metric_func_list.append(eval(metric_name)(**metric_params)) + else: + self.metric_func_list.append(eval(metric_name)()) def __call__(self, *args, **kwargs): metric_dict = OrderedDict() for idx, metric_func in enumerate(self.metric_func_list): metric_dict.update(metric_func(*args, **kwargs)) - return metric_dict - def build_metrics(config): metrics_list = CombinedMetrics(copy.deepcopy(config)) return metrics_list diff --git a/ppcls/metric/metrics.py b/ppcls/metric/metrics.py index d2e66bc54dc298ec329b45305684eb33f39da11b..3bf4532732797da4943a3d5703acbae8db01d9c0 100644 --- a/ppcls/metric/metrics.py +++ b/ppcls/metric/metrics.py @@ -15,10 +15,7 @@ import numpy as np import paddle import paddle.nn as nn -from functools import lru_cache - -# TODO: fix the format class TopkAcc(nn.Layer): def __init__(self, topk=(1, 5)): super().__init__() @@ -37,118 +34,89 @@ class TopkAcc(nn.Layer): x, label, k=k) return metric_dict - class mAP(nn.Layer): def __init__(self): super().__init__() def forward(self, similarities_matrix, query_img_id, gallery_img_id): metric_dict = dict() - _, all_AP, _ = get_metrics(similarities_matrix, query_img_id, - gallery_img_id) - - mAP = np.mean(all_AP) - metric_dict["mAP"] = mAP + 
+ choosen_indices = paddle.argsort(similarities_matrix, axis=1, descending=True) + gallery_labels_transpose = paddle.transpose(gallery_img_id, [1,0]) + gallery_labels_transpose = paddle.broadcast_to(gallery_labels_transpose, shape=[choosen_indices.shape[0], gallery_labels_transpose.shape[1]]) + choosen_label = paddle.index_sample(gallery_labels_transpose, choosen_indices) + equal_flag = paddle.equal(choosen_label, query_img_id) + equal_flag = paddle.cast(equal_flag, 'float32') + + acc_sum = paddle.cumsum(equal_flag, axis=1) + div = paddle.arange(acc_sum.shape[1]).astype("float32") + 1 + precision = paddle.divide(acc_sum, div) + + #calc map + precision_mask = paddle.multiply(equal_flag, precision) + ap = paddle.sum(precision_mask, axis=1) / paddle.sum(equal_flag, axis=1) + metric_dict["mAP"] = paddle.mean(ap).numpy()[0] return metric_dict - class mINP(nn.Layer): def __init__(self): super().__init__() def forward(self, similarities_matrix, query_img_id, gallery_img_id): metric_dict = dict() - _, _, all_INP = get_metrics(similarities_matrix, query_img_id, - gallery_img_id) - - mINP = np.mean(all_INP) - metric_dict["mINP"] = mINP + + choosen_indices = paddle.argsort(similarities_matrix, axis=1, descending=True) + gallery_labels_transpose = paddle.transpose(gallery_img_id, [1,0]) + gallery_labels_transpose = paddle.broadcast_to(gallery_labels_transpose, shape=[choosen_indices.shape[0], gallery_labels_transpose.shape[1]]) + choosen_label = paddle.index_sample(gallery_labels_transpose, choosen_indices) + tmp = paddle.equal(choosen_label, query_img_id) + tmp = paddle.cast(tmp, 'float64') + + #do accumulative sum + div = paddle.arange(tmp.shape[1]).astype("float64") + 2 + minus = paddle.divide(tmp, div) + auxilary = paddle.subtract(tmp, minus) + hard_index = paddle.argmax(auxilary, axis=1).astype("float64") + all_INP = paddle.divide(paddle.sum(tmp, axis=1), hard_index) + mINP = paddle.mean(all_INP) + metric_dict["mINP"] = mINP.numpy()[0] return metric_dict - class Recallk(nn.Layer): def __init__(self, topk=(1, 5)): super().__init__() - assert isinstance(topk, (int, list)) + assert isinstance(topk, (int, list, tuple)) if isinstance(topk, int): topk = [topk] self.topk = topk - self.max_rank = max(self.topk) if max(self.topk) > 50 else 50 def forward(self, similarities_matrix, query_img_id, gallery_img_id): metric_dict = dict() - all_cmc, _, _ = get_metrics(similarities_matrix, query_img_id, - gallery_img_id, self.max_rank) + + #get cmc + choosen_indices = paddle.argsort(similarities_matrix, axis=1, descending=True) + gallery_labels_transpose = paddle.transpose(gallery_img_id, [1,0]) + gallery_labels_transpose = paddle.broadcast_to(gallery_labels_transpose, shape=[choosen_indices.shape[0], gallery_labels_transpose.shape[1]]) + choosen_label = paddle.index_sample(gallery_labels_transpose, choosen_indices) + equal_flag = paddle.equal(choosen_label, query_img_id) + equal_flag = paddle.cast(equal_flag, 'float32') + + acc_sum = paddle.cumsum(equal_flag, axis=1) + mask = paddle.greater_than(acc_sum, paddle.to_tensor(0.)).astype("float32") + all_cmc = paddle.mean(mask, axis=0).numpy() for k in self.topk: metric_dict["recall{}".format(k)] = all_cmc[k - 1] return metric_dict -# retrieval metrics -class RetriMetric(nn.Layer): - def __init__(self, config): - super().__init__() - self.config = config - self.max_rank = 50 #max(self.topk) if max(self.topk) > 50 else 50 +class DistillationTopkAcc(TopkAcc): + def __init__(self, model_key, feature_key=None, topk=(1, 5)): + super().__init__(topk=topk) + self.model_key 
-# retrieval metrics
-class RetriMetric(nn.Layer):
-    def __init__(self, config):
-        super().__init__()
-        self.config = config
-        self.max_rank = 50  #max(self.topk) if max(self.topk) > 50 else 50
+class DistillationTopkAcc(TopkAcc):
+    def __init__(self, model_key, feature_key=None, topk=(1, 5)):
+        super().__init__(topk=topk)
+        self.model_key = model_key
+        self.feature_key = feature_key

-    def forward(self, similarities_matrix, query_img_id, gallery_img_id):
-        metric_dict = dict()
-        all_cmc, all_AP, all_INP = get_metrics(similarities_matrix, query_img_id,
-                                               gallery_img_id, self.max_rank)
-        if "Recallk" in self.config.keys():
-            topk = self.config['Recallk']['topk']
-            for k in topk:
-                metric_dict["recall{}".format(k)] = all_cmc[k - 1]
-        if "mAP" in self.config.keys():
-            mAP = np.mean(all_AP)
-            metric_dict["mAP"] = mAP
-        if "mINP" in self.config.keys():
-            mINP = np.mean(all_INP)
-            metric_dict["mINP"] = mINP
-        return metric_dict
-
-
-@lru_cache()
-def get_metrics(similarities_matrix, query_img_id, gallery_img_id,
-                max_rank=50):
-    num_q, num_g = similarities_matrix.shape
-    q_pids = query_img_id.numpy().reshape((query_img_id.shape[0]))
-    g_pids = gallery_img_id.numpy().reshape((gallery_img_id.shape[0]))
-    if num_g < max_rank:
-        max_rank = num_g
-        print('Note: number of gallery samples is quite small, got {}'.format(
-            num_g))
-    indices = paddle.argsort(
-        similarities_matrix, axis=1, descending=True).numpy()
-
-    all_cmc = []
-    all_AP = []
-    all_INP = []
-    num_valid_q = 0
-    matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)
-    for q_idx in range(num_q):
-        raw_cmc = matches[q_idx]
-        if not np.any(raw_cmc):
-            continue
-        cmc = raw_cmc.cumsum()
-        pos_idx = np.where(raw_cmc == 1)
-        max_pos_idx = np.max(pos_idx)
-        inp = cmc[max_pos_idx] / (max_pos_idx + 1.0)
-        all_INP.append(inp)
-        cmc[cmc > 1] = 1
-
-        all_cmc.append(cmc[:max_rank])
-        num_valid_q += 1.
-
-        num_rel = raw_cmc.sum()
-        tmp_cmc = raw_cmc.cumsum()
-        tmp_cmc = [x / (i + 1.) for i, x in enumerate(tmp_cmc)]
-        tmp_cmc = np.asarray(tmp_cmc) * raw_cmc
-        AP = tmp_cmc.sum() / num_rel
-        all_AP.append(AP)
-    assert num_valid_q > 0, 'Error: all query identities do not appear in gallery'
-
-    all_cmc = np.asarray(all_cmc).astype(np.float32)
-    all_cmc = all_cmc.sum(0) / num_valid_q
-
-    return all_cmc, all_AP, all_INP
+    def forward(self, x, label):
+        x = x[self.model_key]
+        if self.feature_key is not None:
+            x = x[self.feature_key]
+        return super().forward(x, label)
diff --git a/ppcls/optimizer/__init__.py b/ppcls/optimizer/__init__.py
index 692d00e369d356b36d9d3493b79f2297d57d057d..9b71bdddd5ec6cc6ed0abbb8e338691a5efc3c49 100644
--- a/ppcls/optimizer/__init__.py
+++ b/ppcls/optimizer/__init__.py
@@ -45,7 +45,7 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
     config = copy.deepcopy(config)
     # step1 build lr
     lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
-    logger.info("build lr ({}) success..".format(lr))
+    logger.debug("build lr ({}) success..".format(lr))
     # step2 build regularization
     if 'regularizer' in config and config['regularizer'] is not None:
         reg_config = config.pop('regularizer')
@@ -53,7 +53,7 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
         reg = getattr(paddle.regularizer, reg_name)(**reg_config)
     else:
         reg = None
-    logger.info("build regularizer ({}) success..".format(reg))
+    logger.debug("build regularizer ({}) success..".format(reg))
     # step3 build optimizer
     optim_name = config.pop('name')
     if 'clip_norm' in config:
@@ -65,5 +65,5 @@ def build_optimizer(config, epochs, step_each_epoch, parameters):
                                      weight_decay=reg,
                                      grad_clip=grad_clip,
                                      **config)(parameters=parameters)
-    logger.info("build optimizer ({}) success..".format(optim))
+    logger.debug("build optimizer ({}) success..".format(optim))
     return optim, lr
diff --git a/ppcls/utils/config.py b/ppcls/utils/config.py
index e2bec6fb078273b5d202dc5d80b1aa10bd6c46ad..b92f0d9456c8e7ced5704c0bfe931a080e5eb5cf 100644
--- a/ppcls/utils/config.py
+++ b/ppcls/utils/config.py
@@ -67,18 +67,14 @@ def print_dict(d, delimiter=0):
     placeholder = "-" * 60
     for k, v in sorted(d.items()):
         if isinstance(v, dict):
-            logger.info("{}{} : ".format(delimiter * " ",
-                                         logger.coloring(k, "HEADER")))
+            logger.info("{}{} : ".format(delimiter * " ", k))
             print_dict(v, delimiter + 4)
         elif isinstance(v, list) and len(v) >= 1 and isinstance(v[0], dict):
-            logger.info("{}{} : ".format(delimiter * " ",
-                                         logger.coloring(str(k), "HEADER")))
+            logger.info("{}{} : ".format(delimiter * " ", k))
             for value in v:
                 print_dict(value, delimiter + 4)
         else:
-            logger.info("{}{} : {}".format(delimiter * " ",
-                                           logger.coloring(k, "HEADER"),
-                                           logger.coloring(v, "OKGREEN")))
+            logger.info("{}{} : {}".format(delimiter * " ", k, v))

         if k.isupper():
             logger.info(placeholder)
@@ -141,7 +137,7 @@ def override(dl, ks, v):
         if len(ks) == 1:
             # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
             if not ks[0] in dl:
-                logger.warning('A new filed ({}) detected!'.format(ks[0], dl))
+                print('A new field ({}) detected!'.format(ks[0]))
             dl[ks[0]] = str2num(v)
         else:
             override(dl[ks[0]], ks[1:], v)
@@ -175,7 +171,7 @@ def override_config(config, options=None):
     return config


-def get_config(fname, overrides=None, show=True):
+def get_config(fname, overrides=None, show=False):
     """
     Read config from file
     """
diff --git a/ppcls/utils/download.py b/ppcls/utils/download.py
new file mode 100644
index 0000000000000000000000000000000000000000..9c4575048d3f579d93fcd315ac5193078e5f131f
--- /dev/null
+++ b/ppcls/utils/download.py
@@ -0,0 +1,319 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+import os.path as osp
+import shutil
+import requests
+import hashlib
+import tarfile
+import zipfile
+import time
+from collections import OrderedDict
+from tqdm import tqdm
+
+from ppcls.utils import logger
+
+__all__ = ['get_weights_path_from_url']
+
+WEIGHTS_HOME = osp.expanduser("~/.paddleclas/weights")
+
+DOWNLOAD_RETRY_LIMIT = 3
+
+
+def is_url(path):
+    """
+    Whether path is a URL.
+    Args:
+        path (string): URL string or not.
+    """
+    return path.startswith('http://') or path.startswith('https://')
+
+
+def get_weights_path_from_url(url, md5sum=None):
+    """Get the weights path from WEIGHTS_HOME; if it does not exist,
+    download it from url.
+
+    Args:
+        url (str): download url
+        md5sum (str): md5 sum of download package
+
+    Returns:
+        str: a local path to save downloaded weights.
+
+    Examples:
+        .. code-block:: python
+
+            from ppcls.utils.download import get_weights_path_from_url
+
+            resnet18_pretrained_weight_url = 'https://paddle-hapi.bj.bcebos.com/models/resnet18.pdparams'
+            local_weight_path = get_weights_path_from_url(resnet18_pretrained_weight_url)
+
+    """
+    path = get_path_from_url(url, WEIGHTS_HOME, md5sum)
+    return path
+
+
+def _map_path(url, root_dir):
+    # parse path after download under root_dir
+    fname = osp.split(url)[-1]
+    fpath = fname
+    return osp.join(root_dir, fpath)
+
+
+def _get_unique_endpoints(trainer_endpoints):
+    # Sorting is to avoid different environmental variables for each card
+    trainer_endpoints.sort()
+    ips = set()
+    unique_endpoints = set()
+    for endpoint in trainer_endpoints:
+        ip = endpoint.split(":")[0]
+        if ip in ips:
+            continue
+        ips.add(ip)
+        unique_endpoints.add(endpoint)
+    logger.info("unique_endpoints {}".format(unique_endpoints))
+    return unique_endpoints
+
+
+def get_path_from_url(url,
+                      root_dir,
+                      md5sum=None,
+                      check_exist=True,
+                      decompress=True):
+    """ Download from the given url to root_dir.
+        If the file or directory specified by url already exists under
+        root_dir, return the path directly; otherwise download it from
+        url, decompress it, and return the path.
+
+    Args:
+        url (str): download url
+        root_dir (str): root dir for downloading, it should be
+                        WEIGHTS_HOME or DATASET_HOME
+        md5sum (str): md5 sum of download package
+
+    Returns:
+        str: a local path to save downloaded models & weights & datasets.
+    """
+
+    from paddle.fluid.dygraph.parallel import ParallelEnv
+
+    assert is_url(url), "{} is not a valid URL".format(url)
+    # parse path after download to decompress under root_dir
+    fullpath = _map_path(url, root_dir)
+    # Mainly used when downloading data on multiple machines: each unique
+    # ip downloads the data once, and the other ranks on the same machine
+    # simply wait for the file to appear.
+    unique_endpoints = _get_unique_endpoints(ParallelEnv()
+                                             .trainer_endpoints[:])
+    if osp.exists(fullpath) and check_exist and _md5check(fullpath, md5sum):
+        logger.info("Found {}".format(fullpath))
+    else:
+        if ParallelEnv().current_endpoint in unique_endpoints:
+            fullpath = _download(url, root_dir, md5sum)
+        else:
+            while not os.path.exists(fullpath):
+                time.sleep(1)
+
+    if ParallelEnv().current_endpoint in unique_endpoints:
+        if decompress and (tarfile.is_tarfile(fullpath) or
+                           zipfile.is_zipfile(fullpath)):
+            fullpath = _decompress(fullpath)
+
+    return fullpath
+
+
+def _download(url, path, md5sum=None):
+    """
+    Download from url, save to path.
+
+    url (str): download url
+    path (str): download to given path
+    """
+    if not osp.exists(path):
+        os.makedirs(path)
+
+    fname = osp.split(url)[-1]
+    fullname = osp.join(path, fname)
+    retry_cnt = 0
+
+    while not (osp.exists(fullname) and _md5check(fullname, md5sum)):
+        if retry_cnt < DOWNLOAD_RETRY_LIMIT:
+            retry_cnt += 1
+        else:
+            raise RuntimeError("Download from {} failed. "
+                               "Retry limit reached".format(url))
+
+        logger.info("Downloading {} from {}".format(fname, url))
+
+        try:
+            req = requests.get(url, stream=True)
+        except Exception as e:  # requests.exceptions.ConnectionError
+            logger.info("Downloading {} from {} failed {} times with exception {}".
+                        format(fname, url, retry_cnt + 1, str(e)))
+            time.sleep(1)
+            continue
+
+        if req.status_code != 200:
+            raise RuntimeError("Downloading from {} failed with code "
+                               "{}!".format(url, req.status_code))
+
+        # To protect against interrupted downloads, download to
+        # tmp_fullname first, then move tmp_fullname to fullname
+        # once the download has finished.
+        tmp_fullname = fullname + "_tmp"
+        total_size = req.headers.get('content-length')
+        with open(tmp_fullname, 'wb') as f:
+            if total_size:
+                with tqdm(total=(int(total_size) + 1023) // 1024) as pbar:
+                    for chunk in req.iter_content(chunk_size=1024):
+                        f.write(chunk)
+                        pbar.update(1)
+            else:
+                for chunk in req.iter_content(chunk_size=1024):
+                    if chunk:
+                        f.write(chunk)
+        shutil.move(tmp_fullname, fullname)
+
+    return fullname
+
+
+def _md5check(fullname, md5sum=None):
+    if md5sum is None:
+        return True
+
+    logger.info("File {} md5 checking...".format(fullname))
+    md5 = hashlib.md5()
+    with open(fullname, 'rb') as f:
+        for chunk in iter(lambda: f.read(4096), b""):
+            md5.update(chunk)
+    calc_md5sum = md5.hexdigest()
+
+    if calc_md5sum != md5sum:
+        logger.info("File {} md5 check failed, {}(calc) != "
+                    "{}(base)".format(fullname, calc_md5sum, md5sum))
+        return False
+    return True
+
+
+def _decompress(fname):
+    """
+    Decompress zip and tar files.
+    """
+    logger.info("Decompressing {}...".format(fname))
+
+    # To protect against interrupted decompression, decompress into a
+    # temporary directory first; once decompression succeeds, move the
+    # files into place and remove the downloaded archive.
+
+    if tarfile.is_tarfile(fname):
+        uncompressed_path = _uncompress_file_tar(fname)
+    elif zipfile.is_zipfile(fname):
+        uncompressed_path = _uncompress_file_zip(fname)
+    else:
+        raise TypeError("Unsupported compressed file type {}".format(fname))
+
+    return uncompressed_path
+
+
+def _uncompress_file_zip(filepath):
+    files = zipfile.ZipFile(filepath, 'r')
+    file_list = files.namelist()
+
+    file_dir = os.path.dirname(filepath)
+
+    if _is_a_single_file(file_list):
+        rootpath = file_list[0]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+
+        for item in file_list:
+            files.extract(item, file_dir)
+
+    elif _is_a_single_dir(file_list):
+        rootpath = os.path.splitext(file_list[0])[0].split(os.sep)[-1]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+
+        for item in file_list:
+            files.extract(item, file_dir)
+
+    else:
+        rootpath = os.path.splitext(filepath)[0].split(os.sep)[-1]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+        if not os.path.exists(uncompressed_path):
+            os.makedirs(uncompressed_path)
+        for item in file_list:
+            files.extract(item, os.path.join(file_dir, rootpath))
+
+    files.close()
+
+    return uncompressed_path
+
+
+def _uncompress_file_tar(filepath, mode="r:*"):
+    files = tarfile.open(filepath, mode)
+    file_list = files.getnames()
+
+    file_dir = os.path.dirname(filepath)
+
+    if _is_a_single_file(file_list):
+        rootpath = file_list[0]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+        for item in file_list:
+            files.extract(item, file_dir)
+    elif _is_a_single_dir(file_list):
+        rootpath = os.path.splitext(file_list[0])[0].split(os.sep)[-1]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+        for item in file_list:
+            files.extract(item, file_dir)
+    else:
+        rootpath = os.path.splitext(filepath)[0].split(os.sep)[-1]
+        uncompressed_path = os.path.join(file_dir, rootpath)
+        if not os.path.exists(uncompressed_path):
+            os.makedirs(uncompressed_path)
+
+        for item in file_list:
+            files.extract(item, os.path.join(file_dir, rootpath))
+
+    files.close()
+
+    return uncompressed_path
+
+
+def _is_a_single_file(file_list):
+    if len(file_list) == 1 and file_list[0].find(os.sep) < 0:
+        return True
+    return False
+
+
+def _is_a_single_dir(file_list):
+    new_file_list = []
+    for file_path in file_list:
+        if '/' in file_path:
+            file_path = file_path.replace('/', os.sep)
+        elif '\\' in file_path:
+            file_path = file_path.replace('\\', os.sep)
+        new_file_list.append(file_path)
+
+    file_name = new_file_list[0].split(os.sep)[0]
+    for i in range(1, len(new_file_list)):
+        if file_name != new_file_list[i].split(os.sep)[0]:
+            return False
+    return True
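The new module mirrors paddle.utils.download but caches weights under `~/.paddleclas/weights`. A minimal usage sketch follows; the weight URL is taken from the benchmark list removed later in this patch and is used purely for illustration:

```python
from ppcls.utils.download import get_weights_path_from_url

url = ("https://paddle-imagenet-models-name.bj.bcebos.com/"
       "GhostNet_x1_0_pretrained.pdparams")
# downloads on the first call, returns the cached copy afterwards
local_path = get_weights_path_from_url(url)
print(local_path)
```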
diff --git a/tools/ema.py b/ppcls/utils/ema.py
similarity index 100%
rename from tools/ema.py
rename to ppcls/utils/ema.py
diff --git a/tools/feature_maps_visualization/download_resnet50_pretrained.sh b/ppcls/utils/feature_maps_visualization/download_resnet50_pretrained.sh
similarity index 100%
rename from tools/feature_maps_visualization/download_resnet50_pretrained.sh
rename to ppcls/utils/feature_maps_visualization/download_resnet50_pretrained.sh
diff --git a/tools/feature_maps_visualization/fm_vis.py b/ppcls/utils/feature_maps_visualization/fm_vis.py
similarity index 100%
rename from tools/feature_maps_visualization/fm_vis.py
rename to ppcls/utils/feature_maps_visualization/fm_vis.py
diff --git a/tools/feature_maps_visualization/resnet.py b/ppcls/utils/feature_maps_visualization/resnet.py
similarity index 100%
rename from tools/feature_maps_visualization/resnet.py
rename to ppcls/utils/feature_maps_visualization/resnet.py
diff --git a/tools/feature_maps_visualization/utils.py b/ppcls/utils/feature_maps_visualization/utils.py
similarity index 100%
rename from tools/feature_maps_visualization/utils.py
rename to ppcls/utils/feature_maps_visualization/utils.py
diff --git a/ppcls/utils/logger.py b/ppcls/utils/logger.py
index ece85262446d899a425ac62a0bb1d7a8ff754a50..9a2b4a1cfaa17782e1fdb615c7a5836612c064f0 100644
--- a/ppcls/utils/logger.py
+++ b/ppcls/utils/logger.py
@@ -12,70 +12,86 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import logging
 import os
-import datetime
-
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s %(levelname)s: %(message)s",
-    datefmt="%Y-%m-%d %H:%M:%S")
-
-
-def time_zone(sec, fmt):
-    real_time = datetime.datetime.now()
-    return real_time.timetuple()
-
-
-logging.Formatter.converter = time_zone
-_logger = logging.getLogger(__name__)
+import sys

-Color = {
-    'RED': '\033[31m',
-    'HEADER': '\033[35m',  # deep purple
-    'PURPLE': '\033[95m',  # purple
-    'OKBLUE': '\033[94m',
-    'OKGREEN': '\033[92m',
-    'WARNING': '\033[93m',
-    'FAIL': '\033[91m',
-    'ENDC': '\033[0m'
-}
-
-
-def coloring(message, color="OKGREEN"):
-    assert color in Color.keys()
-    if os.environ.get('PADDLECLAS_COLORING', False):
-        return Color[color] + str(message) + Color["ENDC"]
+import logging
+import datetime
+import paddle.distributed as dist
+
+_logger = None
+
+
+def init_logger(name='root', log_file=None, log_level=logging.INFO):
+    """Initialize and get a logger by name.
+    If the logger has not been initialized, this method will initialize it
+    by adding one or two handlers; otherwise the initialized logger will
+    be returned directly. During initialization, a StreamHandler is always
+    added. If `log_file` is specified, a FileHandler will also be added.
+    Args:
+        name (str): Logger name.
+        log_file (str | None): The log filename. If specified, a FileHandler
+            will be added to the logger.
+        log_level (int): The logger level. Note that only the process of
+            rank 0 is affected, while other processes set their level to
+            "Error" and are thus silent most of the time.
+    Returns:
+        logging.Logger: The expected logger.
+    """
+    global _logger
+    assert _logger is None, "logger should not be initialized more than once."
+    _logger = logging.getLogger(name)
+
+    formatter = logging.Formatter(
+        '[%(asctime)s] %(name)s %(levelname)s: %(message)s',
+        datefmt="%Y/%m/%d %H:%M:%S")
+
+    stream_handler = logging.StreamHandler(stream=sys.stdout)
+    stream_handler.setFormatter(formatter)
+    _logger.addHandler(stream_handler)
+    if log_file is not None and dist.get_rank() == 0:
+        log_file_folder = os.path.split(log_file)[0]
+        os.makedirs(log_file_folder, exist_ok=True)
+        file_handler = logging.FileHandler(log_file, 'a')
+        file_handler.setFormatter(formatter)
+        _logger.addHandler(file_handler)
+    if dist.get_rank() == 0:
+        _logger.setLevel(log_level)
     else:
-        return message
+        _logger.setLevel(logging.ERROR)


-def anti_fleet(log):
+def log_at_trainer0(log):
     """
     logs will print multi-times when calling Fleet API.
     Only display single log and ignore the others.
     """

     def wrapper(fmt, *args):
-        if int(os.getenv("PADDLE_TRAINER_ID", 0)) == 0:
+        if dist.get_rank() == 0:
             log(fmt, *args)

     return wrapper


-@anti_fleet
+@log_at_trainer0
 def info(fmt, *args):
     _logger.info(fmt, *args)


-@anti_fleet
+@log_at_trainer0
+def debug(fmt, *args):
+    _logger.debug(fmt, *args)
+
+
+@log_at_trainer0
 def warning(fmt, *args):
-    _logger.warning(coloring(fmt, "RED"), *args)
+    _logger.warning(fmt, *args)


-@anti_fleet
+@log_at_trainer0
 def error(fmt, *args):
-    _logger.error(coloring(fmt, "FAIL"), *args)
+    _logger.error(fmt, *args)


 def scaler(name, value, step, writer):
@@ -108,13 +124,12 @@ def advertise():
     website = "https://github.com/PaddlePaddle/PaddleClas"
     AD_LEN = 6 + len(max([copyright, ad, website], key=len))

-    info(
-        coloring("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
-            "=" * (AD_LEN + 4),
-            "=={}==".format(copyright.center(AD_LEN)),
-            "=" * (AD_LEN + 4),
-            "=={}==".format(' ' * AD_LEN),
-            "=={}==".format(ad.center(AD_LEN)),
-            "=={}==".format(' ' * AD_LEN),
-            "=={}==".format(website.center(AD_LEN)),
-            "=" * (AD_LEN + 4), ), "RED"))
+    info("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
+        "=" * (AD_LEN + 4),
+        "=={}==".format(copyright.center(AD_LEN)),
+        "=" * (AD_LEN + 4),
+        "=={}==".format(' ' * AD_LEN),
+        "=={}==".format(ad.center(AD_LEN)),
+        "=={}==".format(' ' * AD_LEN),
+        "=={}==".format(website.center(AD_LEN)),
+        "=" * (AD_LEN + 4), ))
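With the coloring helpers gone, a trainer is now expected to call `init_logger` once before any logging. A minimal sketch, assuming a single-process run (the log path is illustrative):

```python
import logging
from ppcls.utils import logger

logger.init_logger(name='root',
                   log_file='./output/train.log',  # also streamed to stdout
                   log_level=logging.INFO)
logger.info("visible on rank 0 only")
logger.debug("shown only when log_level=logging.DEBUG")
```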
diff --git a/ppcls/utils/save_load.py b/ppcls/utils/save_load.py
index d8d80639299b3c676dbc6d657950d4ad695ca708..cca79f4cb24b69a8687b9b3ee92774285bd574be 100644
--- a/ppcls/utils/save_load.py
+++ b/ppcls/utils/save_load.py
@@ -23,10 +23,8 @@ import shutil
 import tempfile

 import paddle
-from paddle.static import load_program_state
-from paddle.utils.download import get_weights_path_from_url
-
 from ppcls.utils import logger
+from .download import get_weights_path_from_url

 __all__ = ['init_model', 'save_model', 'load_dygraph_pretrain']

@@ -47,70 +45,42 @@ def _mkdir_if_not_exist(path):
                 raise OSError('Failed to mkdir {}'.format(path))


-def load_dygraph_pretrain(model, path=None, load_static_weights=False):
+def load_dygraph_pretrain(model, path=None):
     if not (os.path.isdir(path) or os.path.exists(path + '.pdparams')):
         raise ValueError("Model pretrain path {} does not "
                          "exists.".format(path))
-    if load_static_weights:
-        pre_state_dict = load_program_state(path)
-        param_state_dict = {}
-        model_dict = model.state_dict()
-        for key in model_dict.keys():
-            weight_name = model_dict[key].name
-            if weight_name in pre_state_dict.keys():
-                logger.info('Load weight: {}, shape: {}'.format(
-                    weight_name, pre_state_dict[weight_name].shape))
-                param_state_dict[key] = pre_state_dict[weight_name]
-            else:
-                param_state_dict[key] = model_dict[key]
-        model.set_dict(param_state_dict)
-        return
-
     param_state_dict = paddle.load(path + ".pdparams")
     model.set_dict(param_state_dict)
     return


-def load_dygraph_pretrain_from_url(model,
-                                   pretrained_url,
-                                   use_ssld,
-                                   load_static_weights=False):
+def load_dygraph_pretrain_from_url(model, pretrained_url, use_ssld):
     if use_ssld:
         pretrained_url = pretrained_url.replace("_pretrained",
                                                 "_ssld_pretrained")
     local_weight_path = get_weights_path_from_url(pretrained_url).replace(
         ".pdparams", "")
-    load_dygraph_pretrain(
-        model, path=local_weight_path, load_static_weights=load_static_weights)
+    load_dygraph_pretrain(model, path=local_weight_path)
     return


-def load_distillation_model(model, pretrained_model, load_static_weights):
+def load_distillation_model(model, pretrained_model):
     logger.info("In distillation mode, teacher model will be "
                 "loaded firstly before student model.")

     if not isinstance(pretrained_model, list):
         pretrained_model = [pretrained_model]

-    if not isinstance(load_static_weights, list):
-        load_static_weights = [load_static_weights] * len(pretrained_model)
-
     teacher = model.teacher if hasattr(model,
                                        "teacher") else model._layers.teacher
     student = model.student if hasattr(model,
                                        "student") else model._layers.student
-    load_dygraph_pretrain(
-        teacher,
-        path=pretrained_model[0],
-        load_static_weights=load_static_weights[0])
+    load_dygraph_pretrain(teacher, path=pretrained_model[0])
     logger.info("Finish initing teacher model from {}".format(
         pretrained_model))
     # load student model
     if len(pretrained_model) >= 2:
-        load_dygraph_pretrain(
-            student,
-            path=pretrained_model[1],
-            load_static_weights=load_static_weights[1])
+        load_dygraph_pretrain(student, path=pretrained_model[1])
         logger.info("Finish initing student model from {}".format(
             pretrained_model))

@@ -134,34 +104,17 @@ def init_model(config, net, optimizer=None):
         return metric_dict

     pretrained_model = config.get('pretrained_model')
-    load_static_weights = config.get('load_static_weights', False)
     use_distillation = config.get('use_distillation', False)
     if pretrained_model:
         if use_distillation:
-            load_distillation_model(net, pretrained_model, load_static_weights)
+            load_distillation_model(net, pretrained_model)
         else:  # common load
-            load_dygraph_pretrain(
-                net,
-                path=pretrained_model,
-                load_static_weights=load_static_weights)
+            load_dygraph_pretrain(net, path=pretrained_model)
             logger.info(
                 logger.coloring("Finish load pretrained model from {}".format(
                     pretrained_model), "HEADER"))


-def _save_student_model(net, model_prefix):
-    """
-    save student model if the net is the network contains student
-    """
-    student_model_prefix = model_prefix + "_student.pdparams"
-    if hasattr(net, "_layers"):
-        net = net._layers
-    if hasattr(net, "student"):
-        paddle.save(net.student.state_dict(), student_model_prefix)
-        logger.info("Already save student model in {}".format(
-            student_model_prefix))
-
-
 def save_model(net,
                optimizer,
                metric_info,
@@ -175,11 +128,9 @@ def save_model(net,
         return
     model_path = os.path.join(model_path, model_name)
     _mkdir_if_not_exist(model_path)
-    model_prefix = os.path.join(model_path, prefix)
-
-    _save_student_model(net, model_prefix)
+    model_path = os.path.join(model_path, prefix)

-    paddle.save(net.state_dict(), model_prefix + ".pdparams")
-    paddle.save(optimizer.state_dict(), model_prefix + ".pdopt")
-    paddle.save(metric_info, model_prefix + ".pdstates")
+    paddle.save(net.state_dict(), model_path + ".pdparams")
+    paddle.save(optimizer.state_dict(), model_path + ".pdopt")
+    paddle.save(metric_info, model_path + ".pdstates")
     logger.info("Already save model in {}".format(model_path))
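With `load_static_weights` removed, `load_dygraph_pretrain` only loads dygraph `.pdparams` files, and the path is passed without its suffix. A self-contained sketch with a stand-in layer (the file name is illustrative):

```python
import paddle
import paddle.nn as nn
from ppcls.utils.save_load import load_dygraph_pretrain

net = nn.Linear(8, 2)                        # stand-in for a real backbone
paddle.save(net.state_dict(), "./toy.pdparams")
load_dygraph_pretrain(net, path="./toy")     # note: ".pdparams" suffix omitted
```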
diff --git a/tools/static/dali.py b/ppcls/utils/static/dali.py
similarity index 100%
rename from tools/static/dali.py
rename to ppcls/utils/static/dali.py
diff --git a/tools/static/program.py b/ppcls/utils/static/program.py
similarity index 100%
rename from tools/static/program.py
rename to ppcls/utils/static/program.py
diff --git a/tools/static/run_dali.sh b/ppcls/utils/static/run_dali.sh
similarity index 100%
rename from tools/static/run_dali.sh
rename to ppcls/utils/static/run_dali.sh
diff --git a/tools/static/save_load.py b/ppcls/utils/static/save_load.py
similarity index 100%
rename from tools/static/save_load.py
rename to ppcls/utils/static/save_load.py
diff --git a/tools/static/train.py b/ppcls/utils/static/train.py
similarity index 100%
rename from tools/static/train.py
rename to ppcls/utils/static/train.py
diff --git a/tools/benchmark/benchmark.sh b/tools/benchmark/benchmark.sh
deleted file mode 100644
index fc50a6eda148656c7dd0a46ac12d014f79873a4e..0000000000000000000000000000000000000000
--- a/tools/benchmark/benchmark.sh
+++ /dev/null
@@ -1,3 +0,0 @@
-python3.7 -m paddle.distributed.launch \
-    --selected_gpus="0" \
-    tools/benchmark/benchmark_acc.py
diff --git a/tools/benchmark/benchmark_acc.py b/tools/benchmark/benchmark_acc.py
deleted file mode 100644
index aa471713a11a91b9e512ea6593d250e16ef2dcad..0000000000000000000000000000000000000000
--- a/tools/benchmark/benchmark_acc.py
+++ /dev/null
@@ -1,123 +0,0 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -import argparse -import os -import sys -__dir__ = os.path.dirname(os.path.abspath(__file__)) -sys.path.append(__dir__) -sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) -sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) - -import paddle - -from multiprocessing import Manager -import tools.eval as eval -from ppcls.utils.model_zoo import _download, _decompress -from ppcls.utils import logger - - -def parse_args(): - def str2bool(v): - return v.lower() in ("true", "t", "1") - - parser = argparse.ArgumentParser() - parser.add_argument( - "-b", - "--benchmark_file_list", - type=str, - default="./tools/benchmark/benchmark_list.txt") - parser.add_argument( - "-p", "--pretrained_dir", type=str, default="./pretrained/") - - return parser.parse_args() - - -def parse_model_infos(benchmark_file_list): - model_infos = [] - with open(benchmark_file_list, "r") as fin: - lines = fin.readlines() - for idx, line in enumerate(lines): - strs = line.strip("\n").strip("\r").split(" ") - if len(strs) != 4: - logger.info( - "line {0}(info: {1}) format wrong, it should be splited into 4 parts, but got {2}". - format(idx, line, len(strs))) - model_infos.append({ - "top1_acc": float(strs[0]), - "model_name": strs[1], - "config_path": strs[2], - "pretrain_path": strs[3], - }) - return model_infos - - -def main(args): - benchmark_file_list = args.benchmark_file_list - model_infos = parse_model_infos(benchmark_file_list) - right_models = [] - wrong_models = [] - - for model_info in model_infos: - try: - pretrained_url = model_info["pretrain_path"] - fname = _download(pretrained_url, args.pretrained_dir) - pretrained_path = os.path.splitext(fname)[0] - if pretrained_url.endswith("tar"): - path = _decompress(fname) - pretrained_path = os.path.join( - os.path.dirname(pretrained_path), path) - - args.config = model_info["config_path"] - args.override = [ - "pretrained_model={}".format(pretrained_path), - "VALID.batch_size=256", - "VALID.num_workers=16", - "load_static_weights=True", - "print_interval=100", - ] - - manager = Manager() - return_dict = manager.dict() - - # A hack method to avoid name conflict. - # Multi-process maybe a better method here. - # More details can be seen in branch 2.0-beta. - # TODO: fluid needs to be removed in the future. 
- with paddle.utils.unique_name.guard(): - eval.main(args, return_dict) - - top1_acc = return_dict.get("top1_acc", 0.0) - except Exception as e: - logger.error(e) - top1_acc = 0.0 - diff = abs(top1_acc - model_info["top1_acc"]) - if diff > 0.001: - err_info = "[{}]Top-1 acc diff should be <= 0.001 but got diff {}, gt acc: {}, eval acc: {}".format( - model_info["model_name"], diff, model_info["top1_acc"], - top1_acc) - logger.warning(err_info) - wrong_models.append(model_info["model_name"]) - else: - right_models.append(model_info["model_name"]) - - logger.info("[number of right models: {}, they are: {}".format( - len(right_models), right_models)) - logger.info("[number of wrong models: {}, they are: {}".format( - len(wrong_models), wrong_models)) - - -if __name__ == '__main__': - args = parse_args() - main(args) diff --git a/tools/benchmark/benchmark_list.txt b/tools/benchmark/benchmark_list.txt deleted file mode 100644 index 66ad980c75f86c3c2017b263466e69da52f1c328..0000000000000000000000000000000000000000 --- a/tools/benchmark/benchmark_list.txt +++ /dev/null @@ -1,29 +0,0 @@ -0.7098 ResNet18 configs/ResNet/ResNet18.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar -0.7650 ResNet50 configs/ResNet/ResNet50.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.tar -0.7226 ResNet18_vd configs/ResNet/ResNet18_vd.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_vd_pretrained.tar -0.7912 ResNet50_vd configs/ResNet/ResNet50_vd.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_pretrained.tar -0.7099 MobileNetV1 configs/MobileNetV1/MobileNetV1.yaml https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.tar -0.7215 MobileNetV2 configs/MobileNetV2/MobileNetV2.yaml https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.tar -0.7532 MobileNetV3_large_x1_0 configs/MobileNetV3/MobileNetV3_large_x1_0.yaml https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_pretrained.tar -0.6880 ShuffleNetV2 configs/ShuffleNet/ShuffleNetV2.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNetV2_pretrained.tar -0.7933 Res2Net50_26w_4s configs/Res2Net/Res2Net50_26w_4s.yaml https://paddle-imagenet-models-name.bj.bcebos.com/Res2Net50_26w_4s_pretrained.tar -0.7775 ResNeXt50_32x4d configs/ResNeXt/ResNeXt50_32x4d.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_32x4d_pretrained.tar -0.7333 SE_ResNet18_vd configs/SENet/SE_ResNet18_vd.yaml https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet18_vd_pretrained.tar -0.7952 SE_ResNet50_vd configs/SENet/SE_ResNet50_vd.yaml https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNet50_vd_pretrained.tar -0.8024 SE_ResNeXt50_vd_32x4d configs/SENet/SE_ResNeXt50_vd_32x4d.yaml https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt50_vd_32x4d_pretrained.tar -0.7566 DenseNet121 configs/DenseNet/DenseNet121.yaml https://paddle-imagenet-models-name.bj.bcebos.com/DenseNet121_pretrained.tar -0.7678 DPN68 configs/DPN/DPN68.yaml https://paddle-imagenet-models-name.bj.bcebos.com/DPN68_pretrained.tar -0.7692 HRNet_W18_C configs/HRNet/HRNet_W18_C.yaml https://paddle-imagenet-models-name.bj.bcebos.com/HRNet_W18_C_pretrained.tar -0.7070 GoogLeNet configs/Inception/GoogLeNet.yaml https://paddle-imagenet-models-name.bj.bcebos.com/GoogLeNet_pretrained.tar -0.7930 Xception41 configs/Xception/Xception41.yaml https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_pretrained.tar -0.7955 Xception41_deeplab 
configs/Xception/Xception41_deeplab.yaml https://paddle-imagenet-models-name.bj.bcebos.com/Xception41_deeplab_pretrained.tar -0.8077 InceptionV4 configs/Inception/InceptionV4.yaml https://paddle-imagenet-models-name.bj.bcebos.com/InceptionV4_pretrained.tar -0.8255 ResNeXt101_32x8d_wsl configs/ResNeXt101_wsl/ResNeXt101_32x8d_wsl.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x8d_wsl_pretrained.tar -0.8035 ResNeSt50_fast_1s1x64d configs/ResNeSt/ResNeSt50_fast_1s1x64d.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_fast_1s1x64d_pretrained.pdparams -0.8083 ResNeSt50 configs/ResNeSt/ResNeSt50.yaml https://paddle-imagenet-models-name.bj.bcebos.com/ResNeSt50_pretrained.pdparams -0.785 RegNetX_4GF configs/RegNet/RegNetX_4GF.yaml https://paddle-imagenet-models-name.bj.bcebos.com/RegNetX_4GF_pretrained.pdparams -0.7402 GhostNet_x1_0 configs/GhostNet/GhostNet_x1_0.yaml https://paddle-imagenet-models-name.bj.bcebos.com/GhostNet_x1_0_pretrained.pdparams -0.567 AlexNet configs/AlexNet/AlexNet.yaml https://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.tar -0.596 SqueezeNet1_0 configs/SqueezeNet/SqueezeNet1_0.yaml https://paddle-imagenet-models-name.bj.bcebos.com/SqueezeNet1_0_pretrained.tar -0.693 VGG11 configs/VGG/VGG11.yaml https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.tar -0.780 DarkNet53 configs/DarkNet/DarkNet53.yaml https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar diff --git a/tools/benchmark/run_multi_nodes.sh b/tools/benchmark/run_multi_nodes.sh deleted file mode 100755 index 4a111999843a3ac64e11ba88cef20e768ad8653a..0000000000000000000000000000000000000000 --- a/tools/benchmark/run_multi_nodes.sh +++ /dev/null @@ -1,14 +0,0 @@ -#!/usr/bin/env bash - -# IP Addresses of all nodes, modify it corresponding to your own environment -ALL_NODE_IPS="10.10.10.1,10.10.10.2" -# IP Address of the current node, modify it corresponding to your own environment -CUR_NODE_IPS="10.10.10.1" - -python -m paddle.distributed.launch \ - --cluster_node_ips=$ALL_NODE_IPS \ - --node_ip=$CUR_NODE_IPS \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./configs/ResNet/ResNet50.yaml \ - -o print_interval=10 diff --git a/tools/benchmark/run_single_node.sh b/tools/benchmark/run_single_node.sh deleted file mode 100755 index 5ec44455700d821951da9ff5e51d50fd8e621b1d..0000000000000000000000000000000000000000 --- a/tools/benchmark/run_single_node.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/usr/bin/env bash - -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./configs/ResNet/ResNet50.yaml \ - -o print_interval=10 diff --git a/tools/download.py b/tools/download.py deleted file mode 100644 index 7053634c54e875529fa7a1a01d325641405fa764..0000000000000000000000000000000000000000 --- a/tools/download.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import argparse -import os -import sys -__dir__ = os.path.dirname(os.path.abspath(__file__)) -sys.path.append(__dir__) -sys.path.append(os.path.abspath(os.path.join(__dir__, '..'))) - -from ppcls import model_zoo - - -def parse_args(): - def str2bool(v): - return v.lower() in ("true", "t", "1") - - parser = argparse.ArgumentParser() - parser.add_argument('-a', '--architecture', type=str, default='ResNet50') - parser.add_argument('-p', '--path', type=str, default='./pretrained/') - parser.add_argument('--postfix', type=str, default="pdparams") - parser.add_argument('-d', '--decompress', type=str2bool, default=False) - parser.add_argument('-l', '--list', type=str2bool, default=False) - - args = parser.parse_args() - return args - - -def main(): - args = parse_args() - if args.list: - model_zoo.list_models() - else: - model_zoo.get(args.architecture, args.path, args.decompress, - args.postfix) - - -if __name__ == '__main__': - main() diff --git a/tools/eval.py b/tools/eval.py index 1e4b8c9ea6ed19ee2649e967ba1582316b92a419..b03030c5a4fa605cd28d5d93d303c6f886462260 100644 --- a/tools/eval.py +++ b/tools/eval.py @@ -25,6 +25,7 @@ from ppcls.engine.trainer import Trainer if __name__ == "__main__": args = config.parse_args() - config = config.get_config(args.config, overrides=args.override, show=True) + config = config.get_config( + args.config, overrides=args.override, show=False) trainer = Trainer(config, mode="eval") trainer.eval() diff --git a/tools/eval.sh b/tools/eval.sh index f67ba9c95b6c5642fa70a17c2987ba20d7672051..c13ea6d032408afd858de45ccbec7cd45cd969f8 100644 --- a/tools/eval.sh +++ b/tools/eval.sh @@ -1,6 +1,7 @@ -python3.7 -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/eval.py \ - -c ./configs/ResNet/ResNet50.yaml \ - -o pretrained_model="./ResNet50_pretrained" \ - -o use_gpu=True +#!/usr/bin/env bash + +# for single card eval +# python3.7 tools/eval.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml + +# for multi-cards eval +python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" tools/eval.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml diff --git a/tools/export_model.py b/tools/export_model.py index c1fbc9d91ab81cc882776daff2620f984ef60f83..53b3e54802bc3b4fb787f59d9fd9a239ed9f2346 100644 --- a/tools/export_model.py +++ b/tools/export_model.py @@ -24,20 +24,27 @@ import paddle import paddle.nn as nn from ppcls.utils import config -from ppcls.arch import build_model, RecModel +from ppcls.arch import build_model, RecModel, DistillationModel from ppcls.utils.save_load import load_dygraph_pretrain from ppcls.arch.gears.identity_head import IdentityHead class ExportModel(nn.Layer): """ - ClasModel: add softmax onto the model + ExportModel: add softmax onto the model """ def __init__(self, config): super().__init__() self.base_model = build_model(config) - self.infer_output_key = config.get("infer_output_key") + + # we should choose a final model to export + if isinstance(self.base_model, DistillationModel): + self.infer_model_name = config["infer_model_name"] + else: + self.infer_model_name = None + + self.infer_output_key = config.get("infer_output_key", None) if self.infer_output_key == "features" and isinstance(self.base_model, RecModel): self.base_model.head = IdentityHead() @@ -54,6 +61,8 @@ class ExportModel(nn.Layer): def forward(self, x): x = self.base_model(x) + if self.infer_model_name is not None: + x = x[self.infer_model_name] if self.infer_output_key is not None: x = x[self.infer_output_key] if self.softmax is not None: diff --git 
a/tools/export_serving_model.py b/tools/export_serving_model.py deleted file mode 100644 index 6bf7cbe9b8bf2a0db343bf4e9fbaf59152601494..0000000000000000000000000000000000000000 --- a/tools/export_serving_model.py +++ /dev/null @@ -1,76 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import argparse -import os -from ppcls.arch import backbone - -import paddle.fluid as fluid -import paddle_serving_client.io as serving_io - - -def parse_args(): - parser = argparse.ArgumentParser() - parser.add_argument("-m", "--model", type=str) - parser.add_argument("-p", "--pretrained_model", type=str) - parser.add_argument("-o", "--output_path", type=str, default="") - parser.add_argument("--class_dim", type=int, default=1000) - parser.add_argument("--img_size", type=int, default=224) - - return parser.parse_args() - - -def create_input(img_size=224): - image = fluid.data( - name='image', shape=[None, 3, img_size, img_size], dtype='float32') - return image - - -def create_model(args, model, input, class_dim=1000): - if args.model == "GoogLeNet": - out, _, _ = model.net(input=input, class_dim=class_dim) - else: - out = model.net(input=input, class_dim=class_dim) - out = fluid.layers.softmax(out) - return out - - -def main(): - args = parse_args() - - model = backbone.__dict__[args.model]() - - place = fluid.CPUPlace() - exe = fluid.Executor(place) - - startup_prog = fluid.Program() - infer_prog = fluid.Program() - - with fluid.program_guard(infer_prog, startup_prog): - with fluid.unique_name.guard(): - image = create_input(args.img_size) - out = create_model(args, model, image, class_dim=args.class_dim) - - infer_prog = infer_prog.clone(for_test=True) - fluid.load( - program=infer_prog, model_path=args.pretrained_model, executor=exe) - - model_path = os.path.join(args.output_path, "ppcls_model") - conf_path = os.path.join(args.output_path, "ppcls_client_conf") - serving_io.save_model(model_path, conf_path, {"image": image}, - {"prediction": out}, infer_prog) - - -if __name__ == "__main__": - main() diff --git a/tools/infer.py b/tools/infer.py index 256037a76d433828bf59f874c3136b0f70ec9511..da23a3d88c2f00e56c2868cbffd6d8e2b05202f9 100644 --- a/tools/infer.py +++ b/tools/infer.py @@ -25,7 +25,8 @@ from ppcls.engine.trainer import Trainer if __name__ == "__main__": args = config.parse_args() - config = config.get_config(args.config, overrides=args.override, show=True) + config = config.get_config( + args.config, overrides=args.override, show=False) trainer = Trainer(config, mode="infer") trainer.infer() diff --git a/tools/infer/infer.py b/tools/infer/infer.py deleted file mode 100644 index 241cb3c3a06356d1518752629ec765f8de531f3d..0000000000000000000000000000000000000000 --- a/tools/infer/infer.py +++ /dev/null @@ -1,94 +0,0 @@ -# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np -import cv2 -import os -import sys - -import paddle -import paddle.nn.functional as F - -__dir__ = os.path.dirname(os.path.abspath(__file__)) -sys.path.append(__dir__) -sys.path.append(os.path.abspath(os.path.join(__dir__, '../..'))) - -from ppcls.utils.save_load import load_dygraph_pretrain -from ppcls.utils import logger -from ppcls.arch import backbone -from utils import parse_args, get_image_list, preprocess, postprocess, save_prelabel_results - - -def main(): - args = parse_args() - # assign the place - place = paddle.set_device('gpu' if args.use_gpu else 'cpu') - multilabel = True if args.multilabel else False - - net = backbone.__dict__[args.model](class_dim=args.class_num) - load_dygraph_pretrain(net, args.pretrained_model, args.load_static_weights) - image_list = get_image_list(args.image_file) - batch_input_list = [] - img_path_list = [] - cnt = 0 - for idx, img_path in enumerate(image_list): - img = cv2.imread(img_path) - if img is None: - logger.warning( - "Image file failed to read and has been skipped. The path: {}". - format(img_path)) - continue - else: - img = img[:, :, ::-1] - data = preprocess(img, args) - batch_input_list.append(data) - img_path_list.append(img_path) - cnt += 1 - - if cnt % args.batch_size == 0 or (idx + 1) == len(image_list): - batch_tensor = paddle.to_tensor(batch_input_list) - net.eval() - batch_outputs = net(batch_tensor) - if args.model == "GoogLeNet": - batch_outputs = batch_outputs[0] - if multilabel: - batch_outputs = F.sigmoid(batch_outputs) - else: - batch_outputs = F.softmax(batch_outputs) - batch_outputs = batch_outputs.numpy() - batch_result_list = postprocess(batch_outputs, args.top_k, multilabel=multilabel) - - for number, result_dict in enumerate(batch_result_list): - filename = img_path_list[number].split("/")[-1] - clas_ids = result_dict["clas_ids"] - if multilabel: - print("File:{}, multilabel result: ".format(filename)) - for id, score in zip(clas_ids, result_dict["scores"]): - print("\tclass id: {}, probability: {:.2f}".format(id, score)) - else: - scores_str = "[{}]".format(", ".join("{:.2f}".format( - r) for r in result_dict["scores"])) - print("File:{}, Top-{} result: class id(s): {}, score(s): {}". - format(filename, args.top_k, clas_ids, scores_str)) - - if args.pre_label_image: - save_prelabel_results(clas_ids[0], img_path_list[number], - args.pre_label_out_idr) - - batch_input_list = [] - img_path_list = [] - - -if __name__ == "__main__": - main() diff --git a/tools/program.py b/tools/program.py deleted file mode 100644 index 731aa044479350314d5c1e3d4d6da02c6f10ffd1..0000000000000000000000000000000000000000 --- a/tools/program.py +++ /dev/null @@ -1,446 +0,0 @@ -# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import time -import datetime -from collections import OrderedDict - -import paddle -from paddle import to_tensor -import paddle.nn.functional as F - -from ppcls.optimizer import LearningRateBuilder -from ppcls.optimizer import OptimizerBuilder -from ppcls.arch import backbone -from ppcls.arch.loss import MultiLabelLoss -from ppcls.arch.loss import CELoss -from ppcls.arch.loss import MixCELoss -from ppcls.arch.loss import JSDivLoss -from ppcls.arch.loss import GoogLeNetLoss -from ppcls.utils.misc import AverageMeter -from ppcls.utils import logger -from ppcls.utils import profiler -from ppcls.utils import multi_hot_encode -from ppcls.utils import hamming_distance -from ppcls.utils import accuracy_score - - -def create_model(architecture, classes_num): - """ - Create a model - - Args: - architecture(dict): architecture information, - name(such as ResNet50) is needed - image(variable): model input variable - classes_num(int): num of classes - - Returns: - out(variable): model output variable - """ - name = architecture["name"] - params = architecture.get("params", {}) - return backbone.__dict__[name](class_dim=classes_num, **params) - - -def create_loss(feeds, - out, - architecture, - classes_num=1000, - epsilon=None, - use_mix=False, - use_distillation=False, - multilabel=False): - """ - Create a loss for optimization, such as: - 1. CrossEnotry loss - 2. CrossEnotry loss with label smoothing - 3. CrossEnotry loss with mix(mixup, cutmix, fmix) - 4. CrossEnotry loss with label smoothing and (mixup, cutmix, fmix) - 5. 
GoogLeNet loss - - Args: - out(variable): model output variable - feeds(dict): dict of model input variables - architecture(dict): architecture information, - name(such as ResNet50) is needed - classes_num(int): num of classes - epsilon(float): parameter for label smoothing, 0.0 <= epsilon <= 1.0 - use_mix(bool): whether to use mix(include mixup, cutmix, fmix) - - Returns: - loss(variable): loss variable - """ - if architecture["name"] == "GoogLeNet": - assert len(out) == 3, "GoogLeNet should have 3 outputs" - loss = GoogLeNetLoss(class_dim=classes_num, epsilon=epsilon) - return loss(out[0], out[1], out[2], feeds["label"]) - - if use_distillation: - assert len(out) == 2, ("distillation output length must be 2, " - "but got {}".format(len(out))) - loss = JSDivLoss(class_dim=classes_num, epsilon=epsilon) - return loss(out[1], out[0]) - - if use_mix: - loss = MixCELoss(class_dim=classes_num, epsilon=epsilon) - feed_y_a = feeds['y_a'] - feed_y_b = feeds['y_b'] - feed_lam = feeds['lam'] - return loss(out, feed_y_a, feed_y_b, feed_lam) - else: - if not multilabel: - loss = CELoss(class_dim=classes_num, epsilon=epsilon) - else: - loss = MultiLabelLoss(class_dim=classes_num, epsilon=epsilon) - return loss(out, feeds["label"]) - - -def create_metric(out, - label, - architecture, - topk=5, - classes_num=1000, - use_distillation=False, - multilabel=False, - mode="train", - use_xpu=False): - """ - Create measures of model accuracy, such as top1 and top5 - - Args: - out(variable): model output variable - feeds(dict): dict of model input variables(included label) - topk(int): usually top5 - classes_num(int): num of classes - use_distillation(bool): whether to use distillation training - mode(str): mode, train/valid - - Returns: - fetchs(dict): dict of measures - """ - if architecture["name"] == "GoogLeNet": - assert len(out) == 3, "GoogLeNet should have 3 outputs" - out = out[0] - else: - # just need student label to get metrics - if use_distillation: - out = out[1] - softmax_out = F.softmax(out) - - fetch_list = [] - metric_names = [] - if not multilabel: - softmax_out = F.softmax(out) - - # set top1 to fetchs - top1 = paddle.metric.accuracy(softmax_out, label=label, k=1) - # set topk to fetchs - k = min(topk, classes_num) - topk = paddle.metric.accuracy(softmax_out, label=label, k=k) - - metric_names.append("top1") - metric_names.append("top{}".format(k)) - - fetch_list.append(top1) - fetch_list.append(topk) - else: - out = F.sigmoid(out) - preds = multi_hot_encode(out.numpy()) - targets = label.numpy() - ham_dist = to_tensor(hamming_distance(preds, targets)) - accuracy = to_tensor(accuracy_score(preds, targets, base="label")) - - ham_dist_name = "hamming_distance" - accuracy_name = "multilabel_accuracy" - metric_names.append(ham_dist_name) - metric_names.append(accuracy_name) - - fetch_list.append(accuracy) - fetch_list.append(ham_dist) - - # multi cards' eval - if not use_xpu: - if mode != "train" and paddle.distributed.get_world_size() > 1: - for idx, fetch in enumerate(fetch_list): - fetch_list[idx] = paddle.distributed.all_reduce( - fetch, op=paddle.distributed.ReduceOp. - SUM) / paddle.distributed.get_world_size() - - fetchs = OrderedDict() - for idx, name in enumerate(metric_names): - fetchs[name] = fetch_list[idx] - return fetchs - - -def create_fetchs(feeds, net, config, mode="train"): - """ - Create fetchs as model outputs(included loss and measures), - will call create_loss and create_metric(if use_mix). 
- - Args: - out(variable): model output variable - feeds(dict): dict of model input variables. - If use mix_up, it will not include label. - architecture(dict): architecture information, - name(such as ResNet50) is needed - topk(int): usually top5 - classes_num(int): num of classes - epsilon(float): parameter for label smoothing, 0.0 <= epsilon <= 1.0 - use_mix(bool): whether to use mix(include mixup, cutmix, fmix) - - Returns: - fetchs(dict): dict of model outputs(included loss and measures) - """ - architecture = config.ARCHITECTURE - topk = config.topk - classes_num = config.classes_num - epsilon = config.get('ls_epsilon') - use_mix = config.get('use_mix') and mode == 'train' - use_distillation = config.get('use_distillation') - multilabel = config.get('multilabel', False) - use_xpu = config.get("use_xpu", False) - - out = net(feeds["image"]) - - fetchs = OrderedDict() - fetchs['loss'] = create_loss(feeds, out, architecture, classes_num, - epsilon, use_mix, use_distillation, - multilabel) - if not use_mix: - metric = create_metric( - out, - feeds["label"], - architecture, - topk, - classes_num, - use_distillation, - multilabel=multilabel, - mode=mode, - use_xpu=use_xpu) - fetchs.update(metric) - - return fetchs - - -def create_optimizer(config, parameter_list=None): - """ - Create an optimizer using config, usually including - learning rate and regularization. - - Args: - config(dict): such as - { - 'LEARNING_RATE': - {'function': 'Cosine', - 'params': {'lr': 0.1} - }, - 'OPTIMIZER': - {'function': 'Momentum', - 'params':{'momentum': 0.9}, - 'regularizer': - {'function': 'L2', 'factor': 0.0001} - } - } - - Returns: - an optimizer instance - """ - # create learning_rate instance - lr_config = config['LEARNING_RATE'] - lr_config['params'].update({ - 'epochs': config['epochs'], - 'step_each_epoch': - config['total_images'] // config['TRAIN']['batch_size'], - }) - lr = LearningRateBuilder(**lr_config)() - - # create optimizer instance - opt_config = config['OPTIMIZER'] - opt = OptimizerBuilder(**opt_config) - return opt(lr, parameter_list), lr - - -def create_feeds(batch, use_mix, num_classes, multilabel=False): - image = batch[0] - if use_mix: - y_a = to_tensor(batch[1].numpy().astype("int64").reshape(-1, 1)) - y_b = to_tensor(batch[2].numpy().astype("int64").reshape(-1, 1)) - lam = to_tensor(batch[3].numpy().astype("float32").reshape(-1, 1)) - feeds = {"image": image, "y_a": y_a, "y_b": y_b, "lam": lam} - else: - if not multilabel: - label = to_tensor(batch[1].numpy().astype("int64").reshape(-1, 1)) - else: - label = to_tensor(batch[1].numpy().astype('float32').reshape( - -1, num_classes)) - feeds = {"image": image, "label": label} - return feeds - - -total_step = 0 - - -def run(dataloader, - config, - net, - optimizer=None, - lr_scheduler=None, - epoch=0, - mode='train', - vdl_writer=None, - profiler_options=None): - """ - Feed data to the model and fetch the measures and loss - - Args: - dataloader(paddle dataloader): - exe(): - program(): - fetchs(dict): dict of measures and the loss - epoch(int): epoch of training or validation - model(str): log only - - Returns: - """ - print_interval = config.get("print_interval", 10) - use_mix = config.get("use_mix", False) and mode == "train" - multilabel = config.get("multilabel", False) - classes_num = config.get("classes_num") - - metric_list = [ - ("loss", AverageMeter( - 'loss', '7.5f', postfix=",")), - ("lr", AverageMeter( - 'lr', 'f', postfix=",", need_avg=False)), - ("batch_time", AverageMeter( - 'batch_cost', '.5f', postfix=" s,")), - 
("reader_time", AverageMeter( - 'reader_cost', '.5f', postfix=" s,")), - ] - if not use_mix: - if not multilabel: - topk_name = 'top{}'.format(config.topk) - metric_list.insert( - 0, (topk_name, AverageMeter( - topk_name, '.5f', postfix=","))) - metric_list.insert( - 0, ("top1", AverageMeter( - "top1", '.5f', postfix=","))) - else: - metric_list.insert( - 0, ("multilabel_accuracy", AverageMeter( - "multilabel_accuracy", '.5f', postfix=","))) - metric_list.insert( - 0, ("hamming_distance", AverageMeter( - "hamming_distance", '.5f', postfix=","))) - - metric_list = OrderedDict(metric_list) - - tic = time.time() - for idx, batch in enumerate(dataloader()): - # avoid statistics from warmup time - if idx == 10: - metric_list["batch_time"].reset() - metric_list["reader_time"].reset() - - profiler.add_profiler_step(profiler_options) - - metric_list['reader_time'].update(time.time() - tic) - batch_size = len(batch[0]) - feeds = create_feeds(batch, use_mix, classes_num, multilabel) - fetchs = create_fetchs(feeds, net, config, mode) - if mode == 'train': - avg_loss = fetchs['loss'] - avg_loss.backward() - - optimizer.step() - optimizer.clear_grad() - lr_value = optimizer._global_learning_rate().numpy()[0] - metric_list['lr'].update(lr_value, batch_size) - - if lr_scheduler is not None: - if lr_scheduler.update_specified: - curr_global_counter = lr_scheduler.step_each_epoch * epoch + idx - update = max( - 0, curr_global_counter - lr_scheduler.update_start_step - ) % lr_scheduler.update_step_interval == 0 - if update: - lr_scheduler.step() - else: - lr_scheduler.step() - - for name, fetch in fetchs.items(): - metric_list[name].update(fetch.numpy()[0], batch_size) - metric_list["batch_time"].update(time.time() - tic) - tic = time.time() - - if vdl_writer and mode == "train": - global total_step - logger.scaler( - name="lr", value=lr_value, step=total_step, writer=vdl_writer) - for name, fetch in fetchs.items(): - logger.scaler( - name="train_{}".format(name), - value=fetch.numpy()[0], - step=total_step, - writer=vdl_writer) - total_step += 1 - - fetchs_str = ' '.join([ - str(metric_list[key].mean) - if "time" in key else str(metric_list[key].value) - for key in metric_list - ]) - - if idx % print_interval == 0: - ips_info = "ips: {:.5f} images/sec".format( - batch_size / metric_list["batch_time"].avg) - - if mode == "train": - epoch_str = "epoch:{:<3d}".format(epoch) - step_str = "{:s} step:{:<4d}".format(mode, idx) - eta_sec = ((config["epochs"] - epoch) * len(dataloader) - idx - ) * metric_list["batch_time"].avg - eta_str = "eta: {:s}".format( - str(datetime.timedelta(seconds=int(eta_sec)))) - logger.info("{:s}, {:s}, {:s} {:s}, {:s}".format( - epoch_str, step_str, fetchs_str, ips_info, eta_str)) - else: - logger.info("{:s} step:{:<4d}, {:s} {:s}".format( - mode, idx, fetchs_str, ips_info)) - - end_str = ' '.join([str(m.mean) for m in metric_list.values()] + - [metric_list['batch_time'].total]) - ips_info = "ips: {:.5f} images/sec.".format( - batch_size * metric_list["batch_time"].count / - metric_list["batch_time"].sum) - - if mode == 'eval': - logger.info("END {:s} {:s} {:s}".format(mode, end_str, ips_info)) - else: - end_epoch_str = "END epoch:{:<3d}".format(epoch) - logger.info("{:s} {:s} {:s} {:s}".format(end_epoch_str, mode, end_str, - ips_info)) - - # return top1_acc in order to save the best model - if mode == 'valid': - if multilabel: - return metric_list['multilabel_accuracy'].avg - else: - return metric_list['top1'].avg diff --git a/tools/run.sh b/tools/run.sh deleted file mode 100755 
index 345b62758f447f9084442f4cdd681dd0bbdd8e74..0000000000000000000000000000000000000000 --- a/tools/run.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/usr/bin/env bash - -python3.7 -m paddle.distributed.launch \ - --gpus="0,1,2,3" \ - tools/train.py \ - -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \ - -o print_interval=10 diff --git a/tools/run_download.sh b/tools/run_download.sh deleted file mode 100755 index ffcbd88c742a023f214e1e91bb5445af63b6a603..0000000000000000000000000000000000000000 --- a/tools/run_download.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/usr/bin/env bash - -python tools/download.py -a ResNet34 -p ./pretrained/ -d 1 diff --git a/tools/train.py b/tools/train.py index aec796c71bf57ab0124844c07db865de916403f4..169678c5a81e651839d62246a99b9f66f27fcdef 100644 --- a/tools/train.py +++ b/tools/train.py @@ -25,6 +25,7 @@ from ppcls.engine.trainer import Trainer if __name__ == "__main__": args = config.parse_args() - config = config.get_config(args.config, overrides=args.override, show=True) + config = config.get_config( + args.config, overrides=args.override, show=False) trainer = Trainer(config, mode="train") trainer.train() diff --git a/tools/train.sh b/tools/train.sh new file mode 100755 index 0000000000000000000000000000000000000000..5fced8636235d533bdadcdbb40769733930a0763 --- /dev/null +++ b/tools/train.sh @@ -0,0 +1,7 @@ +#!/usr/bin/env bash + +# for single card train +# python3.7 tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml + +# for multi-cards train +python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \ No newline at end of file
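For reference, tools/train.py, tools/eval.py and tools/infer.py now all share the same entry pattern around the Trainer. A minimal programmatic sketch of that pattern (the override key is illustrative and depends on the v2.2 config schema):

```python
from ppcls.utils import config
from ppcls.engine.trainer import Trainer

cfg = config.get_config("ppcls/configs/ImageNet/ResNet/ResNet50.yaml",
                        overrides=["Global.epochs=1"],  # hypothetical key
                        show=False)
Trainer(cfg, mode="train").train()
```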