Commit 4f6372d2 authored by W weishengyu

Merge branch 'develop' of https://github.com/weisy11/PaddleClas into develop

@@ -8,17 +8,17 @@
**Recent updates**
- 2021.06.22-24: The official PaddleClas research team gave a three-day live course of in-depth technical interpretation, at 20:30 on the evenings of June 22, 23 and 24. [Live link](https://live.bilibili.com/21689802)
- 2021.06.16: PaddleClas v2.2 released, integrating metric learning, vector retrieval and other components. Added 4 image recognition applications: product recognition, cartoon character recognition, vehicle recognition and logo recognition. Added 30 pretrained models across the LeViT, Twins, TNT, DLA, HarDNet and RedNet series.
- 2021.05.14: Added the `SwinTransformer` series of models.
- 2021.04.15: Added the `MixNet_L` and `ReXNet_3_0` series of models.

[more](./docs/zh_CN/update_history.md)

## Features
- Practical image recognition system: integrates object detection, feature learning, image retrieval and other modules, widely applicable to all kinds of image recognition tasks.
Four scenario application examples are provided: product recognition, vehicle recognition, logo recognition and cartoon character recognition.
- Rich pretrained model library: 164 ImageNet pretrained models across 35 series, of which 6 selected series support fast structural modification.
@@ -36,7 +36,7 @@ The Top-1 accuracy of the Res2Net200_vd pretrained model reaches 85.1%.
## Welcome to Join the Technical Exchange Group

* You can scan the WeChat QR code below to join the PaddleClas WeChat group for more efficient Q&A and to communicate with developers from all walks of life. We look forward to your joining.

<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
@@ -51,7 +51,7 @@ The Top-1 accuracy of the Res2Net200_vd pretrained model reaches 85.1%.
- [Quick Start of Image Recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- Algorithm Introduction (Updating)
  - [Backbone Networks and Pretrained Model Library](./docs/zh_CN/ImageNet_models_cn.md)
  - [Mainbody Detection](./docs/zh_CN/application/mainbody_detection.md)
  - Image Classification
    - [CIFAR-100 Classification Task](./docs/zh_CN/tutorials/quick_start_professional.md)
  - Feature Learning
......
@@ -4,13 +4,13 @@
## Introduction

PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.

**Recent updates**
- 2021.06.16 PaddleClas release/2.2.
  - Add metric learning and vector search modules.
  - Add product recognition, animation character recognition, vehicle recognition and logo recognition applications.
  - Add 30 pretrained models of the LeViT, Twins, TNT, DLA, HarDNet and RedNet series, with accuracy roughly on par with the original papers.
- 2021.05.14
@@ -21,467 +21,97 @@ PaddleClas is a toolset for image classification tasks prepared for the industry
- [more](./docs/en/update_history_en.md)

## Features
- A practical image recognition system, consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
- Rich library of pretrained models: a total of 164 ImageNet pretrained models in 35 series, among which 6 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- SSLD knowledge distillation: the 14 classification pretrained models generally improve accuracy by more than 3%; among them, the ResNet50_vd model achieves 84.0% Top-1 accuracy on the ImageNet-1k dataset and the Res2Net200_vd pretrained model achieves 85.1% Top-1 accuracy.
- Data augmentation: provide 8 data augmentation algorithms such as AutoAugment, Cutout and Cutmix, with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment (a minimal Cutout sketch follows this list).
- Support Linux, Windows and macOS systems.
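As a flavor of the augmentation family mentioned above, Cutout simply zeroes out one random square patch of the input image. Below is a minimal NumPy sketch, illustrative only; the repository's actual implementations live under `ppcls` and are documented in the data augmentation tutorial.

```python
import numpy as np

def cutout(img: np.ndarray, length: int = 112) -> np.ndarray:
    """Zero out one randomly positioned length x length square patch."""
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
    out = img.copy()
    out[y1:y2, x1:x2] = 0  # the masked region is filled with zeros
    return out
```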
## Image Recognition System Demo
<div align="center">
<img src="./docs/images/recognition.gif" width = "400" />
</div>
## Welcome to Join the Technical Exchange Group

* You can scan the QR code below to join the PaddleClas WeChat group to get more efficient answers to your questions and to communicate with developers from all walks of life. We look forward to hearing from you.

<div align="center">
<img src="./docs/images/wx_group.png" width = "200" />
</div>
## Quick Start
Quick experience of image recognition: [Link](./docs/zh_CN/tutorials/quick_start_recognition.md)
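For the quickest possible trial of classification, the `paddleclas` wheel can be used roughly as in the sketch below; the constructor and method names follow the whl tutorial linked under Tutorials, but treat them as assumptions that may vary across versions, and the demo image path is illustrative.

```python
# Sketch only: API names are assumptions -- see the whl tutorial before use.
import paddleclas

clas = paddleclas.PaddleClas(model_name="ResNet50_vd")  # weights downloaded on first use
for prediction in clas.predict("./docs/images/whl/demo.jpg"):
    print(prediction)  # per-image class ids, scores and label names
```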
## Tutorials
- [Quick Installation](./docs/zh_CN/tutorials/install.md)
- [Quick Start of Recognition](./docs/zh_CN/tutorials/quick_start_recognition.md)
- Algorithms Introduction (Updating)
  - [Backbone Network and Pre-trained Model Library](./docs/zh_CN/models/models_intro.md)
  - [Mainbody Detection](./docs/zh_CN/application/object_detection.md)
  - Image Classification
    - [ImageNet Classification](./docs/zh_CN/tutorials/quick_start_professional.md)
  - Feature Learning
    - [Product Recognition](./docs/zh_CN/application/product_recognition.md)
    - [Vehicle Recognition](./docs/zh_CN/application/vehicle_reid.md)
    - [Logo Recognition](./docs/zh_CN/application/logo_recognition.md)
    - [Animation Character Recognition](./docs/zh_CN/application/cartoon_character_recognition.md)
  - [Vector Retrieval](./deploy/vector_search/README.md)
- Models Training/Evaluation
  - [Image Classification](./docs/zh_CN/tutorials/getting_started.md)
  - [Feature Learning](./docs/zh_CN/application/feature_learning.md)
- Inference Model Prediction (Updating)
  - [Python Inference](./docs/zh_CN/tutorials/getting_started.md)
  - [C++ Inference](./deploy/cpp_infer/readme.md)
  - [Hub Serving Deployment](./deploy/hubserving/readme.md)
  - [Mobile Deployment](./deploy/lite/readme.md)
  - [Inference Using whl](./docs/zh_CN/whl.md)
- Advanced Tutorial
  - [Knowledge Distillation](./docs/zh_CN/advanced_tutorials/distillation/distillation.md)
  - [Model Quantization](./docs/zh_CN/extension/paddle_quantization.md)
  - [Data Augmentation](./docs/zh_CN/advanced_tutorials/image_augmentation/ImageAugment.md)
- FAQ (Updates Suspended)
  - [Image Classification FAQ](docs/zh_CN/faq.md)
- [License](#License)
- [Contribution](#Contribution)
<a name="Model_zoo_overview"></a> ## Introduction to Image Recognition Systems
### Model zoo overview
Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluations are covered in the corresponding chapters. The evaluation environment is as follows.
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation speed is measured by running 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 runs).
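For reference, that timing protocol corresponds to a measurement loop like the following sketch, where `run_inference` stands in for a single predictor call; this is a generic harness for intuition, not the script actually used to produce the tables below.

```python
import time

def benchmark(run_inference, warmup=10, repeats=500):
    """Mean latency in ms over `repeats` runs, after `warmup` discarded runs."""
    for _ in range(warmup):      # let caches, cuDNN autotuning and TensorRT engines settle
        run_inference()
    start = time.time()
    for _ in range(repeats):
        run_inference()
    return (time.time() - start) / repeats * 1000.0
```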
Curves of accuracy to the inference time of common server-side models are shown as follows.
![](./docs/images/models/T4_benchmark/t4.fp32.bs1.main_fps_top1.png)
Curves of accuracy to the inference time and storage size of common mobile-side models are shown as follows.
![](./docs/images/models/mobile_arm_storage.png)
![](./docs/images/models/mobile_arm_top1.png)
<a name="SSLD_pretrained_series"></a>
### SSLD pretrained models
Accuracy and inference time of the pretrained models based on SSLD distillation are as follows. More detailed information can be found in the [SSLD distillation tutorial](./docs/en/advanced_tutorials/distillation/distillation_en.md).
* Server-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.824 | 0.791 | 0.033 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld_v2 | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
* Mobile-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Note: `Reference Top-1 Acc` means the accuracy of the corresponding pretrained models trained on the ImageNet-1k dataset.
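The full SSLD recipe (teacher selection, soft labels on large unlabeled data, iterative distillation) is covered in the tutorial linked above. For intuition only, a generic soft-label distillation loss, which is not the exact SSLD objective, can be sketched in Paddle as:

```python
import paddle.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's softened distribution.

    Generic soft-label distillation for illustration; SSLD's actual training
    recipe differs (see the distillation tutorial).
    """
    soft_targets = F.softmax(teacher_logits.detach() / temperature, axis=-1)
    log_probs = F.log_softmax(student_logits / temperature, axis=-1)
    return -(soft_targets * log_probs).sum(axis=-1).mean()
```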
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series
Accuracy and inference time metrics of ResNet and Vd series models are shown as follows. More detailed information can be found in the [ResNet and Vd series tutorial](./docs/en/models/ResNet_and_vd_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|
| ResNet18 | 0.7098 | 0.8992 | 1.45606 | 3.56305 | 3.66 | 11.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_pretrained.pdparams) |
| ResNet18_vd | 0.7226 | 0.9080 | 1.54557 | 3.85363 | 4.14 | 11.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet18_vd_pretrained.pdparams) |
| ResNet34 | 0.7457 | 0.9214 | 2.34957 | 5.89821 | 7.36 | 21.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_pretrained.pdparams) |
| ResNet34_vd | 0.7598 | 0.9298 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_pretrained.pdparams) |
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.43427 | 6.22257 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50 | 0.7650 | 0.9300 | 3.47712 | 7.84421 | 8.19 | 25.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) |
| ResNet50_vc | 0.7835 | 0.9403 | 3.52346 | 8.10725 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) |
| ResNet50_vd | 0.7912 | 0.9444 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams) |
| ResNet50_vd_v2 | 0.7984 | 0.9493 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_v2_pretrained.pdparams) |
| ResNet101 | 0.7756 | 0.9364 | 6.07125 | 13.40573 | 15.52 | 44.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_pretrained.pdparams) |
| ResNet101_vd | 0.8017 | 0.9497 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_pretrained.pdparams) |
| ResNet152 | 0.7826 | 0.9396 | 8.50198 | 19.17073 | 23.05 | 60.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_pretrained.pdparams) |
| ResNet152_vd | 0.8059 | 0.9530 | 8.54376 | 19.52157 | 23.53 | 60.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet152_vd_pretrained.pdparams) |
| ResNet200_vd | 0.8093 | 0.9533 | 10.80619 | 25.01731 | 30.53 | 74.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet200_vd_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.8239 | 0.9610 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld_v2 | 0.8300 | 0.9640 | 3.53131 | 8.09057 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_ssld_v2_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 6.11704 | 13.76222 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet101_vd_ssld_pretrained.pdparams) |
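Each download link above points to a `.pdparams` state dict. As a minimal sketch of fetching and inspecting one (the URL is the ResNet50_vd entry copied from the table; actually running the model additionally requires building the matching architecture from `ppcls`):

```python
import paddle
from paddle.utils.download import get_weights_path_from_url

URL = ("https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/"
       "ResNet50_vd_pretrained.pdparams")

path = get_weights_path_from_url(URL)   # downloads once, then serves from the local cache
state_dict = paddle.load(path)
for name, value in list(state_dict.items())[:5]:
    print(name, value.shape)            # inspect parameter names and shapes

# To run the model, instantiate the matching ppcls network and call
# model.set_state_dict(state_dict).
```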
<a name="Mobile_series"></a>
### Mobile series
Accuracy and inference time metrics of Mobile series models are shown as follows. More detailed information can be found in the [Mobile series tutorial](./docs/en/models/Mobile_en.md).
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Model storage size(M) | Download Address |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 3.21985 | 0.07 | 0.46 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_25_pretrained.pdparams) |
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 9.579599 | 0.28 | 1.31 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_5_pretrained.pdparams) |
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 19.436399 | 0.63 | 2.55 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_x0_75_pretrained.pdparams) |
| MobileNetV1 | 0.7099 | 0.8968 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_pretrained.pdparams) |
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 32.523048 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.79925 | 0.05 | 1.5 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 8.7021 | 0.17 | 1.93 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 15.531351 | 0.35 | 2.58 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) |
| MobileNetV2 | 0.7215 | 0.9065 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) |
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 45.623848 | 1.32 | 6.76 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 74.291649 | 2.32 | 11.13 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 23.317699 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 28.217701 | 0.714 | 7.44 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_25_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 13.5646 | 0.296 | 3.91 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_75_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 7.49315 | 0.138 | 2.67 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams) |
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 5.13695 | 0.077 | 2.1 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_35_pretrained.pdparams) |
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 9.2745 | 0.195 | 3.62 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_25_pretrained.pdparams) |
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 5.28435 | 0.088 | 2.37 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_75_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 3.35165 | 0.043 | 1.9 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_5_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.6352 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 19.30835 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 6.5463 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 10.941 | 0.28 | 2.26 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 2.329 | 0.03 | 0.6 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.64335 | 0.04 | 0.64 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) |
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 4.2613 | 0.08 | 1.36 | 5.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) |
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 19.3522 | 0.58 | 3.47 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) |
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 34.770149 | 1.12 | 7.32 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) |
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 16.023151 | 0.29 | 2.26 | 9.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) |
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.7143 | 0.082 | 2.6 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) |
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 13.5587 | 0.294 | 5.2 | 20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) |
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.7938 | 0.9449 | 19.9825 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
<a name="SEResNeXt_and_Res2Net_series"></a>
### SEResNeXt and Res2Net series
Accuracy and inference time metrics of SEResNeXt and Res2Net series models are shown as follows. More detailed information can be found in the [SEResNeXt and Res2Net series tutorial](./docs/en/models/SEResNext_and_Res2Net_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 4.47188 | 9.65722 | 8.52 | 25.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 4.52712 | 9.93247 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) |
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 5.4026 | 10.60273 | 9.01 | 25.72 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 8.08729 | 17.31208 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 14.67806 | 32.35032 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 7.56327 | 10.6134 | 8.02 | 23.64 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) |
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 7.62044 | 11.03385 | 8.5 | 23.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) |
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 13.80962 | 18.4712 | 15.06 | 42.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) |
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 13.94449 | 18.88759 | 15.54 | 42.38 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) |
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 16.21503 | 19.96568 | 15.01 | 41.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) |
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 16.28103 | 20.25611 | 15.49 | 41.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) |
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 30.4788 | 36.29801 | 29.05 | 78.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) |
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 30.40456 | 36.77324 | 29.53 | 78.14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) |
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 24.86299 | 29.36764 | 22.01 | 56.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) |
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 25.03258 | 30.08987 | 22.49 | 56.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) |
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 46.7564 | 56.34108 | 43.03 | 107.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) |
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 47.18638 | 57.16257 | 43.52 | 107.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) |
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.7691 | 4.19877 | 4.14 | 11.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) |
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.88559 | 7.03291 | 7.84 | 21.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) |
| SE_ResNet50_vd | 0.7952 | 0.9475 | 4.28393 | 10.38846 | 8.67 | 28.09 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) |
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 8.74121 | 13.563 | 8.02 | 26.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) |
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 9.17134 | 14.76192 | 10.76 | 26.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) |
| SE_ResNeXt101_<br>32x4d | 0.7939 | 0.9443 | 18.82604 | 25.31814 | 15.02 | 46.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) |
| SENet154_vd | 0.8140 | 0.9548 | 53.79794 | 66.31684 | 45.83 | 114.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) |
<a name="DPN_and_DenseNet_series"></a>
### DPN and DenseNet series
Accuracy and inference time metrics of DPN and DenseNet series models are shown as follows. More detailed information can be found in the [DPN and DenseNet series tutorial](./docs/en/models/DPN_DenseNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|
| DenseNet121 | 0.7566 | 0.9258 | 4.40447 | 9.32623 | 5.69 | 7.98 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) |
| DenseNet161 | 0.7857 | 0.9414 | 10.39152 | 22.15555 | 15.49 | 28.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) |
| DenseNet169 | 0.7681 | 0.9331 | 6.43598 | 12.98832 | 6.74 | 14.15 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) |
| DenseNet201 | 0.7763 | 0.9366 | 8.20652 | 17.45838 | 8.61 | 20.01 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) |
| DenseNet264 | 0.7796 | 0.9385 | 12.14722 | 26.27707 | 11.54 | 33.37 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) |
| DPN68 | 0.7678 | 0.9343 | 11.64915 | 12.82807 | 4.03 | 10.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) |
| DPN92 | 0.7985 | 0.9480 | 18.15746 | 23.87545 | 12.54 | 36.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) |
| DPN98 | 0.8059 | 0.9510 | 21.18196 | 33.23925 | 22.22 | 58.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) |
| DPN107 | 0.8089 | 0.9532 | 27.62046 | 52.65353 | 35.06 | 82.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) |
| DPN131 | 0.8070 | 0.9514 | 28.33119 | 46.19439 | 30.51 | 75.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) |
<a name="HRNet_series"></a>
### HRNet series
Accuracy and inference time metrics of HRNet series models are shown as follows. More detailed information can be found in the [HRNet series tutorial](./docs/en/models/HRNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|
| HRNet_W18_C | 0.7692 | 0.9339 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 7.40636 | 13.29752 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W30_C | 0.7804 | 0.9402 | 9.57594 | 17.35485 | 16.23 | 37.71 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W30_C_pretrained.pdparams) |
| HRNet_W32_C | 0.7828 | 0.9424 | 9.49807 | 17.72921 | 17.86 | 41.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W32_C_pretrained.pdparams) |
| HRNet_W40_C | 0.7877 | 0.9447 | 12.12202 | 25.68184 | 25.41 | 57.55 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W40_C_pretrained.pdparams) |
| HRNet_W44_C | 0.7900 | 0.9451 | 13.19858 | 32.25202 | 29.79 | 67.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W44_C_pretrained.pdparams) |
| HRNet_W48_C | 0.7895 | 0.9442 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 13.70761 | 34.43572 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W48_C_ssld_pretrained.pdparams) |
| HRNet_W64_C | 0.7930 | 0.9461 | 17.57527 | 47.9533 | 57.83 | 128.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HRNet_W64_C_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 31.69770 | 94.99546 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
<a name="Inception_series"></a>
### Inception series
Accuracy and inference time metrics of Inception series models are shown as follows. More detailed information can be found in the [Inception series tutorial](./docs/en/models/Inception_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|
| GoogLeNet | 0.7070 | 0.8966 | 1.88038 | 4.48882 | 2.88 | 8.46 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) |
| Xception41 | 0.7930 | 0.9453 | 4.96939 | 17.01361 | 16.74 | 22.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) |
| Xception41_deeplab | 0.7955 | 0.9438 | 5.33541 | 17.55938 | 18.16 | 26.73 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) |
| Xception65 | 0.8100 | 0.9549 | 7.26158 | 25.88778 | 25.95 | 35.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) |
| Xception65_deeplab | 0.8032 | 0.9449 | 7.60208 | 26.03699 | 27.37 | 39.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) |
| Xception71 | 0.8111 | 0.9545 | 8.72457 | 31.55549 | 31.77 | 37.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) |
| InceptionV3 | 0.7914 | 0.9459 | 6.64054 | 13.53630 | 11.46 | 23.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV3_pretrained.pdparams) |
| InceptionV4 | 0.8077 | 0.9526 | 12.99342 | 25.23416 | 24.57 | 42.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) |
<a name="EfficientNet_and_ResNeXt101_wsl_series"></a>
### EfficientNet and ResNeXt101_wsl series
Accuracy and inference time metrics of EfficientNet and ResNeXt101_wsl series models are shown as follows. More detailed information can be found in the [EfficientNet and ResNeXt101_wsl series tutorial](./docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 18.52528 | 34.25319 | 29.14 | 78.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 25.60395 | 71.88384 | 57.55 | 152.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 54.87396 | 160.04337 | 115.17 | 303.11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) |
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 99.01698256 | 315.91261 | 173.58 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 160.0838242 | 595.99296 | 354.23 | 456.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) |
| EfficientNetB0 | 0.7738 | 0.9331 | 3.442 | 6.11476 | 0.72 | 5.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) |
| EfficientNetB1 | 0.7915 | 0.9441 | 5.3322 | 9.41795 | 1.27 | 7.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) |
| EfficientNetB2 | 0.7985 | 0.9474 | 6.29351 | 10.95702 | 1.85 | 8.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) |
| EfficientNetB3 | 0.8115 | 0.9541 | 7.67749 | 16.53288 | 3.43 | 11.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) |
| EfficientNetB4 | 0.8285 | 0.9623 | 12.15894 | 30.94567 | 8.29 | 18.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) |
| EfficientNetB5 | 0.8362 | 0.9672 | 20.48571 | 61.60252 | 19.51 | 29.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) |
| EfficientNetB6 | 0.8400 | 0.9688 | 32.62402 | - | 36.27 | 42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) |
| EfficientNetB7 | 0.8430 | 0.9689 | 53.93823 | - | 72.35 | 64.92 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) |
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 2.3076 | 4.71886 | 0.72 | 4.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) |
<a name="ResNeSt_and_RegNet_series"></a>
### ResNeSt and RegNet series
Accuracy and inference time metrics of ResNeSt and RegNet series models are shown as follows. More detailed information can be found in the [ResNeSt and RegNet series tutorial](./docs/en/models/ResNeSt_RegNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 3.45405 | 8.72680 | 8.68 | 26.3 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) |
| ResNeSt50 | 0.8083 | 0.9542 | 6.69042 | 8.01664 | 10.78 | 27.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) |
| RegNetX_4GF | 0.785 | 0.9416 | 6.46478 | 11.19862 | 8 | 22.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) |
<a name="ViT_and_DeiT"></a>
### ViT and DeiT series
Accuracy and inference time metrics of ViT and DeiT series models are shown as follows. More detailed information can be found in the [ViT and DeiT series tutorial](./docs/en/models/ViT_and_DeiT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
| ViT_small_<br/>patch16_224 | 0.7769 | 0.9342 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) |
| ViT_base_<br/>patch16_224 | 0.8195 | 0.9617 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) |
| ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) |
| ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) |
| ViT_large_<br/>patch16_224 | 0.8323 | 0.9650 | - | - | | 307 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) |
| ViT_large_<br/>patch16_384 | 0.8513 | 0.9736 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) |
| ViT_large_<br/>patch32_384 | 0.8153 | 0.9608 | - | - | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) |
| | | | | | | | |
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|
| DeiT_tiny_<br>patch16_224 | 0.718 | 0.910 | - | - | | 5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) |
| DeiT_small_<br>patch16_224 | 0.796 | 0.949 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>patch16_224 | 0.817 | 0.957 | - | - | | 86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>patch16_384 | 0.830 | 0.962 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) |
| DeiT_tiny_<br>distilled_patch16_224 | 0.741 | 0.918 | - | - | | 6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) |
| DeiT_small_<br>distilled_patch16_224 | 0.809 | 0.953 | - | - | | 22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>distilled_patch16_224 | 0.831 | 0.964 | - | - | | 87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) |
| DeiT_base_<br>distilled_patch16_384 | 0.851 | 0.973 | - | - | | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) |
| | | | | | | | |
<a name="RepVGG_series"></a>
### RepVGG
Accuracy and inference time metrics of RepVGG series models are shown as follows. More detailed information can be found in the [RepVGG series tutorial](./docs/en/models/RepVGG_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|
| RepVGG_A0 | 0.7131 | 0.9016 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) |
| RepVGG_A1 | 0.7380 | 0.9146 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) |
| RepVGG_A2 | 0.7571 | 0.9264 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) |
| RepVGG_B0 | 0.7450 | 0.9213 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) |
| RepVGG_B1 | 0.7773 | 0.9385 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) |
| RepVGG_B2 | 0.7813 | 0.9410 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) |
| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) |
| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) |
| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) |
| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) |
<a name="MixNet"></a>
### MixNet
Accuracy and inference time metrics of MixNet series models are shown as follows. More detailed information can be found in the [MixNet series tutorial](./docs/en/models/MixNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(M) | Params(M) | Download Address |
| -------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| MixNet_S | 0.7628 | 0.9299 | | | 252.977 | 4.167 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) |
| MixNet_M | 0.7767 | 0.9364 | | | 357.119 | 5.065 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) |
| MixNet_L | 0.7860 | 0.9437 | | | 579.017 | 7.384 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) |
<a name="ReXNet"></a>
### ReXNet
Accuracy and inference time metrics of ReXNet series models are shown as follows. More detailed information can be found in the [ReXNet series tutorial](./docs/en/models/ReXNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| ReXNet_1_0 | 0.7746 | 0.9370 | | | 0.415 | 4.838 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) |
| ReXNet_1_3 | 0.7913 | 0.9464 | | | 0.683 | 7.611 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) |
| ReXNet_1_5 | 0.8006 | 0.9512 | | | 0.900 | 9.791 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) |
| ReXNet_2_0 | 0.8122 | 0.9536 | | | 1.561 | 16.449 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) |
| ReXNet_3_0 | 0.8209 | 0.9612 | | | 3.445 | 34.833 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) |
<a name="SwinTransformer"></a>
### SwinTransformer
Accuracy and inference time metrics of SwinTransformer series models are shown as follows. More detailed information can be found in the [SwinTransformer series tutorial](./docs/en/models/SwinTransformer_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | | | 4.5 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | | | 8.7 | 50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | | | 15.4 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | | | 47.1 | 88 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | | | 34.5 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | | | 103.9 | 197 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) |
[1]: Pretrained on the ImageNet-22k dataset and then fine-tuned on the ImageNet-1k dataset via transfer learning.
<a name="Others"></a>
### Others
Accuracy and inference time metrics of the AlexNet, SqueezeNet series, VGG series and DarkNet53 models are shown as follows. More detailed information can be found in [Others](./docs/en/models/Others_en.md).
<a name="Introduction to Image Recognition Systems"></a>
<div align="center">
<img src="./docs/images/structure.png" width = "400" />
</div>
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address | Image recognition can be divided into three steps:
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------| - (1)Identify region proposal for target objects through a detection model;
| AlexNet | 0.567 | 0.792 | 1.44993 | 2.46696 | 1.370 | 61.090 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) | - (2)Extract features for each region proposal;
| SqueezeNet1_0 | 0.596 | 0.817 | 0.96736 | 2.53221 | 1.550 | 1.240 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) | - (3)Search features in the retrieval database and output results;
| SqueezeNet1_1 | 0.601 | 0.819 | 0.76032 | 1.877 | 0.690 | 1.230 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) |
| VGG11 | 0.693 | 0.891 | 3.90412 | 9.51147 | 15.090 | 132.850 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG11_pretrained.pdparams) |
| VGG13 | 0.700 | 0.894 | 4.64684 | 12.61558 | 22.480 | 133.030 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG13_pretrained.pdparams) |
| VGG16 | 0.720 | 0.907 | 5.61769 | 16.40064 | 30.810 | 138.340 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG16_pretrained.pdparams) |
| VGG19 | 0.726 | 0.909 | 6.65221 | 20.4334 | 39.130 | 143.650 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/VGG19_pretrained.pdparams) |
| DarkNet53 | 0.780 | 0.941 | 4.10829 | 12.1714 | 18.580 | 41.600 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) |
For a new, unknown category, there is no need to retrain the model: simply prepare images of the new category, extract their features and update the retrieval database, and the new category can then be recognized.
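As a minimal illustration of this flow, the sketch below wires the three steps together; `detector` and `extractor` are hypothetical callables standing in for the loaded detection and feature models (not the PaddleClas API), and the index is a plain matrix of gallery features searched by inner product.

```python
import numpy as np

# Sketch of the three-step recognition flow; `detector` and `extractor`
# are assumed, pre-loaded models (illustrative names only).
def recognize(image, detector, extractor, gallery_feats, gallery_labels, top_k=5):
    results = []
    for (x1, y1, x2, y2) in detector(image):          # (1) region proposals
        feat = extractor(image[y1:y2, x1:x2])         # (2) feature per proposal
        feat = feat / np.linalg.norm(feat)            # cosine similarity via L2 norm
        scores = gallery_feats @ feat                 # (3) search the retrieval database
        top = np.argsort(-scores)[:top_k]
        results.append({"bbox": [x1, y1, x2, y2],
                        "rec_docs": [gallery_labels[i] for i in top],
                        "rec_scores": scores[top]})
    return results

# Recognizing a new category only requires appending its features to
# `gallery_feats` and its label to `gallery_labels`; no retraining is needed.
```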
<a name="License"></a> <a name="License"></a>
## License
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a> ## License
PaddleClas is released under the <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>
<a name="Contribution"></a> <a name="Contribution"></a>
## Contribution ## Contribution
Contributions are highly welcomed and we would really appreciate your feedback! Contributions are highly welcomed and we would really appreciate your feedback!
- Thanks to [nblib](https://github.com/nblib) for fixing a bug in RandErasing. - Thanks to [nblib](https://github.com/nblib) for fixing a bug in RandErasing.
- Thanks to [chenpy228](https://github.com/chenpy228) for fixing some typos in PaddleClas. - Thanks to [chenpy228](https://github.com/chenpy228) for fixing some typos in PaddleClas.
- Thanks to [jm12138](https://github.com/jm12138) for adding the ViT, DeiT and RepVGG models to PaddleClas. - Thanks to [jm12138](https://github.com/jm12138) for adding the ViT, DeiT and RepVGG models to PaddleClas.
- Thanks to [FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563) for parsing and summarizing the PaddleClas code.
Global: Global:
infer_imgs: "./dataset/product_demo_data_v1.0/query" infer_imgs: "./dataset/product_demo_data_v1.0/query/wangzai.jpg"
det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer" det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer"
rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer" rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer"
batch_size: 1 batch_size: 1
image_shape: [3, 640, 640] image_shape: [3, 640, 640]
threshold: 0.2 threshold: 0.2
max_det_results: 2 max_det_results: 1
labe_list: labe_list:
- foreground - foreground
......
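For reference, a deploy-side config like the one above can be loaded and overridden from Python through the `utils.config` helpers that appear later in this diff; a minimal sketch, assuming it is run from the `deploy` directory and that `get_config` keeps the `(fname, overrides, show)` signature:

```python
from utils import config

# Load the product recognition config and override one field from code;
# the override syntax mirrors the -o option of the prediction scripts.
cfg = config.get_config("configs/inference_product.yaml",
                        overrides=["Global.threshold=0.3"], show=False)
print(cfg.Global.det_inference_model_dir)
print(cfg.Global.max_det_results)
```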
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -18,15 +18,14 @@ sys.path.insert(0, ".") ...@@ -18,15 +18,14 @@ sys.path.insert(0, ".")
import time import time
from paddlehub.utils.log import logger
from paddlehub.module.module import moduleinfo, serving
import cv2
import numpy as np import numpy as np
import paddle.nn as nn import paddle.nn as nn
from paddlehub.module.module import moduleinfo, serving
from tools.infer.predict import Predictor from hubserving.clas.params import get_default_confg
from tools.infer.utils import b64_to_np, postprocess from python.predict_cls import ClsPredictor
from deploy.hubserving.clas.params import read_params from utils import config
from utils.encode_decode import b64_to_np
@moduleinfo( @moduleinfo(
...@@ -41,19 +40,24 @@ class ClasSystem(nn.Layer): ...@@ -41,19 +40,24 @@ class ClasSystem(nn.Layer):
""" """
initialize with the necessary elements initialize with the necessary elements
""" """
cfg = read_params() self._config = self._load_config(
use_gpu=use_gpu, enable_mkldnn=enable_mkldnn)
self.cls_predictor = ClsPredictor(self._config)
def _load_config(self, use_gpu=None, enable_mkldnn=None):
cfg = get_default_confg()
cfg = config.AttrDict(cfg)
config.create_attr_dict(cfg)
if use_gpu is not None: if use_gpu is not None:
cfg.use_gpu = use_gpu cfg.Global.use_gpu = use_gpu
if enable_mkldnn is not None: if enable_mkldnn is not None:
cfg.enable_mkldnn = enable_mkldnn cfg.Global.enable_mkldnn = enable_mkldnn
cfg.hubserving = True
cfg.enable_benchmark = False cfg.enable_benchmark = False
self.args = cfg if cfg.Global.use_gpu:
if cfg.use_gpu:
try: try:
_places = os.environ["CUDA_VISIBLE_DEVICES"] _places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0]) int(_places[0])
print("Use GPU, GPU Memery:{}".format(cfg.gpu_mem)) print("Use GPU, GPU Memery:{}".format(cfg.Global.gpu_mem))
print("CUDA_VISIBLE_DEVICES: ", _places) print("CUDA_VISIBLE_DEVICES: ", _places)
except: except:
raise RuntimeError( raise RuntimeError(
...@@ -62,24 +66,36 @@ class ClasSystem(nn.Layer): ...@@ -62,24 +66,36 @@ class ClasSystem(nn.Layer):
else: else:
print("Use CPU") print("Use CPU")
print("Enable MKL-DNN") if enable_mkldnn else None print("Enable MKL-DNN") if enable_mkldnn else None
self.predictor = Predictor(self.args) return cfg
def predict(self, batch_input_data, top_k=1): def predict(self, inputs):
assert isinstance( if not isinstance(inputs, list):
batch_input_data, raise Exception(
np.ndarray), "The input data is inconsistent with expectations." "The input data is inconsistent with expectations.")
starttime = time.time() starttime = time.time()
batch_outputs = self.predictor.predict(batch_input_data) outputs = self.cls_predictor.predict(inputs)
elapse = time.time() - starttime elapse = time.time() - starttime
batch_result_list = postprocess(batch_outputs, top_k) preds = self.cls_predictor.postprocess(outputs)
return {"prediction": batch_result_list, "elapse": elapse} return {"prediction": preds, "elapse": elapse}
@serving @serving
def serving_method(self, images, revert_params, **kwargs): def serving_method(self, images, revert_params):
""" """
Run as a service. Run as a service.
""" """
input_data = b64_to_np(images, revert_params) input_data = b64_to_np(images, revert_params)
results = self.predict(batch_input_data=input_data, **kwargs) results = self.predict(inputs=list(input_data))
return results return results
if __name__ == "__main__":
import cv2
import paddlehub as hub
module = hub.Module(name="clas_system")
img_path = "./hubserving/ILSVRC2012_val_00006666.JPEG"
img = cv2.imread(img_path)[:, :, ::-1]
img = cv2.resize(img, (224, 224)).transpose((2, 0, 1))
res = module.predict([img.astype(np.float32)])
print("The returned result of {}: {}".format(img_path, res))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -17,28 +17,24 @@ from __future__ import division ...@@ -17,28 +17,24 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
class Config(object): def get_default_confg():
pass return {
'Global': {
"inference_model_dir": "../inference/",
def read_params(): "batch_size": 1,
cfg = Config() 'use_gpu': False,
'use_fp16': False,
cfg.model_file = "./inference/cls_infer.pdmodel" 'enable_mkldnn': False,
cfg.params_file = "./inference/cls_infer.pdiparams" 'cpu_num_threads': 1,
cfg.batch_size = 1 'use_tensorrt': False,
cfg.use_gpu = False 'ir_optim': False,
cfg.enable_mkldnn = False "gpu_mem": 8000,
cfg.ir_optim = True 'enable_profile': False,
cfg.gpu_mem = 8000 "enable_benchmark": False
cfg.use_fp16 = False },
cfg.use_tensorrt = False 'PostProcess': {
cfg.cpu_num_threads = 10 'name': 'Topk',
cfg.enable_profile = False 'topk': 5,
'class_id_map_file': './utils/imagenet1k_label_list.txt'
# params for preprocess }
cfg.resize_short = 256 }
cfg.resize = 224 \ No newline at end of file
cfg.normalize = True
return cfg
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
The hubserving deployment service package `clas` contains 3 required files, organized as follows: The hubserving deployment service package `clas` contains 3 required files, organized as follows:
``` ```
deploy/hubserving/clas/ hubserving/clas/
└─ __init__.py Empty file, required └─ __init__.py Empty file, required
└─ config.json Configuration file, optional; passed in as a parameter when starting the service with a config └─ config.json Configuration file, optional; passed in as a parameter when starting the service with a config
└─ module.py Main module, required; contains the complete service logic └─ module.py Main module, required; contains the complete service logic
...@@ -21,16 +21,16 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s ...@@ -21,16 +21,16 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
### 2. Download the inference model ### 2. Download the inference model
Before installing the service module, prepare the inference model and put it in the correct path. The default model paths are: Before installing the service module, prepare the inference model and put it in the correct path. The default model paths are:
``` ```
Classification inference model structure file: ./inference/cls_infer.pdmodel Classification inference model structure file: PaddleClas/inference/inference.pdmodel
Classification inference model weights file: ./inference/cls_infer.pdiparams Classification inference model weights file: PaddleClas/inference/inference.pdiparams
``` ```
**Note** **Note**
* The model paths can be viewed and modified in `./PaddleClas/deploy/hubserving/clas/params.py`. * The model file path can be viewed and modified in `PaddleClas/deploy/hubserving/clas/params.py`:
```python ```python
cfg.model_file = "./inference/cls_infer.pdmodel" "inference_model_dir": "../inference/"
cfg.params_file = "./inference/cls_infer.pdiparams"
``` ```
Note that the model files (both .pdmodel and .pdiparams) must be named `inference`.
* We also provide a large number of pretrained models based on the ImageNet-1k dataset. See the [Model Zoo Overview](../../docs/zh_CN/models/models_intro.md) for the model list and download links; you can also use your own trained and converted models. * We also provide a large number of pretrained models based on the ImageNet-1k dataset. See the [Model Zoo Overview](../../docs/zh_CN/models/models_intro.md) for the model list and download links; you can also use your own trained and converted models.
### 3. Install the service module ### 3. Install the service module
...@@ -38,14 +38,17 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s ...@@ -38,14 +38,17 @@ pip3 install paddlehub==2.0.0b1 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/s
* On Linux, the installation example is as follows: * On Linux, the installation example is as follows:
```shell ```shell
# Install the service module: cd PaddleClas/deploy
hub install deploy/hubserving/clas/ # Install the service module:
hub install hubserving/clas/
``` ```
* On Windows (the folder separator is `\`), the installation example is as follows: * On Windows (the folder separator is `\`), the installation example is as follows:
```shell ```shell
cd PaddleClas\deploy
# Install the service module: # Install the service module:
hub install deploy\hubserving\clas\ hub install hubserving\clas\
``` ```
### 4. Start the service ### 4. Start the service
...@@ -59,7 +62,6 @@ $ hub serving start --modules Module1==Version1 \ ...@@ -59,7 +62,6 @@ $ hub serving start --modules Module1==Version1 \
``` ```
**Parameters:** **Parameters:**
|Parameter|Purpose| |Parameter|Purpose|
|-|-| |-|-|
|--modules/-m| [**Required**] Models pre-installed for PaddleHub Serving, listed as multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*| |--modules/-m| [**Required**] Models pre-installed for PaddleHub Serving, listed as multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
...@@ -108,30 +110,32 @@ $ hub serving start --modules Module1==Version1 \ ...@@ -108,30 +110,32 @@ $ hub serving start --modules Module1==Version1 \
For example, to start the pipeline service on GPU card 3: For example, to start the pipeline service on GPU card 3:
```shell ```shell
cd PaddleClas/deploy
export CUDA_VISIBLE_DEVICES=3 export CUDA_VISIBLE_DEVICES=3
hub serving start -c deploy/hubserving/clas/config.json hub serving start -c hubserving/clas/config.json
``` ```
## Send prediction requests ## Send prediction requests
Once the server is configured, use the following command to send a prediction request and obtain the prediction result: Once the server is configured, use the following command to send a prediction request and obtain the prediction result:
```python tools/test_hubserving.py server_url image_path``` ```shell
cd PaddleClas/deploy
python hubserving/test_hubserving.py server_url image_path
```
Two required parameters must be passed to the script: Two required parameters must be passed to the script:
- **server_url**: the service address, in the format - **server_url**: the service address, in the format
`http://[ip_address]:[port]/predict/[module_name]` `http://[ip_address]:[port]/predict/[module_name]`
- **image_path**: the test image path, either a single image file or a directory of images. - **image_path**: the test image path, either a single image file or a directory of images.
- **top_k**: [**Optional**] Return the top `top_k` results by `score`; defaults to `1`.
- **batch_size**: [**Optional**] Predict in batches of `batch_size`; defaults to `1`. - **batch_size**: [**Optional**] Predict in batches of `batch_size`; defaults to `1`.
- **resize_short**: [**Optional**] Proportionally resize the image so that its shorter side equals `resize_short`; defaults to `256`.
- **resize**: [**Optional**] Resize the image to `resize * resize`; defaults to `224`.
- **normalize**: [**Optional**] Whether to normalize the image; defaults to `True`.
**Note**: If you use `Transformer`-series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model input size: you need to specify `--resize_short=384 --resize=384`. **Note**: If you use `Transformer`-series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model input size: you need to specify `--resize_short=384 --resize=384`.
Example request: Example request:
```python tools/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./deploy/hubserving/ILSVRC2012_val_00006666.JPEG --top_k 5``` ```shell
python hubserving/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./hubserving/ILSVRC2012_val_00006666.JPEG --batch_size 8
```
### Return format ### Return format
The returned result is a list containing the top-k classification results, the corresponding scores and the prediction time for the image, as follows: The returned result is a list containing the top-k classification results, the corresponding scores and the prediction time for the image, as follows:
...@@ -143,7 +147,7 @@ list: the returned results ...@@ -143,7 +147,7 @@ list: the returned results
└─ float: the classification time for this image, in seconds └─ float: the classification time for this image, in seconds
``` ```
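For reference, a minimal raw-HTTP client that follows this request/response contract might look as follows (a sketch only: the endpoint, the input shape and the inline `np_to_b64` helper are illustrative, and `test_hubserving.py` remains the authoritative client):

```python
import base64
import json

import numpy as np
import requests

def np_to_b64(images):
    # Same contract as the server's b64_to_np: raw bytes plus shape/dtype.
    return base64.b64encode(images).decode("utf8"), images.shape

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # a preprocessed batch
b64str, shape = np_to_b64(batch)
payload = {
    "images": b64str,
    "revert_params": {"shape": shape, "dtype": str(batch.dtype)},
}
r = requests.post("http://127.0.0.1:8866/predict/clas_system",
                  headers={"Content-type": "application/json"},
                  data=json.dumps(payload))
results = r.json()["results"]
print(results["prediction"], results["elapse"])
```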
**Note:** To add, delete or modify the returned fields, edit the `module.py` file of the corresponding module; see the next section on customizing the service module for the complete workflow. **Note:** To add, delete or modify the returned fields, modify the corresponding module; see the next section on customizing the service module for the complete workflow.
## Customize the service module ## Customize the service module
To modify the service logic, you generally need to perform the following steps: To modify the service logic, you generally need to perform the following steps:
...@@ -151,16 +155,30 @@ list: 返回结果 ...@@ -151,16 +155,30 @@ list: 返回结果
- 1. Stop the service - 1. Stop the service
```hub serving stop --port/-p XXXX``` ```hub serving stop --port/-p XXXX```
- 2. Modify the code in the corresponding files, such as `module.py` and `params.py`, according to your actual needs. - 2. Modify the code in the corresponding files, such as `module.py` and `params.py`, according to your actual needs. After modifying `module.py`, reinstall it (`hub install hubserving/clas/`) and redeploy. Before deploying, you can test the installed service module with `python hubserving/clas/module.py`.
For example, to replace the model used by the deployed service, modify the model path parameters `cfg.model_file` and `cfg.params_file` in `params.py`.
After modifying and reinstalling (`hub install deploy/hubserving/clas/`), and before deploying, you can test the installed service module with `python deploy/hubserving/clas/test.py`.
- 3. Uninstall the old service package - 3. Uninstall the old service package
```hub uninstall clas_system``` ```hub uninstall clas_system```
- 4. Install the modified new service package - 4. Install the modified new service package
```hub install deploy/hubserving/clas/``` ```hub install hubserving/clas/```
- 5. Restart the service - 5. Restart the service
```hub serving start -m clas_system``` ```hub serving start -m clas_system```
**Note**:
Common parameters can be modified in [params.py](./clas/params.py):
* To change the model, modify the model file path parameter:
```python
"inference_model_dir":
```
* To change the number of `top-k` results returned during post-processing:
```python
'topk':
```
* To change the label-to-class-id mapping file used during post-processing:
```python
'class_id_map_file':
```
To avoid unnecessary latency and to allow prediction at `batch_size`, the data preprocessing logic (including resize, crop and other operations) is done on the client side, so it has to be modified in [test_hubserving.py](./test_hubserving.py#L35-L52).
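A sketch of that client-side preprocessing contract (resize the short side to 256, center-crop to 224, normalize, HWC to CHW), assuming an OpenCV-style BGR input; the `preprocess_config` in `test_hubserving.py` below is the authoritative version:

```python
import cv2
import numpy as np

def client_preprocess(img_bgr):
    img = img_bgr[:, :, ::-1].astype(np.float32)   # BGR -> RGB (copy, contiguous)
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)                      # resize the short side to 256
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2     # center crop to 224 x 224
    img = img[top:top + 224, left:left + 224]
    img = img / 255.0                              # scale, i.e. 0.00392157
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return img.transpose((2, 0, 1)).astype(np.float32)  # HWC -> CHW
```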
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -15,29 +15,54 @@ ...@@ -15,29 +15,54 @@
import os import os
import sys import sys
__dir__ = os.path.dirname(os.path.abspath(__file__)) __dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__) sys.path.append(os.path.abspath(os.path.join(__dir__, '../')))
sys.path.append(os.path.abspath(os.path.join(__dir__, '..')))
from tools.infer.utils import parse_args, get_image_list, preprocess, np_to_b64
from ppcls.utils import logger
import numpy as np
import cv2
import time import time
import requests import requests
import json import json
import base64 import base64
import argparse
import numpy as np
import cv2
from utils import logger
from utils.get_image_list import get_image_list
from utils import config
from utils.encode_decode import np_to_b64
from python.preprocess import create_operators
preprocess_config = [{
'ResizeImage': {
'resize_short': 256
}
}, {
'CropImage': {
'size': 224
}
}, {
'NormalizeImage': {
'scale': 0.00392157,
'mean': [0.485, 0.456, 0.406],
'std': [0.229, 0.224, 0.225],
'order': ''
}
}, {
'ToCHWImage': None
}]
def main(args): def main(args):
image_path_list = get_image_list(args.image_file) image_path_list = get_image_list(args.image_file)
headers = {"Content-type": "application/json"} headers = {"Content-type": "application/json"}
preprocess_ops = create_operators(preprocess_config)
cnt = 0 cnt = 0
predict_time = 0 predict_time = 0
all_score = 0.0 all_score = 0.0
start_time = time.time() start_time = time.time()
batch_input_list = [] img_data_list = []
img_name_list = [] img_name_list = []
cnt = 0 cnt = 0
for idx, img_path in enumerate(image_path_list): for idx, img_path in enumerate(image_path_list):
...@@ -48,22 +73,23 @@ def main(args): ...@@ -48,22 +73,23 @@ def main(args):
format(img_path)) format(img_path))
continue continue
else: else:
img = img[:, :, ::-1] for ops in preprocess_ops:
data = preprocess(img, args) img = ops(img)
batch_input_list.append(data) img = np.array(img)
img_data_list.append(img)
img_name = img_path.split('/')[-1] img_name = img_path.split('/')[-1]
img_name_list.append(img_name) img_name_list.append(img_name)
cnt += 1 cnt += 1
if cnt % args.batch_size == 0 or (idx + 1) == len(image_path_list): if cnt % args.batch_size == 0 or (idx + 1) == len(image_path_list):
batch_input = np.array(batch_input_list) inputs = np.array(img_data_list)
b64str, revert_shape = np_to_b64(batch_input) b64str, revert_shape = np_to_b64(inputs)
data = { data = {
"images": b64str, "images": b64str,
"revert_params": { "revert_params": {
"shape": revert_shape, "shape": revert_shape,
"dtype": str(batch_input.dtype) "dtype": str(inputs.dtype)
}, }
"top_k": args.top_k
} }
try: try:
r = requests.post( r = requests.post(
...@@ -80,24 +106,25 @@ def main(args): ...@@ -80,24 +106,25 @@ def main(args):
continue continue
else: else:
results = r.json()["results"] results = r.json()["results"]
batch_result_list = results["prediction"] preds = results["prediction"]
elapse = results["elapse"] elapse = results["elapse"]
cnt += len(batch_result_list) cnt += len(preds)
predict_time += elapse predict_time += elapse
for number, result_list in enumerate(batch_result_list): for number, result_list in enumerate(preds):
all_score += result_list["scores"][0] all_score += result_list["scores"][0]
result_str = "" result_str = ""
for i in range(len(result_list["clas_ids"])): for i in range(len(result_list["class_ids"])):
result_str += "{}: {:.2f}\t".format( result_str += "{}: {:.2f}\t".format(
result_list["clas_ids"][i], result_list["class_ids"][i],
result_list["scores"][i]) result_list["scores"][i])
logger.info("File:{}, The top-{} result(s): {}".format(
img_name_list[number], args.top_k, result_str)) logger.info("File:{}, The result(s): {}".format(
img_name_list[number], result_str))
finally: finally:
batch_input_list = [] img_data_list = []
img_name_list = [] img_name_list = []
total_time = time.time() - start_time total_time = time.time() - start_time
...@@ -109,5 +136,10 @@ def main(args): ...@@ -109,5 +136,10 @@ def main(args):
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() parser = argparse.ArgumentParser()
parser.add_argument("--server_url", type=str)
parser.add_argument("--image_file", type=str)
parser.add_argument("--batch_size", type=int, default=1)
args = parser.parse_args()
main(args) main(args)
...@@ -24,16 +24,22 @@ from utils import logger ...@@ -24,16 +24,22 @@ from utils import logger
from utils import config from utils import config
from utils.predictor import Predictor from utils.predictor import Predictor
from utils.get_image_list import get_image_list from utils.get_image_list import get_image_list
from preprocess import create_operators from python.preprocess import create_operators
from postprocess import build_postprocess from python.postprocess import build_postprocess
class ClsPredictor(Predictor): class ClsPredictor(Predictor):
def __init__(self, config): def __init__(self, config):
super().__init__(config["Global"]) super().__init__(config["Global"])
self.preprocess_ops = create_operators(config["PreProcess"][
"transform_ops"]) self.preprocess_ops = []
self.postprocess = build_postprocess(config["PostProcess"]) self.postprocess = None
if "PreProcess" in config:
if "transform_ops" in config["PreProcess"]:
self.preprocess_ops = create_operators(config["PreProcess"][
"transform_ops"])
if "PostProcess" in config:
self.postprocess = build_postprocess(config["PostProcess"])
def predict(self, images): def predict(self, images):
input_names = self.paddle_predictor.get_input_names() input_names = self.paddle_predictor.get_input_names()
......
...@@ -26,7 +26,7 @@ import cv2 ...@@ -26,7 +26,7 @@ import cv2
import numpy as np import numpy as np
import importlib import importlib
from det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
def create_operators(params): def create_operators(params):
......
...@@ -2,3 +2,4 @@ from . import logger ...@@ -2,3 +2,4 @@ from . import logger
from . import config from . import config
from . import get_image_list from . import get_image_list
from . import predictor from . import predictor
from . import encode_decode
\ No newline at end of file
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve. # copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -16,7 +16,9 @@ import os ...@@ -16,7 +16,9 @@ import os
import copy import copy
import argparse import argparse
import yaml import yaml
from utils import logger from utils import logger
__all__ = ['get_config'] __all__ = ['get_config']
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -12,24 +12,20 @@ ...@@ -12,24 +12,20 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import os import base64
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.abspath(os.path.join(__dir__, '../../../')))
import argparse
import numpy as np import numpy as np
import cv2
import paddlehub as hub
from tools.infer.utils import preprocess
args = argparse.Namespace(resize_short=256, resize=224, normalize=True)
img_path_list = ["./deploy/hubserving/ILSVRC2012_val_00006666.JPEG", ] def np_to_b64(images):
img_str = base64.b64encode(images).decode('utf8')
return img_str, images.shape
module = hub.Module(name="clas_system") def b64_to_np(b64str, revert_params):
for i, img_path in enumerate(img_path_list): shape = revert_params["shape"]
img = cv2.imread(img_path)[:, :, ::-1] dtype = revert_params["dtype"]
img = preprocess(img, args) dtype = getattr(np, dtype) if isinstance(dtype, str) else dtype
batch_input_data = np.expand_dims(img, axis=0) data = base64.b64decode(b64str.encode('utf8'))
res = module.predict(batch_input_data) data = np.fromstring(data, dtype).reshape(shape)
print("The returned result of {}: {}".format(img_path, res)) return data
\ No newline at end of file
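A quick round-trip check of the two helpers above (a sketch; it assumes the module is importable as `utils.encode_decode` from the `deploy` directory):

```python
import numpy as np

from utils.encode_decode import np_to_b64, b64_to_np

x = np.random.rand(2, 3, 224, 224).astype(np.float32)
b64str, shape = np_to_b64(x)
y = b64_to_np(b64str, {"shape": shape, "dtype": str(x.dtype)})
assert y.shape == x.shape and np.allclose(x, y)  # shape and values survive the trip
```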
0 tench, Tinca tinca
1 goldfish, Carassius auratus
2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
3 tiger shark, Galeocerdo cuvieri
4 hammerhead, hammerhead shark
5 electric ray, crampfish, numbfish, torpedo
6 stingray
7 cock
8 hen
9 ostrich, Struthio camelus
10 brambling, Fringilla montifringilla
11 goldfinch, Carduelis carduelis
12 house finch, linnet, Carpodacus mexicanus
13 junco, snowbird
14 indigo bunting, indigo finch, indigo bird, Passerina cyanea
15 robin, American robin, Turdus migratorius
16 bulbul
17 jay
18 magpie
19 chickadee
20 water ouzel, dipper
21 kite
22 bald eagle, American eagle, Haliaeetus leucocephalus
23 vulture
24 great grey owl, great gray owl, Strix nebulosa
25 European fire salamander, Salamandra salamandra
26 common newt, Triturus vulgaris
27 eft
28 spotted salamander, Ambystoma maculatum
29 axolotl, mud puppy, Ambystoma mexicanum
30 bullfrog, Rana catesbeiana
31 tree frog, tree-frog
32 tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui
33 loggerhead, loggerhead turtle, Caretta caretta
34 leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea
35 mud turtle
36 terrapin
37 box turtle, box tortoise
38 banded gecko
39 common iguana, iguana, Iguana iguana
40 American chameleon, anole, Anolis carolinensis
41 whiptail, whiptail lizard
42 agama
43 frilled lizard, Chlamydosaurus kingi
44 alligator lizard
45 Gila monster, Heloderma suspectum
46 green lizard, Lacerta viridis
47 African chameleon, Chamaeleo chamaeleon
48 Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis
49 African crocodile, Nile crocodile, Crocodylus niloticus
50 American alligator, Alligator mississipiensis
51 triceratops
52 thunder snake, worm snake, Carphophis amoenus
53 ringneck snake, ring-necked snake, ring snake
54 hognose snake, puff adder, sand viper
55 green snake, grass snake
56 king snake, kingsnake
57 garter snake, grass snake
58 water snake
59 vine snake
60 night snake, Hypsiglena torquata
61 boa constrictor, Constrictor constrictor
62 rock python, rock snake, Python sebae
63 Indian cobra, Naja naja
64 green mamba
65 sea snake
66 horned viper, cerastes, sand viper, horned asp, Cerastes cornutus
67 diamondback, diamondback rattlesnake, Crotalus adamanteus
68 sidewinder, horned rattlesnake, Crotalus cerastes
69 trilobite
70 harvestman, daddy longlegs, Phalangium opilio
71 scorpion
72 black and gold garden spider, Argiope aurantia
73 barn spider, Araneus cavaticus
74 garden spider, Aranea diademata
75 black widow, Latrodectus mactans
76 tarantula
77 wolf spider, hunting spider
78 tick
79 centipede
80 black grouse
81 ptarmigan
82 ruffed grouse, partridge, Bonasa umbellus
83 prairie chicken, prairie grouse, prairie fowl
84 peacock
85 quail
86 partridge
87 African grey, African gray, Psittacus erithacus
88 macaw
89 sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita
90 lorikeet
91 coucal
92 bee eater
93 hornbill
94 hummingbird
95 jacamar
96 toucan
97 drake
98 red-breasted merganser, Mergus serrator
99 goose
100 black swan, Cygnus atratus
101 tusker
102 echidna, spiny anteater, anteater
103 platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus
104 wallaby, brush kangaroo
105 koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus
106 wombat
107 jellyfish
108 sea anemone, anemone
109 brain coral
110 flatworm, platyhelminth
111 nematode, nematode worm, roundworm
112 conch
113 snail
114 slug
115 sea slug, nudibranch
116 chiton, coat-of-mail shell, sea cradle, polyplacophore
117 chambered nautilus, pearly nautilus, nautilus
118 Dungeness crab, Cancer magister
119 rock crab, Cancer irroratus
120 fiddler crab
121 king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica
122 American lobster, Northern lobster, Maine lobster, Homarus americanus
123 spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish
124 crayfish, crawfish, crawdad, crawdaddy
125 hermit crab
126 isopod
127 white stork, Ciconia ciconia
128 black stork, Ciconia nigra
129 spoonbill
130 flamingo
131 little blue heron, Egretta caerulea
132 American egret, great white heron, Egretta albus
133 bittern
134 crane
135 limpkin, Aramus pictus
136 European gallinule, Porphyrio porphyrio
137 American coot, marsh hen, mud hen, water hen, Fulica americana
138 bustard
139 ruddy turnstone, Arenaria interpres
140 red-backed sandpiper, dunlin, Erolia alpina
141 redshank, Tringa totanus
142 dowitcher
143 oystercatcher, oyster catcher
144 pelican
145 king penguin, Aptenodytes patagonica
146 albatross, mollymawk
147 grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus
148 killer whale, killer, orca, grampus, sea wolf, Orcinus orca
149 dugong, Dugong dugon
150 sea lion
151 Chihuahua
152 Japanese spaniel
153 Maltese dog, Maltese terrier, Maltese
154 Pekinese, Pekingese, Peke
155 Shih-Tzu
156 Blenheim spaniel
157 papillon
158 toy terrier
159 Rhodesian ridgeback
160 Afghan hound, Afghan
161 basset, basset hound
162 beagle
163 bloodhound, sleuthhound
164 bluetick
165 black-and-tan coonhound
166 Walker hound, Walker foxhound
167 English foxhound
168 redbone
169 borzoi, Russian wolfhound
170 Irish wolfhound
171 Italian greyhound
172 whippet
173 Ibizan hound, Ibizan Podenco
174 Norwegian elkhound, elkhound
175 otterhound, otter hound
176 Saluki, gazelle hound
177 Scottish deerhound, deerhound
178 Weimaraner
179 Staffordshire bullterrier, Staffordshire bull terrier
180 American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier
181 Bedlington terrier
182 Border terrier
183 Kerry blue terrier
184 Irish terrier
185 Norfolk terrier
186 Norwich terrier
187 Yorkshire terrier
188 wire-haired fox terrier
189 Lakeland terrier
190 Sealyham terrier, Sealyham
191 Airedale, Airedale terrier
192 cairn, cairn terrier
193 Australian terrier
194 Dandie Dinmont, Dandie Dinmont terrier
195 Boston bull, Boston terrier
196 miniature schnauzer
197 giant schnauzer
198 standard schnauzer
199 Scotch terrier, Scottish terrier, Scottie
200 Tibetan terrier, chrysanthemum dog
201 silky terrier, Sydney silky
202 soft-coated wheaten terrier
203 West Highland white terrier
204 Lhasa, Lhasa apso
205 flat-coated retriever
206 curly-coated retriever
207 golden retriever
208 Labrador retriever
209 Chesapeake Bay retriever
210 German short-haired pointer
211 vizsla, Hungarian pointer
212 English setter
213 Irish setter, red setter
214 Gordon setter
215 Brittany spaniel
216 clumber, clumber spaniel
217 English springer, English springer spaniel
218 Welsh springer spaniel
219 cocker spaniel, English cocker spaniel, cocker
220 Sussex spaniel
221 Irish water spaniel
222 kuvasz
223 schipperke
224 groenendael
225 malinois
226 briard
227 kelpie
228 komondor
229 Old English sheepdog, bobtail
230 Shetland sheepdog, Shetland sheep dog, Shetland
231 collie
232 Border collie
233 Bouvier des Flandres, Bouviers des Flandres
234 Rottweiler
235 German shepherd, German shepherd dog, German police dog, alsatian
236 Doberman, Doberman pinscher
237 miniature pinscher
238 Greater Swiss Mountain dog
239 Bernese mountain dog
240 Appenzeller
241 EntleBucher
242 boxer
243 bull mastiff
244 Tibetan mastiff
245 French bulldog
246 Great Dane
247 Saint Bernard, St Bernard
248 Eskimo dog, husky
249 malamute, malemute, Alaskan malamute
250 Siberian husky
251 dalmatian, coach dog, carriage dog
252 affenpinscher, monkey pinscher, monkey dog
253 basenji
254 pug, pug-dog
255 Leonberg
256 Newfoundland, Newfoundland dog
257 Great Pyrenees
258 Samoyed, Samoyede
259 Pomeranian
260 chow, chow chow
261 keeshond
262 Brabancon griffon
263 Pembroke, Pembroke Welsh corgi
264 Cardigan, Cardigan Welsh corgi
265 toy poodle
266 miniature poodle
267 standard poodle
268 Mexican hairless
269 timber wolf, grey wolf, gray wolf, Canis lupus
270 white wolf, Arctic wolf, Canis lupus tundrarum
271 red wolf, maned wolf, Canis rufus, Canis niger
272 coyote, prairie wolf, brush wolf, Canis latrans
273 dingo, warrigal, warragal, Canis dingo
274 dhole, Cuon alpinus
275 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus
276 hyena, hyaena
277 red fox, Vulpes vulpes
278 kit fox, Vulpes macrotis
279 Arctic fox, white fox, Alopex lagopus
280 grey fox, gray fox, Urocyon cinereoargenteus
281 tabby, tabby cat
282 tiger cat
283 Persian cat
284 Siamese cat, Siamese
285 Egyptian cat
286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
287 lynx, catamount
288 leopard, Panthera pardus
289 snow leopard, ounce, Panthera uncia
290 jaguar, panther, Panthera onca, Felis onca
291 lion, king of beasts, Panthera leo
292 tiger, Panthera tigris
293 cheetah, chetah, Acinonyx jubatus
294 brown bear, bruin, Ursus arctos
295 American black bear, black bear, Ursus americanus, Euarctos americanus
296 ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus
297 sloth bear, Melursus ursinus, Ursus ursinus
298 mongoose
299 meerkat, mierkat
300 tiger beetle
301 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle
302 ground beetle, carabid beetle
303 long-horned beetle, longicorn, longicorn beetle
304 leaf beetle, chrysomelid
305 dung beetle
306 rhinoceros beetle
307 weevil
308 fly
309 bee
310 ant, emmet, pismire
311 grasshopper, hopper
312 cricket
313 walking stick, walkingstick, stick insect
314 cockroach, roach
315 mantis, mantid
316 cicada, cicala
317 leafhopper
318 lacewing, lacewing fly
319 dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk
320 damselfly
321 admiral
322 ringlet, ringlet butterfly
323 monarch, monarch butterfly, milkweed butterfly, Danaus plexippus
324 cabbage butterfly
325 sulphur butterfly, sulfur butterfly
326 lycaenid, lycaenid butterfly
327 starfish, sea star
328 sea urchin
329 sea cucumber, holothurian
330 wood rabbit, cottontail, cottontail rabbit
331 hare
332 Angora, Angora rabbit
333 hamster
334 porcupine, hedgehog
335 fox squirrel, eastern fox squirrel, Sciurus niger
336 marmot
337 beaver
338 guinea pig, Cavia cobaya
339 sorrel
340 zebra
341 hog, pig, grunter, squealer, Sus scrofa
342 wild boar, boar, Sus scrofa
343 warthog
344 hippopotamus, hippo, river horse, Hippopotamus amphibius
345 ox
346 water buffalo, water ox, Asiatic buffalo, Bubalus bubalis
347 bison
348 ram, tup
349 bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis
350 ibex, Capra ibex
351 hartebeest
352 impala, Aepyceros melampus
353 gazelle
354 Arabian camel, dromedary, Camelus dromedarius
355 llama
356 weasel
357 mink
358 polecat, fitch, foulmart, foumart, Mustela putorius
359 black-footed ferret, ferret, Mustela nigripes
360 otter
361 skunk, polecat, wood pussy
362 badger
363 armadillo
364 three-toed sloth, ai, Bradypus tridactylus
365 orangutan, orang, orangutang, Pongo pygmaeus
366 gorilla, Gorilla gorilla
367 chimpanzee, chimp, Pan troglodytes
368 gibbon, Hylobates lar
369 siamang, Hylobates syndactylus, Symphalangus syndactylus
370 guenon, guenon monkey
371 patas, hussar monkey, Erythrocebus patas
372 baboon
373 macaque
374 langur
375 colobus, colobus monkey
376 proboscis monkey, Nasalis larvatus
377 marmoset
378 capuchin, ringtail, Cebus capucinus
379 howler monkey, howler
380 titi, titi monkey
381 spider monkey, Ateles geoffroyi
382 squirrel monkey, Saimiri sciureus
383 Madagascar cat, ring-tailed lemur, Lemur catta
384 indri, indris, Indri indri, Indri brevicaudatus
385 Indian elephant, Elephas maximus
386 African elephant, Loxodonta africana
387 lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens
388 giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca
389 barracouta, snoek
390 eel
391 coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch
392 rock beauty, Holocanthus tricolor
393 anemone fish
394 sturgeon
395 gar, garfish, garpike, billfish, Lepisosteus osseus
396 lionfish
397 puffer, pufferfish, blowfish, globefish
398 abacus
399 abaya
400 academic gown, academic robe, judge's robe
401 accordion, piano accordion, squeeze box
402 acoustic guitar
403 aircraft carrier, carrier, flattop, attack aircraft carrier
404 airliner
405 airship, dirigible
406 altar
407 ambulance
408 amphibian, amphibious vehicle
409 analog clock
410 apiary, bee house
411 apron
412 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin
413 assault rifle, assault gun
414 backpack, back pack, knapsack, packsack, rucksack, haversack
415 bakery, bakeshop, bakehouse
416 balance beam, beam
417 balloon
418 ballpoint, ballpoint pen, ballpen, Biro
419 Band Aid
420 banjo
421 bannister, banister, balustrade, balusters, handrail
422 barbell
423 barber chair
424 barbershop
425 barn
426 barometer
427 barrel, cask
428 barrow, garden cart, lawn cart, wheelbarrow
429 baseball
430 basketball
431 bassinet
432 bassoon
433 bathing cap, swimming cap
434 bath towel
435 bathtub, bathing tub, bath, tub
436 beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon
437 beacon, lighthouse, beacon light, pharos
438 beaker
439 bearskin, busby, shako
440 beer bottle
441 beer glass
442 bell cote, bell cot
443 bib
444 bicycle-built-for-two, tandem bicycle, tandem
445 bikini, two-piece
446 binder, ring-binder
447 binoculars, field glasses, opera glasses
448 birdhouse
449 boathouse
450 bobsled, bobsleigh, bob
451 bolo tie, bolo, bola tie, bola
452 bonnet, poke bonnet
453 bookcase
454 bookshop, bookstore, bookstall
455 bottlecap
456 bow
457 bow tie, bow-tie, bowtie
458 brass, memorial tablet, plaque
459 brassiere, bra, bandeau
460 breakwater, groin, groyne, mole, bulwark, seawall, jetty
461 breastplate, aegis, egis
462 broom
463 bucket, pail
464 buckle
465 bulletproof vest
466 bullet train, bullet
467 butcher shop, meat market
468 cab, hack, taxi, taxicab
469 caldron, cauldron
470 candle, taper, wax light
471 cannon
472 canoe
473 can opener, tin opener
474 cardigan
475 car mirror
476 carousel, carrousel, merry-go-round, roundabout, whirligig
477 carpenter's kit, tool kit
478 carton
479 car wheel
480 cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM
481 cassette
482 cassette player
483 castle
484 catamaran
485 CD player
486 cello, violoncello
487 cellular telephone, cellular phone, cellphone, cell, mobile phone
488 chain
489 chainlink fence
490 chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour
491 chain saw, chainsaw
492 chest
493 chiffonier, commode
494 chime, bell, gong
495 china cabinet, china closet
496 Christmas stocking
497 church, church building
498 cinema, movie theater, movie theatre, movie house, picture palace
499 cleaver, meat cleaver, chopper
500 cliff dwelling
501 cloak
502 clog, geta, patten, sabot
503 cocktail shaker
504 coffee mug
505 coffeepot
506 coil, spiral, volute, whorl, helix
507 combination lock
508 computer keyboard, keypad
509 confectionery, confectionary, candy store
510 container ship, containership, container vessel
511 convertible
512 corkscrew, bottle screw
513 cornet, horn, trumpet, trump
514 cowboy boot
515 cowboy hat, ten-gallon hat
516 cradle
517 crane
518 crash helmet
519 crate
520 crib, cot
521 Crock Pot
522 croquet ball
523 crutch
524 cuirass
525 dam, dike, dyke
526 desk
527 desktop computer
528 dial telephone, dial phone
529 diaper, nappy, napkin
530 digital clock
531 digital watch
532 dining table, board
533 dishrag, dishcloth
534 dishwasher, dish washer, dishwashing machine
535 disk brake, disc brake
536 dock, dockage, docking facility
537 dogsled, dog sled, dog sleigh
538 dome
539 doormat, welcome mat
540 drilling platform, offshore rig
541 drum, membranophone, tympan
542 drumstick
543 dumbbell
544 Dutch oven
545 electric fan, blower
546 electric guitar
547 electric locomotive
548 entertainment center
549 envelope
550 espresso maker
551 face powder
552 feather boa, boa
553 file, file cabinet, filing cabinet
554 fireboat
555 fire engine, fire truck
556 fire screen, fireguard
557 flagpole, flagstaff
558 flute, transverse flute
559 folding chair
560 football helmet
561 forklift
562 fountain
563 fountain pen
564 four-poster
565 freight car
566 French horn, horn
567 frying pan, frypan, skillet
568 fur coat
569 garbage truck, dustcart
570 gasmask, respirator, gas helmet
571 gas pump, gasoline pump, petrol pump, island dispenser
572 goblet
573 go-kart
574 golf ball
575 golfcart, golf cart
576 gondola
577 gong, tam-tam
578 gown
579 grand piano, grand
580 greenhouse, nursery, glasshouse
581 grille, radiator grille
582 grocery store, grocery, food market, market
583 guillotine
584 hair slide
585 hair spray
586 half track
587 hammer
588 hamper
589 hand blower, blow dryer, blow drier, hair dryer, hair drier
590 hand-held computer, hand-held microcomputer
591 handkerchief, hankie, hanky, hankey
592 hard disc, hard disk, fixed disk
593 harmonica, mouth organ, harp, mouth harp
594 harp
595 harvester, reaper
596 hatchet
597 holster
598 home theater, home theatre
599 honeycomb
600 hook, claw
601 hoopskirt, crinoline
602 horizontal bar, high bar
603 horse cart, horse-cart
604 hourglass
605 iPod
606 iron, smoothing iron
607 jack-o'-lantern
608 jean, blue jean, denim
609 jeep, landrover
610 jersey, T-shirt, tee shirt
611 jigsaw puzzle
612 jinrikisha, ricksha, rickshaw
613 joystick
614 kimono
615 knee pad
616 knot
617 lab coat, laboratory coat
618 ladle
619 lampshade, lamp shade
620 laptop, laptop computer
621 lawn mower, mower
622 lens cap, lens cover
623 letter opener, paper knife, paperknife
624 library
625 lifeboat
626 lighter, light, igniter, ignitor
627 limousine, limo
628 liner, ocean liner
629 lipstick, lip rouge
630 Loafer
631 lotion
632 loudspeaker, speaker, speaker unit, loudspeaker system, speaker system
633 loupe, jeweler's loupe
634 lumbermill, sawmill
635 magnetic compass
636 mailbag, postbag
637 mailbox, letter box
638 maillot
639 maillot, tank suit
640 manhole cover
641 maraca
642 marimba, xylophone
643 mask
644 matchstick
645 maypole
646 maze, labyrinth
647 measuring cup
648 medicine chest, medicine cabinet
649 megalith, megalithic structure
650 microphone, mike
651 microwave, microwave oven
652 military uniform
653 milk can
654 minibus
655 miniskirt, mini
656 minivan
657 missile
658 mitten
659 mixing bowl
660 mobile home, manufactured home
661 Model T
662 modem
663 monastery
664 monitor
665 moped
666 mortar
667 mortarboard
668 mosque
669 mosquito net
670 motor scooter, scooter
671 mountain bike, all-terrain bike, off-roader
672 mountain tent
673 mouse, computer mouse
674 mousetrap
675 moving van
676 muzzle
677 nail
678 neck brace
679 necklace
680 nipple
681 notebook, notebook computer
682 obelisk
683 oboe, hautboy, hautbois
684 ocarina, sweet potato
685 odometer, hodometer, mileometer, milometer
686 oil filter
687 organ, pipe organ
688 oscilloscope, scope, cathode-ray oscilloscope, CRO
689 overskirt
690 oxcart
691 oxygen mask
692 packet
693 paddle, boat paddle
694 paddlewheel, paddle wheel
695 padlock
696 paintbrush
697 pajama, pyjama, pj's, jammies
698 palace
699 panpipe, pandean pipe, syrinx
700 paper towel
701 parachute, chute
702 parallel bars, bars
703 park bench
704 parking meter
705 passenger car, coach, carriage
706 patio, terrace
707 pay-phone, pay-station
708 pedestal, plinth, footstall
709 pencil box, pencil case
710 pencil sharpener
711 perfume, essence
712 Petri dish
713 photocopier
714 pick, plectrum, plectron
715 pickelhaube
716 picket fence, paling
717 pickup, pickup truck
718 pier
719 piggy bank, penny bank
720 pill bottle
721 pillow
722 ping-pong ball
723 pinwheel
724 pirate, pirate ship
725 pitcher, ewer
726 plane, carpenter's plane, woodworking plane
727 planetarium
728 plastic bag
729 plate rack
730 plow, plough
731 plunger, plumber's helper
732 Polaroid camera, Polaroid Land camera
733 pole
734 police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria
735 poncho
736 pool table, billiard table, snooker table
737 pop bottle, soda bottle
738 pot, flowerpot
739 potter's wheel
740 power drill
741 prayer rug, prayer mat
742 printer
743 prison, prison house
744 projectile, missile
745 projector
746 puck, hockey puck
747 punching bag, punch bag, punching ball, punchball
748 purse
749 quill, quill pen
750 quilt, comforter, comfort, puff
751 racer, race car, racing car
752 racket, racquet
753 radiator
754 radio, wireless
755 radio telescope, radio reflector
756 rain barrel
757 recreational vehicle, RV, R.V.
758 reel
759 reflex camera
760 refrigerator, icebox
761 remote control, remote
762 restaurant, eating house, eating place, eatery
763 revolver, six-gun, six-shooter
764 rifle
765 rocking chair, rocker
766 rotisserie
767 rubber eraser, rubber, pencil eraser
768 rugby ball
769 rule, ruler
770 running shoe
771 safe
772 safety pin
773 saltshaker, salt shaker
774 sandal
775 sarong
776 sax, saxophone
777 scabbard
778 scale, weighing machine
779 school bus
780 schooner
781 scoreboard
782 screen, CRT screen
783 screw
784 screwdriver
785 seat belt, seatbelt
786 sewing machine
787 shield, buckler
788 shoe shop, shoe-shop, shoe store
789 shoji
790 shopping basket
791 shopping cart
792 shovel
793 shower cap
794 shower curtain
795 ski
796 ski mask
797 sleeping bag
798 slide rule, slipstick
799 sliding door
800 slot, one-armed bandit
801 snorkel
802 snowmobile
803 snowplow, snowplough
804 soap dispenser
805 soccer ball
806 sock
807 solar dish, solar collector, solar furnace
808 sombrero
809 soup bowl
810 space bar
811 space heater
812 space shuttle
813 spatula
814 speedboat
815 spider web, spider's web
816 spindle
817 sports car, sport car
818 spotlight, spot
819 stage
820 steam locomotive
821 steel arch bridge
822 steel drum
823 stethoscope
824 stole
825 stone wall
826 stopwatch, stop watch
827 stove
828 strainer
829 streetcar, tram, tramcar, trolley, trolley car
830 stretcher
831 studio couch, day bed
832 stupa, tope
833 submarine, pigboat, sub, U-boat
834 suit, suit of clothes
835 sundial
836 sunglass
837 sunglasses, dark glasses, shades
838 sunscreen, sunblock, sun blocker
839 suspension bridge
840 swab, swob, mop
841 sweatshirt
842 swimming trunks, bathing trunks
843 swing
844 switch, electric switch, electrical switch
845 syringe
846 table lamp
847 tank, army tank, armored combat vehicle, armoured combat vehicle
848 tape player
849 teapot
850 teddy, teddy bear
851 television, television system
852 tennis ball
853 thatch, thatched roof
854 theater curtain, theatre curtain
855 thimble
856 thresher, thrasher, threshing machine
857 throne
858 tile roof
859 toaster
860 tobacco shop, tobacconist shop, tobacconist
861 toilet seat
862 torch
863 totem pole
864 tow truck, tow car, wrecker
865 toyshop
866 tractor
867 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi
868 tray
869 trench coat
870 tricycle, trike, velocipede
871 trimaran
872 tripod
873 triumphal arch
874 trolleybus, trolley coach, trackless trolley
875 trombone
876 tub, vat
877 turnstile
878 typewriter keyboard
879 umbrella
880 unicycle, monocycle
881 upright, upright piano
882 vacuum, vacuum cleaner
883 vase
884 vault
885 velvet
886 vending machine
887 vestment
888 viaduct
889 violin, fiddle
890 volleyball
891 waffle iron
892 wall clock
893 wallet, billfold, notecase, pocketbook
894 wardrobe, closet, press
895 warplane, military plane
896 washbasin, handbasin, washbowl, lavabo, wash-hand basin
897 washer, automatic washer, washing machine
898 water bottle
899 water jug
900 water tower
901 whiskey jug
902 whistle
903 wig
904 window screen
905 window shade
906 Windsor tie
907 wine bottle
908 wing
909 wok
910 wooden spoon
911 wool, woolen, woollen
912 worm fence, snake fence, snake-rail fence, Virginia fence
913 wreck
914 yawl
915 yurt
916 web site, website, internet site, site
917 comic book
918 crossword puzzle, crossword
919 street sign
920 traffic light, traffic signal, stoplight
921 book jacket, dust cover, dust jacket, dust wrapper
922 menu
923 plate
924 guacamole
925 consomme
926 hot pot, hotpot
927 trifle
928 ice cream, icecream
929 ice lolly, lolly, lollipop, popsicle
930 French loaf
931 bagel, beigel
932 pretzel
933 cheeseburger
934 hotdog, hot dog, red hot
935 mashed potato
936 head cabbage
937 broccoli
938 cauliflower
939 zucchini, courgette
940 spaghetti squash
941 acorn squash
942 butternut squash
943 cucumber, cuke
944 artichoke, globe artichoke
945 bell pepper
946 cardoon
947 mushroom
948 Granny Smith
949 strawberry
950 orange
951 lemon
952 fig
953 pineapple, ananas
954 banana
955 jackfruit, jak, jack
956 custard apple
957 pomegranate
958 hay
959 carbonara
960 chocolate sauce, chocolate syrup
961 dough
962 meat loaf, meatloaf
963 pizza, pizza pie
964 potpie
965 burrito
966 red wine
967 espresso
968 cup
969 eggnog
970 alp
971 bubble
972 cliff, drop, drop-off
973 coral reef
974 geyser
975 lakeside, lakeshore
976 promontory, headland, head, foreland
977 sandbar, sand bar
978 seashore, coast, seacoast, sea-coast
979 valley, vale
980 volcano
981 ballplayer, baseball player
982 groom, bridegroom
983 scuba diver
984 rapeseed
985 daisy
986 yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum
987 corn
988 acorn
989 hip, rose hip, rosehip
990 buckeye, horse chestnut, conker
991 coral fungus
992 agaric
993 gyromitra
994 stinkhorn, carrion fungus
995 earthstar
996 hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa
997 bolete
998 ear, spike, capitulum
999 toilet tissue, toilet paper, bathroom tissue
# Mainbody Detection
Mainbody detection is a widely used detection technique. It detects one or more main objects in an image, crops the corresponding regions and passes them on for recognition, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task, and it can effectively improve recognition accuracy.
This tutorial will introduce the dataset and model training for mainbody detection in PaddleClas.
## 1. Dataset
The datasets used for the mainbody detection task are shown in the following table.
| Dataset | Image number | Image number used in<br>mainbody detection | Scenarios | Dataset link |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 1.7M | 6k | General Scenarios | [link](https://www.objects365.org/overview.html) |
| COCO2017 | 120k | 5k | General Scenarios | [link](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | Cartoon Face | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | Product | [link](https://rpc-dataset.github.io/) |
In the actual training process, all datasets are mixed together. The categories of all labeled boxes are mapped to a single category, `foreground`, so the detection model we train contains just one category (`foreground`).
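A sketch of this single-category remapping, assuming COCO-format annotation files (the field names follow the COCO spec; the merging of the five datasets itself is omitted):

```python
import json

def to_foreground(coco_json_path, out_path):
    """Map every labeled box in a COCO-format file to one `foreground` class."""
    with open(coco_json_path) as f:
        data = json.load(f)
    data["categories"] = [{"id": 1, "name": "foreground", "supercategory": ""}]
    for ann in data["annotations"]:
        ann["category_id"] = 1
    with open(out_path, "w") as f:
        json.dump(data, f)

# e.g. to_foreground("annotations/instances_train.json", "annotations/foreground_train.json")
```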
## 2. Model Training
There are many types of object detection methods, such as the commonly used two-stage detectors (the Faster R-CNN series, etc.), single-stage detectors (YOLO, SSD, etc.) and anchor-free detectors (FCOS, etc.).
PP-YOLO was proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the YOLOv3 model from multiple perspectives, such as the backbone, data augmentation, regularization strategy, loss function and post-processing, and it finally reaches the state of the art in the speed-accuracy trade-off. Specifically, the optimization strategies are as follows.
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs and mini-batch size as 24 on each GPU
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- Better ImageNet pretrain weights
For more information about PP-YOLO, you can refer to the [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).
In the mainbody detection task, we use `ResNet50vd-DCN` as the backbone for better performance. The config file used for model training is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to point to the mainbody detection dataset.
The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
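As a quick sanity check, the downloaded inference model can be loaded with the Paddle inference API; a minimal sketch only (the full detection pre- and post-processing lives in the deploy scripts, and the paths assume the tar above was extracted into `./models`):

```python
from paddle.inference import Config, create_predictor

model_dir = "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer"
cfg = Config(model_dir + "/inference.pdmodel", model_dir + "/inference.pdiparams")
cfg.disable_gpu()  # or cfg.enable_use_gpu(8000, 0) for GPU card 0
predictor = create_predictor(cfg)
print(predictor.get_input_names())  # the detector's expected input tensors
```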
...@@ -9,18 +9,19 @@ If the image category already exists in the image index database, then you can t ...@@ -9,18 +9,19 @@ If the image category already exists in the image index database, then you can t
* [1. Environment Preparation](#enviroment_preparation) * [1. Environment Preparation](#enviroment_preparation)
* [2. Image Recognition Experience](#image_recognition_experience) * [2. Image Recognition Experience](#image_recognition_experience)
* [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data) * [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data)
* [2.2 Logo Recognition and Retrieval](#Logo_recognition_and_retrival) * [2.2 Product Recognition and Retrieval](#Product_recognition_and_retrival)
* [2.2.1 Single Image Recognition](#recognition_of_single_image) * [2.2.1 Single Image Recognition](#recognition_of_single_image)
* [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition) * [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition)
* [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience) * [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience)
* [3.1 Build the Base Library Based on Your Own Dataset](#build_the_base_library_based_on_your_own_dataset) * [3.1 Prepare New Images and Labels](#3.1)
* [3.2 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library) * [3.2 Build a New Index Library](#build_a_new_index_library)
* [3.3 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
<a name="enviroment_preparation"></a> <a name="enviroment_preparation"></a>
## 1. Environment Preparation ## 1. Environment Preparation
* Installation: Please refer to [Quick Installation](./installation.md) to configure the PaddleClas environment. * Installation: Please refer to [Quick Installation](./install_en.md) to configure the PaddleClas environment.
* Use the following command to enter the folder `deploy`. All content and commands in this section need to be run in the folder `deploy`. * Use the following command to enter the folder `deploy`. All content and commands in this section need to be run in the folder `deploy`.
...@@ -65,7 +66,7 @@ cd .. ...@@ -65,7 +66,7 @@ cd ..
<a name="download_and_unzip_the_inference_model_and_demo_data"></a> <a name="download_and_unzip_the_inference_model_and_demo_data"></a>
### 2.1 Download and Unzip the Inference Model and Demo Data ### 2.1 Download and Unzip the Inference Model and Demo Data
Taking Logo recognition as an example, download the detection model, the recognition model and the Logo recognition demo data with the following commands. Taking product recognition as an example, download the detection model, the recognition model and the product recognition demo data with the following commands.
```shell ```shell
mkdir models mkdir models
...@@ -73,20 +74,20 @@ cd models ...@@ -73,20 +74,20 @@ cd models
# Download the generic detection inference model and unzip it # Download the generic detection inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# Download and unpack the inference model # Download and unpack the inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd .. cd ..
mkdir dataset mkdir dataset
cd dataset cd dataset
# Download the demo data and unzip it # Download the demo data and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd .. cd ..
``` ```
Once unpacked, the `dataset` folder should have the following file structure. Once unpacked, the `dataset` folder should have the following file structure.
``` ```
├── logo_demo_data_v1.0 ├── product_demo_data_v1.0
│ ├── data_file.txt │ ├── data_file.txt
│ ├── gallery │ ├── gallery
│ ├── index │ ├── index
...@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler ...@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
The `models` folder should have the following file structure.
```
├── product_ResNet50_vd_aliproduct_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
```
<a name="Logo_recognition_and_retrival"></a> <a name="Product_recognition_and_retrival"></a>
### 2.2 Logo Recognition and Retrival ### 2.2 Product Recognition and Retrival
Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。 Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
<a name="recognition_of_single_image"></a> <a name="recognition_of_single_image"></a>
#### 2.2.1 Single Image Recognition #### 2.2.1 Single Image Recognition
Run the following command to identify and retrieve the image `. /dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg` for recognition and retrieval Run the following command to identify and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg` for recognition and retrieval
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml python3.7 python/predict_system.py -c configs/inference_product.yaml
``` ```
The image to be retrieved is shown below. The image to be retrieved is shown below.
<div align="center"> <div align="center">
<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" /> <img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
</div> </div>
The final output is shown below. The final output is shown below.
``` ```
[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}] [{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143 622.2668457 ])}
``` ```
where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity. where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity.
There are 4 `旺仔牛奶` results in 5, the recognition result is correct.
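For quick sanity checks of such results, the top-5 labels can be aggregated by majority vote. Below is a minimal, hypothetical post-processing sketch; the `result` dict simply mirrors the output printed above.

```python
from collections import Counter

# mirrors the printed result above (scores omitted for brevity)
result = {'bbox': [305, 226, 776, 930],
          'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面']}

# take the most frequent label among the top-5 retrieved gallery images
label, votes = Counter(result['rec_docs']).most_common(1)[0]
print(label, votes)  # 旺仔牛奶 4
```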
The detection result is also saved in the folder `output`, as shown below.
<div align="center">
<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
</div>
<a name="folder_based_batch_recognition"></a> <a name="folder_based_batch_recognition"></a>
#### 2.2.2 Folder-based Batch Recognition #### 2.2.2 Folder-based Batch Recognition
...@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th ...@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter. If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query" python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
``` ```
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field. Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field.
<a name="unkonw_category_image_recognition_experience"></a> <a name="unkonw_category_image_recognition_experience"></a>
## 3. Recognize Images of Unknown Category ## 3. Recognize Images of Unknown Category
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows: To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows:
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
``` ```
The image to be retrieved is shown below. The image to be retrieved is shown below.
<div align="center"> <div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" /> <img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
</div> </div>
The output is as follows: The output is as follows:
``` ```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}] [{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
``` ```
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database. Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining. When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
<a name="build_the_base_library_based_on_your_own_dataset"></a> <a name="3.1"></a>
### 3.1 Build the Base Library Based on Your Own Dataset ### 3.1 Prepare for the new images and labels
First, you need to copy the images which are similar with the image to retrieval to the original images for the index database. The command is as follows.
```shell
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
```
Then create a new label file that records the image paths and label information. Use the following command to create a new file based on the original one.
```shell
# copy the file
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
```
Then add the following new lines to the new label file:
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
Each line contains two fields separated by a `space`: the first field is the relative image path, and the second field is its label.
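When many images are added, writing these lines by hand gets tedious. The following hypothetical helper appends the same six entries programmatically; the file path and the label mirror the example above.

```python
# hypothetical helper: append the new gallery images to the label file;
# the paths and the label mirror the example lines shown above
new_entries = [(f"gallery/anmuxi/{i:03d}.jpg", "安慕希酸奶") for i in range(1, 7)]
with open("dataset/product_demo_data_v1.0/data_file_update.txt", "a", encoding="utf-8") as f:
    for path, label in new_entries:
        f.write(f"{path} {label}\n")  # two fields separated by a space
```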
<a name="build_a_new_index_library"></a>
### 3.2 Build a New Index Library
Use the following command to build the index and accelerate the retrieval process after recognition.
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
The new index information is finally stored in the folder `./dataset/product_demo_data_v1.0/index_update`. Use this new index database for the retrieval above.
<a name="Image_differentiation_based_on_the_new_index_library"></a> <a name="Image_differentiation_based_on_the_new_index_library"></a>
### 3.2 Recognize the Unknown Category Images ### 3.2 Recognize the Unknown Category Images
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows. To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows.
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update" python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
``` ```
The output is as follows: The output is as follows:
``` ```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}] [{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
``` ```
The recognition result is correct. There are 4 `安慕希酸奶` results in 5, the recognition result is correct.
# Cartoon Character Recognition
Since the 1970s, face recognition has been one of the most studied topics in computer vision and biometrics. In recent years, traditional face recognition methods have been replaced by deep learning methods based on convolutional neural networks (CNNs). Face recognition is now widely used in security, commerce, finance, smart self-service terminals, entertainment and many other fields. Driven by the strong demand of industrial applications, animation media has received more and more attention, and face recognition of cartoon characters has become a new research field.
## 1 Algorithm Introduction
The overall pipeline is described in [Feature Learning](./feature_learning.md). Note that the `Neck` module is not used in this pipeline.
The detailed configuration is in the [config file](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml).
The specific modules are as follows.
### 1.1 Data Augmentation
- `Resize` the image to 224
- Random horizontal flip
- Normalize: normalize the values to 0~1
### 1.2 Backbone
ResNet50 is used as the backbone, and a large model is used for distillation.
### 1.3 Metric Learning Losses
Only `CELoss` is used in cartoon character recognition.
## 2 Experimental Results
The method is validated on the iCartoonFace [1] dataset. The dataset consists of 389,678 images of 5,013 cartoon characters, annotated with IDs, bounding boxes, poses and other auxiliary attributes. iCartoonFace is currently the largest cartoon media dataset in the field of image recognition. It is of high quality, richly annotated and comprehensive in content, covering similar images, occluded images and images with appearance variations.
Compared with other datasets, iCartoonFace has a clear lead in both the number of images and the number of entities. The training set contains 5,013 classes and 389,678 images; the validation set contains 2,500 query images and 20,000 gallery images.
![icartoon](../../images/icartoon1.png)
Note that, compared with the face recognition task, the accessories, props, hairstyles and other factors of cartoon character portraits can significantly improve the recognition accuracy. Therefore, based on the annotated boxes of the original dataset, the height and width are each expanded to twice their original size and truncated at the image border, producing the dataset used for all the training; a sketch of this preprocessing follows.
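The 2x expansion with truncation can be written as a small helper. This is an illustrative sketch rather than the project's actual preprocessing code; the function name and signature are made up for the example.

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, ratio=2.0):
    """Enlarge the box width and height by `ratio` around its center, then clip."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * ratio / 2.0
    half_h = (y2 - y1) * ratio / 2.0
    return (max(0, cx - half_w), max(0, cy - half_h),
            min(img_w, cx + half_w), min(img_h, cy + half_h))

print(expand_box(100, 120, 200, 220, img_w=256, img_h=256))
# (50.0, 70.0, 250.0, 256) -- the bottom edge is truncated at the image border
```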
On this dataset, the method reaches a Recall@1 of 83.24%.
## 3 References
[1] Cartoon Face Recognition: A Benchmark Dataset. 2020. [Download](https://github.com/luxiangju-PersonAI/iCartoonFace)
# Feature Learning
This part explains the training mode for feature learning, i.e. the training mode of `RecModel`. It mainly supports feature learning applications such as vehicle recognition (fine-grained vehicle classification, ReID), logo recognition, cartoon character recognition and product recognition. Unlike training an ordinary classification network on `ImageNet`, this feature learning part has the following features:
- The output of the `backbone` can be truncated, i.e. the feature of any intermediate layer can be extracted
- Configurable network layers, i.e. the `Neck` part, can be added after the feature output layer of the `backbone`
- `Metric learning` loss functions such as `ArcFace Loss` are supported to improve the feature learning ability
## Overall Pipeline
![](../../images/recognition/rec_pipeline.png)
The overall structure of feature learning is shown in the figure above. It mainly consists of data augmentation, the Backbone, the Neck and the Metric Learning part. The `Neck` part is a freely added network layer, such as an embedding layer; this module can also be omitted. During training, the model is optimized with the loss of the `Metric Learning` part. At prediction time, the output of the `Neck` part is generally used as the feature output by default.
For different applications, each part can be chosen freely as needed. The specific configuration of each part, such as data augmentation, Backbone, Neck and Metric Learning losses, is described in the corresponding applications: [Vehicle Recognition](./vehicle_recognition.md), [Logo Recognition](./logo_recognition.md), [Cartoon Character Recognition](./cartoon_character_recognition.md), [Product Recognition](./product_recognition.md)
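To make the Backbone / Neck split concrete, here is a minimal numpy sketch of how a retrieval feature is produced at prediction time. The shapes and the random weights are placeholders for illustration; the actual layers are defined in the yaml configuration files linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder for a pooled backbone feature, one 2048-d vector per image
backbone_feat = rng.standard_normal((4, 2048)).astype("float32")

# Neck: a learned linear embedding layer (random weights here, for illustration)
neck_weight = rng.standard_normal((2048, 512)).astype("float32") * 0.01
embedding = backbone_feat @ neck_weight  # (4, 512) retrieval feature

# the Head only exists to compute the training loss; at prediction time,
# retrieval compares L2-normalized embeddings
embedding /= np.linalg.norm(embedding, axis=1, keepdims=True)
print(embedding.shape)  # (4, 512)
```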
## Configuration File
See the [yaml configuration document](../tutorials/config.md) for the description of the configuration file. The model structure configuration is described in the **recognition model structure configuration** part of that document.
# Logo Recognition
Logo recognition is widely applied in real life, e.g. deciding whether an Adidas or Nike logo appears in a photo, or whether a Starbucks or Coca-Cola logo appears on a cup. When the number of logo categories is large, a two-stage detection + recognition pipeline is usually adopted: the detection module detects candidate logo regions, which are cropped and fed into the recognition module. The recognition module is usually retrieval-based: the predicted category is obtained by ranking the similarity between the query image and the gallery images. This document introduces the feature extraction part for logo images.
## 1 Algorithm Introduction
The overall pipeline is described in [Feature Learning](./feature_learning.md).
The overall configuration is in [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).
The specific modules are as follows.
### 1.1 Data Augmentation
Unlike ordinary classification training, this part mainly uses the following augmentations:
- `Resize` the image to 224. For logos, the input images are already cropped by the detector, so they are resized to 224 directly
- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates deformations of logo images and other real-world variations
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion and similar real-world cases
### 1.2 Backbone
`ResNet50` is used as the backbone, with the following modification:
- stride=1 in the last stage, keeping the final feature map at 14x14. This adds little computation but clearly improves the feature extraction ability of the model
Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
### 1.3 Neck
To reduce the complexity of computing feature distances at inference time, an embedding convolution layer with feature dimension 512 is added.
### 1.4 Metric Learning Losses
In logo recognition, [Pairwise Cosface + CircleMargin](https://arxiv.org/abs/2002.10857) are used for joint training, with a weight ratio of 1:1.
Code: [PairwiseCosface](../../../ppcls/loss/pairwisecosface.py), [CircleMargin](../../../ppcls/arch/gears/circlemargin.py)
## 2 Experimental Results
<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
Experiments are conducted on the LogoDet-3K [1] dataset, a fully annotated logo dataset with 3,000 logo categories, about 200,000 high-quality manually annotated logo objects and 158,652 images.
Since the images in the original dataset contain annotated detection boxes, and only the logo regions cropped by the detector are considered at the recognition stage, the logo regions are cropped out with the original annotation boxes to build the training set and exclude the influence of the background. The dataset is split into 155,427 training images covering 3,000 logo categories (also used as the gallery at test time) and 3,225 test images used as the query set. The cropped training set can be [downloaded here](https://arxiv.org/abs/2008.05359).
On this dataset, recall@1 reaches 89.8%.
## 3 References
[1] LogoDet-3K: A Large-Scale Image Dataset for Logo Detection[J]. arXiv preprint arXiv:2008.05359, 2020.
# Mainbody Detection
Mainbody detection is a very widely used detection technique. It detects the coordinates of one or more subjects in an image; the corresponding regions are then cropped out of the image and recognized, completing the whole recognition process. Mainbody detection is the first step of the recognition task and can effectively improve the recognition accuracy.
This part introduces mainbody detection from two aspects: the datasets and the model training.
## 1. Datasets
The following datasets are mainly used to train the mainbody detection model for the recognition tasks of PaddleClas.
| Dataset | Size | Images used for mainbody detection | Scenario | Link |
| ------------ | ------------- | -------| ------- | -------- |
| Objects365 | 1.7M | 6k | general | [link](https://www.objects365.org/overview.html) |
| COCO2017 | 120k | 5k | general | [link](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | cartoon face detection | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | logo detection | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | product detection | [link](https://rpc-dataset.github.io/) |
In actual training, all the datasets are mixed together. Since this is mainbody detection, the categories of all annotated detection boxes are changed to "foreground", so the final merged dataset contains only one category, the foreground.
## 2. Model Training
There are many kinds of object detection methods, such as the commonly used two-stage detectors (the Faster R-CNN series, etc.), single-stage detectors (YOLO, SSD, etc.) and anchor-free detectors (FCOS, etc.).
PP-YOLO, proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), deeply optimizes the YOLOv3 model in terms of the backbone, data augmentation, regularization strategies, loss functions, post-processing and so on, and reaches an industry-leading speed-accuracy trade-off. The optimization strategies are as follows.
- A better backbone: ResNet50vd-DCN
- A larger training batch size: 8 GPUs with batch_size=24 per GPU, with the learning rate and the number of iterations adjusted accordingly
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- A better pretrained model
For more details about PP-YOLO, please refer to the [PP-YOLO model](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md).
In the mainbody detection task, to guarantee the detection quality, we use the ResNet50vd-DCN backbone and the configuration file [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), replace the dataset with the custom mainbody detection dataset, and train the final detection model.
The inference model of the mainbody detection model can be downloaded at: [link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar)
# Product Recognition
Product recognition is very widely applied nowadays. Shopping by taking a photo has been adopted by many people, self-checkout counters have entered major supermarkets, and unmanned stores are booming; all of this is supported by product recognition technology. Product recognition roughly follows a "product detection + product recognition" pipeline: the product detection module detects candidate product regions, and the product recognition model recognizes the subjects found by the detection module. The recognition module is usually retrieval-based: the predicted category is obtained by ranking the similarity between the query image and the gallery images. This document introduces the feature extraction part for product images.
## 1 Algorithm Introduction
The overall pipeline is described in [Feature Learning](./feature_learning.md).
The overall configuration is in [ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml).
The details are as follows.
### 1.1 Data Augmentation
- `RandomCrop` the image to 224x224
- `RandomFlip`
- Normalize: normalize the image
### 1.2 Backbone
`ResNet50_vd` is used as the backbone, initialized with the ImageNet pretrained model.
### 1.3 Neck
A 512-dimensional embedding FC layer is added, without BatchNorm or activation.
### 1.4 Metric Learning Losses
Currently [CELoss](../../../ppcls/loss/celoss.py) is used for training. To obtain more robust features, other losses will be used in training later; stay tuned.
## 2 Experimental Results
<img src="../../images/product/aliproduct.png" style="zoom:50%;" />
Experiments are conducted on the Aliproduct [1] dataset, an open-source dataset released for a Tianchi competition and currently the largest open-source product dataset, with more than 50,000 categories and about 2.5 million training images.
On this dataset, the single model reaches a Top-1 accuracy of 85.67%.
## 3 References
[1] Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV, 2020.
# Vehicle Recognition
This part consists of two tasks: fine-grained vehicle classification and vehicle ReID.
Fine-grained classification classifies images belonging to one basic category into subcategories, e.g. different kinds of birds, flowers or minerals. As the name suggests, fine-grained vehicle classification classifies vehicles into their different subcategories.
ReID (re-identification) uses algorithms to find a search target in an image database, so it is a subproblem of image retrieval. Vehicle ReID is, given a vehicle image, the process of finding other images of the same vehicle captured by the same camera or by different cameras. In this process, how to extract robust features is especially important.
In this document, the same training scheme is tried on both subtasks.
## 1 Algorithm Introduction
The overall pipeline is described in [Feature Learning](./feature_learning.md).
The overall configuration of vehicle ReID is in [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml).
The overall configuration of fine-grained vehicle classification is in [ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml).
The details are as follows.
### 1.1 Data Augmentation
Unlike ordinary classification training, this part mainly uses the following augmentations:
- `Resize` the image to 224. Especially for ReID, the vehicle images are already cropped by the detector, so using a crop augmentation again would lose more vehicle information
- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates lighting changes, camera position changes and other real-world scenes
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion and similar real-world cases
### 1.2 Backbone
`ResNet50` is used as the backbone, with the following modification:
- stride=1 in the last stage, keeping the final feature map at 14x14. This adds little computation but clearly improves the feature extraction ability of the model
Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
### 1.3 Neck
To reduce the complexity of computing feature distances at inference time, an embedding convolution layer with feature dimension 512 is added.
### 1.4 Metric Learning Losses
- For vehicle ReID, [SupConLoss](../../../ppcls/loss/supconloss.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used, with a weight ratio of 1:1 (see the ArcMargin sketch after this list)
- For fine-grained vehicle classification, [Triplet Loss](../../../ppcls/loss/triplet.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used, with a weight ratio of 1:1
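As a reference for how an ArcLoss-style (ArcFace) head turns embeddings into margin-adjusted logits, here is a small numpy sketch. It is illustrative only; the function and the default margin/scale values are typical settings, not the exact PaddleClas implementation.

```python
import numpy as np

def arc_margin_logits(embeddings, weight, labels, margin=0.15, scale=32.0):
    # L2-normalize embeddings and class weights so logits are cosine similarities
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weight / np.linalg.norm(weight, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)
    logits = cos.copy()
    rows = np.arange(len(labels))
    # add the angular margin only to each sample's ground-truth class
    logits[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + margin)
    return scale * logits  # fed into a softmax cross-entropy loss

rng = np.random.default_rng(0)
logits = arc_margin_logits(rng.standard_normal((4, 512)),
                           rng.standard_normal((431, 512)),
                           labels=np.array([0, 1, 2, 3]))
print(logits.shape)  # (4, 431)
```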
## 2 Experimental Results
### 2.1 Vehicle ReID
<img src="../../images/recognition/vehicle/cars.JPG" style="zoom:50%;" />
The method is evaluated on the VERI-Wild dataset. This dataset was captured by a large CCTV surveillance system in unconstrained scenes over one month (30*24 hours). The system consists of 174 cameras distributed over a large area of more than 200 km². The raw vehicle image set contains 12 million vehicle images; after data cleaning and annotation, 416,314 images of 40,671 different vehicles were collected. See [the paper](https://github.com/PKU-IMRE/VERI-Wild) for details.
| **Methods** | **Small** | | |
| :--------------------------: | :-------: | :-------: | :-------: |
| | mAP | Top1 | Top5 |
| Strong baseline (ResNet50) [1] | 76.61 | 90.83 | 97.29 |
| HPGN (ResNet50+PGN) [2] | 80.42 | 91.37 | - |
| GLAMOR (ResNet50+PGN) [3] | 77.15 | 92.13 | 97.43 |
| PVEN (ResNet50) [4] | 79.8 | 94.01 | 98.06 |
| SAVER (VAE+ResNet50) [5] | 80.9 | 93.78 | 97.93 |
| PaddleClas baseline1 | 65.6 | 92.37 | 97.23 |
| PaddleClas baseline2 | 80.09 | **93.81** | **98.26** |
Note: baseline1 is the currently open-sourced model; baseline2 will be open-sourced soon.
### 2.2 Fine-Grained Vehicle Classification
For fine-grained vehicle classification, [CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html) is used as the training dataset.
![](../../images/recognition/vehicle/CompCars.png)
The images in the dataset mainly come from the web and from surveillance data. The web data contains cars of 163 car makers and 1,716 car models, with **136,726** full-car images and **27,618** part-car images; it is annotated with bounding boxes, viewpoints and 5 attributes (max speed, displacement, number of doors, number of seats, car type). The surveillance data contains **50,000** front-view images.
Note that labels need to be generated from this dataset according to your own needs. In this demo, vehicles of the same model produced in different years are regarded as the same class, so the total number of classes is 431.
| **Methods** | Top1 Acc |
| :-----------------------------: | :-------: |
| ResNet101-swp [6] | 97.6% |
| Fine-Tuning DARTS [7] | 95.9% |
| ResNet50 + COOC [8] | 95.6% |
| A3M [9] | 95.4% |
| PaddleClas baseline (ResNet50) | **97.1**% |
## 3 References
[1] Bag of Tricks and a Strong Baseline for Deep Person Re-Identification. CVPR Workshops, 2019.
[2] Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification. arXiv preprint arXiv:2005.14684.
[3] GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention. arXiv preprint arXiv:2002.02256.
[4] Parsing-based View-aware Embedding Network for Vehicle Re-Identification. CVPR, 2020.
[5] The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification. ECCV, 2020.
[6] Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition. IEEE Transactions on Intelligent Transportation Systems, 2017.
[7] Fine-Tuning DARTS for Image Classification. 2020.
[8] Fine-Grained Vehicle Classification with Unsupervised Parts Co-occurrence Learning. 2018.
[9] Attribute-Aware Attention Model for Fine-grained Representation Learning. 2019.
| ls_epsilon | label_smoothing epsilon value | 0 | float |
| use_distillation | whether to use model distillation | False | bool |
## Structure (ARCHITECTURE)
### Classification model structure configuration
| Parameter | Description | Default | Options |
|:---:|:---:|:---:|:---:|
| name | name of the model structure | "ResNet50_vd" | model structures provided by PaddleClas |
| params | extra model arguments | {} | an extra dict required by the model structure; e.g. EfficientNet configs need `padding_type` and other parameters, which can be passed in this way |
### Recognition model structure configuration
| Parameter | Description | Default | Options |
| :---------------: | :-----------------------: | :--------: | :----------------------------------------------------------: |
| name | model structure | "RecModel" | ["RecModel"] |
| infer_output_key | output used at inference | "feature" | ["feature", "logits"] |
| infer_add_softmax | whether to add softmax at inference | True | [True, False] |
| Backbone | name of the Backbone | | a dict containing keys such as `name` and `pretrained`, where `name` is the name of a classification model and `pretrained` is a boolean |
| BackboneStopLayer | feature output layer in the Backbone | | a dict containing the key `name`, whose value is the `full_name` of the feature output layer in the Backbone |
| Neck | the added Neck part of the network | | a dict with the input parameters of the Neck layers |
| Head | the added Head part of the network | | a dict with the input parameters of the Head layers |
### Learning rate (LEARNING_RATE)
# Getting Started
## Note: This document mainly introduces retrieval-based recognition
---
First, please refer to the [Installation Guide](./install.md) to configure the running environment.
The image retrieval part of PaddleClas currently supports the following training/evaluation environments:
```shell
└── CPU/single-GPU
    ├── Linux
    └── Windows
```
## Contents

* [1. Data Preparation and Processing](#数据准备与处理)
* [2. Training and Evaluation on a Single GPU](#基于单卡GPU上的训练与评估)
  * [2.1 Model Training](#模型训练)
  * [2.2 Resume Training](#模型恢复训练)
  * [2.3 Model Evaluation](#模型评估)
* [3. Export the Inference Model](#导出inference模型)

<a name="数据准备与处理"></a>
## 1. Data Preparation and Processing
* Enter the PaddleClas directory.
```bash
## linux or mac; $path_to_PaddleClas is the root directory of PaddleClas, change it to your real path
cd $path_to_PaddleClas
```
* Enter the `dataset` directory. To quickly experience the image retrieval module of PaddleClas, the dataset used here is [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf), a fine-grained dataset with 200 bird classes. First, download the CUB_200_2011 dataset; for the download, please refer to the [official website](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html).
```shell
# linux or mac
cd dataset
# copy the downloaded data into this directory
cp {path_to_the_data}/CUB_200_2011.tgz .
# unpack
tar -xzvf CUB_200_2011.tgz
# enter the CUB_200_2011 directory
cd CUB_200_2011
```
When this dataset is used for image retrieval, the first 100 classes are usually used as the training set and the last 100 classes as the test set, so some post-processing is needed here to better fit the image retrieval training of PaddleClas. An equivalent Python sketch of these commands follows the code block.
```shell
# create the train and test directories
mkdir train && mkdir test
# split the data into a training set and a test set: the first 100 classes as the training set, the last 100 as the test set
ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh
# generate train_list and test_list
tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
```
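If `awk` or `tree` is not available, the split and the list files can also be produced with a short Python script. The sketch below is functionally equivalent under that assumption; note that the exact line ordering (and thus the unique ids) may differ from the `tree -r` output above, which is harmless as long as the ids stay unique.

```python
import os
import shutil

# run inside dataset/CUB_200_2011; mirrors the shell commands above
for split in ("train", "test"):
    os.makedirs(split, exist_ok=True)

for name in sorted(os.listdir("images")):        # e.g. "001.Black_footed_Albatross"
    cls = int(name.split(".")[0])                # numeric class id
    dst = "train" if cls <= 100 else "test"      # first 100 classes -> train
    shutil.move(os.path.join("images", name), os.path.join(dst, str(cls)))

# write "path label unique_id" per line
for split in ("train", "test"):
    uid = 1
    with open(f"{split}_list.txt", "w") as f:
        for cls_dir in sorted(os.listdir(split), key=int):
            folder = os.path.join(split, cls_dir)
            for img in sorted(os.listdir(folder)):
                if img.endswith(".jpg"):
                    f.write(f"{split}/{cls_dir}/{img} {cls_dir} {uid}\n")
                    uid += 1
```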
Now we have the training set (the `train` directory), the test set (the `test` directory), `train_list.txt` and `test_list.txt` of `CUB_200_2011`.
After the data is processed, the `train` directory of `CUB_200_2011` should have the following structure:
```
├── 1
│   ├── Black_Footed_Albatross_0001_796111.jpg
│   ├── Black_Footed_Albatross_0002_55.jpg
...
├── 10
│   ├── Red_Winged_Blackbird_0001_3695.jpg
│   ├── Red_Winged_Blackbird_0005_5636.jpg
...
```
`train_list.txt` should look like:
```
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```
The separator is a space " ", and the three columns are the path of the training sample, its label and its unique id.
The test set has the same format as the training set.
**Note**:
* When the gallery dataset and the query dataset are the same, in order to remove the first retrieved result (the query image itself, which does not need to be evaluated), each sample needs a unique id for the later computation of mAP, recall@1 and other metrics. Please refer to [Image retrieval datasets](#图像检索数据集介绍) for the concepts of the gallery and query datasets, and to [Evaluation metrics for image retrieval](#图像检索评价指标) for mAP, recall@1, etc.

Return to the root directory of `PaddleClas`:
```shell
# linux or mac
cd ../../
```
<a name="基于单卡GPU上的训练与评估"></a>
## 2. Training and Evaluation on a Single GPU
For training and evaluation on a single GPU, the `tools/train.py` and `tools/eval.py` scripts are recommended.
<a name="模型训练"></a>
### 2.1 Model Training
Once the configuration file is ready, the image retrieval training can be started as follows. The method PaddleClas uses to train image retrieval tasks is metric learning; please refer to [Metric Learning](#度量学习) for an explanation.
```
python3 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
    -o Arch.Backbone.pretrained=True \
    -o Global.device=gpu
```
Here, `-c` specifies the path of the configuration file and `-o` specifies the parameters to be overridden or added. `-o Arch.Backbone.pretrained=True` means that the Backbone uses a pretrained model; `Arch.Backbone.pretrained` can also be set to the path of specific model weights, in which case it should be replaced with the path of your own pretrained weight file. `-o Global.device=gpu` means training on GPU; to train on CPU, set `Global.device` to `cpu`.
For more detailed training configuration, you can also directly modify the configuration file of the model. See the [configuration document](config.md) for the specific parameters.
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/quick_start/ResNet50_vd_finetune_retrieval.yaml
```
Running the command above produces output logs like the following:
```
...
[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
...
[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
...
```
The Backbone in this configuration file is MobileNetV1. To use another Backbone, override the parameter `Arch.Backbone.name`, e.g. add `-o Arch.Backbone.name={other backbone}` to the command. In addition, since the input dimension of the `Neck` part differs between models, the input size there may need to be rewritten after replacing the Backbone, in a way similar to replacing the Backbone name.
For the training loss, [CELoss](../../../ppcls/loss/celoss.py) and [TripletLossV2](../../../ppcls/loss/triplet.py) are used here, configured as follows:
```
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - TripletLossV2:
        weight: 1.0
        margin: 0.5
```
The final total loss is the weighted sum of all the losses, where `weight` defines the weight of a specific loss in the total loss, as shown in the sketch below. To use another loss, change the `Loss` field in the configuration file; see [Loss](../../../ppcls/loss) for the currently supported losses.
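A minimal sketch of this weighted combination (plain Python, not the PaddleClas implementation); the two loss values are taken from the example training log shown earlier:

```python
def combined_loss(losses, weights):
    # total loss = weighted sum of the individual losses,
    # matching the `weight` fields in the config above
    return sum(w * l for l, w in zip(losses, weights))

# CELoss and TripletLossV2 values from the example log, both with weight 1.0
print(combined_loss([6.59110, 0.54044], [1.0, 1.0]))  # 7.13154
```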
<a name="模型恢复训练"></a>
### 2.2 Resume Training
If the training task is terminated for some reason, the checkpoint weight file can be loaded to continue training:
```
python3 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
    -o Global.checkpoints="./output/RecModel/epoch_5" \
    -o Global.device=gpu
```
The configuration file does not need any modification; just set the `Global.checkpoints` parameter when resuming training, which is the path of the checkpoint weights to load. Using this parameter loads both the saved weights and the information of the learning rate, the optimizer, etc.
**Note**:
* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoint file. The training above produces checkpoint files as shown below; to resume training from epoch `5`, just set `Global.checkpoints` to `"./output/RecModel/epoch_5"`, and PaddleClas will complete the suffix automatically.
```shell
output/
└── RecModel
    ├── best_model.pdopt
    ├── best_model.pdparams
    ├── best_model.pdstates
    ├── epoch_1.pdopt
    ├── epoch_1.pdparams
    ├── epoch_1.pdstates
    .
    .
    .
```
<a name="模型评估"></a>
### 2.3 Model Evaluation
The model can be evaluated with the following command.
```bash
python3 tools/eval.py \
    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
    -o Global.pretrained_model=./output/RecModel/best_model
```
<a name="model_inference"></a> 上述命令将使用`./configs/quick_start/MobileNetV1_retrieval.yaml`作为配置文件,对上述训练得到的模型`./output/RecModel/best_model`进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过`-o`参数更新配置,如上所示。
## 3. 使用inference模型进行模型推理
### 3.1 导出推理模型
通过导出inference模型,PaddlePaddle支持使用预测引擎进行预测推理。接下来介绍如何用预测引擎进行推理: 可配置的部分评估参数说明如下:
首先,对训练好的模型进行转换: * `Arch.name`:模型名称
* `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于`Global.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而`Global.Backbone.pretrained`只是Backbone部分的权重。当需要做模型评估时,需要加载整个模型的权重。
* `Metric.Eval`:待评估的指标,默认评估recall@1、recall@5、mAP。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考[Metric](../../../ppcls/metric/metrics.py)部分在配置文件`Metric.Eval`中添加相关的指标。
**Note:**
* When loading the model to evaluate, the path of the model file needs to be specified without the file suffix; PaddleClas automatically completes the `.pdparams` suffix, as in [2.2 Resume Training](#模型恢复训练).
* Metric learning tasks usually do not evaluate TopkAcc.
<a name="导出inference模型"></a>
## 3. Export the Inference Model
By exporting the inference model, PaddlePaddle supports inference with a prediction engine. Convert the trained model as follows:
```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
    -o Global.pretrained_model=output/RecModel/best_model \
    -o Global.save_inference_dir=./inference
```
Here, `Global.pretrained_model` specifies the model file path, again without the file suffix (as in [2.2 Resume Training](#模型恢复训练)). After execution, an `./inference` directory is generated under the current directory, containing the `inference.pdiparams`, `inference.pdiparams.info` and `inference.pdmodel` files. `Global.save_inference_dir` specifies the path of the exported inference model. The saved inference model is truncated at the embedding feature layer, i.e. the final output of the model is the n-dimensional embedding feature.
The command above generates the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), which can then be used for inference with a prediction engine. For the inference workflow with the inference model, please refer to [Prediction with the Python prediction engine](@shengyu).
## Basics
Image retrieval: given a query image containing a specific instance (a specific object, scene, item, etc.), image retrieval aims at finding the images that contain the same instance in a database. Unlike image classification, image retrieval is an open-set problem: the training set may not contain the category of the image to be recognized. The overall pipeline of image retrieval is: first, represent each image as a suitable feature vector; second, perform a nearest-neighbor search on these feature vectors with the Euclidean or cosine distance to find similar images in the database; finally, optionally apply some post-processing to fine-tune the retrieval results and determine the category of the image. Therefore, the key to the performance of an image retrieval algorithm is the quality of the feature vectors of the images.
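The nearest-neighbor step is easy to illustrate. Below is a minimal numpy sketch of cosine-similarity retrieval over random placeholder features; a real system would index the gallery instead of brute-force searching it.

```python
import numpy as np

rng = np.random.default_rng(0)
gallery = rng.standard_normal((1000, 512)).astype("float32")  # gallery features
query = rng.standard_normal((1, 512)).astype("float32")       # query feature

# cosine similarity = dot product of L2-normalized vectors
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query /= np.linalg.norm(query, axis=1, keepdims=True)
scores = (query @ gallery.T).ravel()

topk = np.argsort(-scores)[:5]  # indices of the 5 most similar gallery images
print(topk, scores[topk])
```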
<a name="度量学习"></a>
- Metric Learning
Metric learning studies how to learn a distance function for a specific task, such that the distance function helps nearest-neighbor based algorithms (kNN, k-means, etc.) achieve good performance. Deep metric learning is one approach to metric learning: its goal is to learn a mapping from the original features to a low-dimensional dense vector space (the embedding space), such that, with common distance functions (Euclidean distance, cosine distance, etc.), objects of the same class are close to each other in the embedding space while objects of different classes are far apart. Deep metric learning has many successful applications in computer vision, such as face recognition, product recognition, image retrieval and person re-identification.
<a name="图像检索数据集介绍"></a>
- Image retrieval datasets
  - train dataset: used to train the model so that it can learn the image features of the collection.
  - gallery dataset: provides the gallery data of the image retrieval task. The gallery can be the same as the train or query dataset or different from them; when it is the same as the train set, the category system of the query set should be the same as that of the train set.
  - query dataset: used to test the quality of the model. Usually, a feature is extracted from each image of the query set, matched against the gallery features by distance to obtain the recognition result, and the metrics of the whole query set are then computed from these results.
<a name="图像检索评价指标"></a>
- Evaluation metrics for image retrieval
  <a name="召回率"></a>
  - recall: the number of correctly predicted positives / the number of positive labels. A sketch of both metrics follows this list.
    - recall@1: among the top-1 retrieval results, the number of correctly predicted positives / the number of positive labels
    - recall@5: among the top-5 retrieval results, the number of correctly predicted positives / the number of positive labels
  <a name="平均检索精度"></a>
  - mean average precision (mAP)
    - AP: the average of the precisions at different recall rates
    - mAP: the mean of the APs of all images in the query set
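As noted above, here is a minimal Python sketch of recall@k and AP for a single query, assuming the gallery labels are already sorted by descending similarity:

```python
import numpy as np

def recall_at_k(ranked_labels, query_label, k):
    # 1.0 if any of the top-k retrieved labels matches the query label
    return float(query_label in ranked_labels[:k])

def average_precision(ranked_labels, query_label):
    hits, precisions = 0, []
    for i, lab in enumerate(ranked_labels, start=1):
        if lab == query_label:
            hits += 1
            precisions.append(hits / i)  # precision at each correct hit
    return float(np.mean(precisions)) if precisions else 0.0

ranked = [3, 1, 3, 3, 2]          # gallery labels sorted by similarity
print(recall_at_k(ranked, 3, 1))  # 1.0
print(average_precision(ranked, 3))  # mean of 1/1, 2/3, 3/4 ~= 0.806
```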
### 2.1 Clone the PaddleClas repository
```bash
git clone https://github.com/PaddlePaddle/PaddleClas.git -b release/2.2
```
If the network speed from GitHub is too slow, you can download from Gitee with the following command:
```bash
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.2
```
### 2.2 Install the Python dependencies
* [1. Environment Configuration](#环境配置)
* [2. Image Recognition Experience](#图像识别体验)
  * [2.1 Download and Unpack the Inference Model and Demo Data](#下载、解压inference_模型与demo数据)
  * [2.2 Product Recognition and Retrieval](#商品识别与检索)
    * [2.2.1 Single Image Recognition](#识别单张图像)
    * [2.2.2 Folder-based Batch Recognition](#基于文件夹的批量识别)
* [3. Recognition of Unknown Categories](#未知类别的图像识别体验)
  * [3.1 Prepare New Data and Labels](#准备新的数据与标签)
  * [3.2 Build a New Index Library](#建立新的索引库)
  * [3.3 Recognition Based on the New Index Library](#基于新的索引库的图像识别)

<a name="环境配置"></a>
## 1. Environment Configuration
* Installation: please first refer to [Quick Installation](./install.md) to configure the PaddleClas environment.
* Enter the `deploy` directory. All the content and commands in this section need to be run in the `deploy` directory, which can be entered with the following command.
<a name="图像识别体验"></a>
## 2. Image Recognition Experience
<a name="下载、解压inference_模型与demo数据"></a> <a name="下载、解压inference_模型与demo数据"></a>
### 2.1 下载、解压inference 模型与demo数据 ### 2.1 下载、解压inference 模型与demo数据
Logo识别为例,下载通用检测、识别模型以及Logo识别demo数据,命令如下。 以商品识别为例,下载通用检测、识别模型以及商品识别demo数据,命令如下。
```shell ```shell
mkdir models mkdir models
cd models
# download the generic detection inference model and unpack it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# download the recognition inference model and unpack it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
mkdir dataset
cd dataset
# download the demo data and unpack it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
```
After unpacking, the `dataset` folder should have the following file structure:
```
├── product_demo_data_v1.0
│   ├── data_file.txt
│   ├── gallery
│   ├── index
│   └── query
```
The `data_file.txt` is the image list used to build the index, and the `gallery` folder contains the original gallery images.
The `models` folder should have the following file structure:
```
├── product_ResNet50_vd_aliproduct_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
```
<a name="商品识别与检索"></a>
### 2.2 Product Recognition and Retrieval
Take the product recognition demo as an example to show the recognition and retrieval process. (If you want to try other recognition and retrieval scenarios, download and unpack the corresponding demo data and model, then replace the corresponding configuration file to complete the prediction.)
<a name="识别单张图像"></a>
#### 2.2.1 Single Image Recognition
Run the following command to recognize and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg`:
```shell
# use the following command to predict with GPU
python3.7 python/predict_system.py -c configs/inference_product.yaml
# use the following command to predict with CPU
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.use_gpu=False
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/product_demo/wangzai.jpg" width = "400" />
</div>
The final output is as follows.
```
[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
```
Here, `bbox` is the location of the detected subject, `rec_docs` contains the labels of the gallery images most similar to the detected subject, and `rec_scores` contains the corresponding similarities. The `rec_docs` field shows that four of the five returned results are `旺仔牛奶`, which is correct.
The visualized detection result is also saved in the `output` folder.
<div align="center">
<img src="../../images/recognition/product_demo/wangzai_det_result.jpg" width = "400" />
</div>
<a name="基于文件夹的批量识别"></a> <a name="基于文件夹的批量识别"></a>
...@@ -146,7 +157,8 @@ python3.7 python/predict_system.py -c configs/inference_logo.yaml ...@@ -146,7 +157,8 @@ python3.7 python/predict_system.py -c configs/inference_logo.yaml
如果希望预测文件夹内的图像,可以直接修改配置文件中的`Global.infer_imgs`字段,也可以通过下面的`-o`参数修改对应的配置。 如果希望预测文件夹内的图像,可以直接修改配置文件中的`Global.infer_imgs`字段,也可以通过下面的`-o`参数修改对应的配置。
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query" # 使用下面的命令使用GPU进行预测,如果希望使用CPU预测,可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
``` ```
更多地,可以通过修改`Global.rec_inference_model_dir`字段来更改识别inference模型的路径,通过修改`IndexProcess.index_path`字段来更改索引库索引的路径。 更多地,可以通过修改`Global.rec_inference_model_dir`字段来更改识别inference模型的路径,通过修改`IndexProcess.index_path`字段来更改索引库索引的路径。
<a name="未知类别的图像识别体验"></a> <a name="未知类别的图像识别体验"></a>
## 3. 未知类别的图像识别体验 ## 3. 未知类别的图像识别体验
对图像`./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`进行识别,命令如下 对图像`./dataset/product_demo_data_v1.0/query/anmuxi.jpg`进行识别,命令如下
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" # 使用下面的命令使用GPU进行预测,如果希望使用CPU预测,可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
``` ```
待检索图像如下所示。 待检索图像如下所示。
<div align="center"> <div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" /> <img src="../../images/recognition/product_demo/anmuxi.jpg" width = "400" />
</div> </div>
输出结果如下 输出结果如下
``` ```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}] [{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
``` ```
由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。 由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。
当索引库中的图像无法覆盖我们实际识别的场景时,即在预测未知类别的图像时,我们需要将对应类别的相似图像添加到索引库中,从而完成对未知类别的图像识别,这一过程是不需要重新训练的。 当索引库中的图像无法覆盖我们实际识别的场景时,即在预测未知类别的图像时,我们需要将对应类别的相似图像添加到索引库中,从而完成对未知类别的图像识别,这一过程是不需要重新训练的。
<a name="基于自己的数据集构建索引库"></a> <a name="准备新的数据与标签"></a>
### 3.1 基于自己的数据集构建索引库 ### 3.1 准备新的数据与标签
首先需要将与待检索图像相似的图像列表拷贝到索引库原始图像的文件夹(`./dataset/product_demo_data_v1.0.0/gallery`)中,运行下面的命令拷贝相似图像。
```shell
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
```
Then edit the text file that records the image paths and label information (`./dataset/product_demo_data_v1.0/data_file.txt`). Here, create a new file based on the original label file:
```shell
# copy the file
cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
```
Then add the following lines to the file `dataset/product_demo_data_v1.0/data_file_update.txt`:
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
In each line, the first field is the relative path of the image and the second field is its label, separated by a `space`.
<a name="建立新的索引库"></a>
### 3.2 Build a New Index Library
Use the following command to build the index, which accelerates the retrieval process after recognition.
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
```
The new index information is finally stored in the folder `./dataset/product_demo_data_v1.0/index_update`.
<a name="基于新的索引库的图像识别"></a> <a name="基于新的索引库的图像识别"></a>
### 3.2 基于新的索引库的图像识别 ### 3.3 基于新的索引库的图像识别
使用新的索引库,对上述图像进行识别,运行命令如下。 使用新的索引库,对上述图像进行识别,运行命令如下。
```shell ```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update" # 使用下面的命令使用GPU进行预测,如果希望使用CPU预测,可以在命令后面添加-o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
``` ```
The output is as follows.
```
[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
```
Four of the five returned results are `安慕希酸奶`; the recognition result is correct.
# model architecture
Arch:
  name: "RecModel"
  infer_output_key: "features"
  infer_add_softmax: False
  Backbone:
    name: "ResNet50_last_stage_stride1"
    pretrained: True
# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  class_num: 101
  save_interval: 5
  eval_during_train: True
  eval_interval: 1
  epochs: 50
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference
  eval_mode: retrieval

# model architecture
Arch:
  name: RecModel
  infer_output_key: features
  infer_add_softmax: False
  Backbone:
    name: MobileNetV1
    pretrained: False
  BackboneStopLayer:
    name: flatten_0
  Neck:
    name: FC
    embedding_size: 1024
    class_num: 512
  Head:
    name: ArcMargin
    embedding_size: 512
    class_num: 101
    margin: 0.15
    scale: 30

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - TripletLossV2:
        weight: 1.0
        margin: 0.5
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: MultiStepDecay
    learning_rate: 0.01
    milestones: [20, 30, 40]
    gamma: 0.5
    verbose: False
    last_epoch: -1
  regularizer:
    name: 'L2'
    coeff: 0.0005

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: VeriWild
      image_root: ./dataset/CUB_200_2011/
      cls_label_path: ./dataset/CUB_200_2011/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 0.5
            sl: 0.02
            sh: 0.4
            r1: 0.3
            mean: [0., 0., 0.]
    sampler:
      name: DistributedRandomIdentitySampler
      batch_size: 64
      num_instances: 2
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    Query:
      dataset:
        name: VeriWild
        image_root: ./dataset/CUB_200_2011/
        cls_label_path: ./dataset/CUB_200_2011/test_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 0.00392157
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

    Gallery:
      dataset:
        name: VeriWild
        image_root: ./dataset/CUB_200_2011/
        cls_label_path: ./dataset/CUB_200_2011/test_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: 224
          - NormalizeImage:
              scale: 1.0/255.0
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: ''
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

Metric:
  Eval:
    - Recallk:
        topk: [1, 5]
    - mAP: {}