提交 436f64ad 编写于 作者: Y Yang Nie

Merge branch 'develop' into ConvNeXt

要显示的变更太多。

To preserve performance only 1000 of 1000+ files are displayed.
......@@ -12,3 +12,4 @@ build/
log/
nohup.out
.DS_Store
.idea
include LICENSE.txt
include README.md
include docs/en/whl_en.md
recursive-include deploy/python predict_cls.py preprocess.py postprocess.py det_preprocess.py
recursive-include deploy/python *.py
recursive-include deploy/configs *.yaml
recursive-include deploy/utils get_image_list.py config.py logger.py predictor.py
recursive-include ppcls/ *.py *.txt
\ No newline at end of file
README_ch.md
\ No newline at end of file
README_en.md
\ No newline at end of file
......@@ -4,100 +4,130 @@
## 简介
飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
**近期更新**
- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)
- 2022.1.27 全面升级文档;新增[PaddleServing C++ pipeline部署方式](./deploy/paddleserving)[18M图像识别安卓部署Demo](./deploy/lite_shitu)
- 2021.11.1 发布[PP-ShiTu技术报告](https://arxiv.org/pdf/2111.00775.pdf),新增饮料识别demo
- 2021.10.23 发布轻量级图像识别系统PP-ShiTu,CPU上0.2s即可完成在10w+库的图像识别。
[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md)立即体验
- 2021.09.17 发布PP-LCNet系列超轻量骨干网络模型, 在Intel CPU上,单张图像预测速度约5ms,ImageNet-1K数据集上Top1识别准确率达到80.82%,超越ResNet152的模型效果。PP-LCNet的介绍可以参考[论文](https://arxiv.org/pdf/2109.15099.pdf), 或者[PP-LCNet模型介绍](docs/zh_CN/models/PP-LCNet.md),相关指标和预训练权重可以从 [这里](docs/zh_CN/algorithm_introduction/ImageNet_models.md)下载。
- [more](./docs/zh_CN/others/update_history.md)
## 特性
- PP-ShiTu轻量图像识别系统:集成了目标检测、特征学习、图像检索等模块,广泛适用于各类图像识别任务。cpu上0.2s即可完成在10w+库的图像识别。
<div align="center">
<img src="./docs/images/class_simple.gif" width = "600" />
<p>PULC实用图像分类模型效果展示</p>
</div>
&nbsp;
- PP-LCNet轻量级CPU骨干网络:专门为CPU设备打造轻量级骨干网络,速度、精度均远超竞品。
- 丰富的预训练模型库:提供了36个系列共175个ImageNet预训练模型,其中7个精选系列模型支持结构快速修改。
<div align="center">
<img src="./docs/images/recognition.gif" width = "400" />
<p>PP-ShiTu图像识别系统效果展示</p>
</div>
- 全面易用的特征学习组件:集成arcmargin, triplet loss等12度量学习方法,通过配置文件即可随意组合切换。
- SSLD知识蒸馏:14个分类预训练模型,精度普遍提升3%以上;其中ResNet50_vd模型在ImageNet-1k数据集上的Top-1精度达到了84.0%,
Res2Net200_vd预训练模型Top-1精度高达85.1%。
## 近期更新
- 📢将于**6月15-6月17日晚20:30** 进行为期三天的课程直播,详细介绍超轻量图像分类方案,对各场景模型优化原理及使用方式进行拆解,之后还有产业案例全流程实操,对各类痛难点解决方案进行手把手教学,加上现场互动答疑,抓紧扫码上车吧!
<div align="center">
<img src="./docs/images/recognition.gif" width = "400" />
<img src="https://user-images.githubusercontent.com/45199522/173483779-2332f990-4941-4f8d-baee-69b62035fc31.png" width = "200" height = "200"/>
</div>
- 🔥️ 2022.6.15 发布[PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md),CPU推理3ms,精度比肩SwinTransformer,覆盖人、车、OCR场景九大常见任务。
- 2022.5.26 [飞桨产业实践范例直播课](http://aglc.cn/v-c4FAR),解读**超轻量重点区域人员出入管理方案**
- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Studio 上体验。
- 2022.5.20 上线[PP-HGNet](./docs/zh_CN/models/PP-HGNet.md), [PP-LCNetv2](./docs/zh_CN/models/PP-LCNetV2.md)
- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)
- [more](./docs/zh_CN/others/update_history.md)
## 特性
PaddleClas发布了[PP-HGNet](docs/zh_CN/models/PP-HGNet.md)[PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)[PP-LCNet](docs/zh_CN/models/PP-LCNet.md)[SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md)等算法,
并支持多种图像分类、识别相关算法,在此基础上打造[PULC超轻量图像分类方案](docs/zh_CN/PULC/PULC_quickstart.md)[PP-ShiTu图像识别系统](./docs/zh_CN/quick_start/quick_start_recognition.md)
![](https://user-images.githubusercontent.com/19523330/173273046-239a42da-c88d-4c2c-94b1-2134557afa49.png)
## 欢迎加入技术交流群
* 您可以扫描下面的QQ/微信二维码(添加小助手微信并回复“C”),加入PaddleClas微信交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
* 您可以扫描下面的微信/QQ二维码(添加小助手微信并回复“C”),加入PaddleClas微信交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
<div align="center">
<img src="https://user-images.githubusercontent.com/80816848/164383225-e375eb86-716e-41b4-a9e0-4b8a3976c1aa.jpg" width="200"/>
<img src="https://user-images.githubusercontent.com/48054808/160531099-9811bbe6-cfbb-47d5-8bdb-c2b40684d7dd.png" width="200"/>
<img src="https://user-images.githubusercontent.com/80816848/164383225-e375eb86-716e-41b4-a9e0-4b8a3976c1aa.jpg" width="200"/>
</div>
## 快速体验
PULC超轻量图像分类方案快速体验:[点击这里](docs/zh_CN/PULC/PULC_quickstart.md)
PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md)
## 文档教程
- 安装说明
- [安装Paddle](./docs/zh_CN/installation/install_paddle.md)
- [安装PaddleClas](./docs/zh_CN/installation/install_paddleclas.md)
- 快速体验
- [PP-ShiTu图像识别快速体验](./docs/zh_CN/quick_start/quick_start_recognition.md)
- 图像分类快速体验
- [尝鲜版](./docs/zh_CN/quick_start/quick_start_classification_new_user.md)
- [进阶版](./docs/zh_CN/quick_start/quick_start_classification_professional.md)
- [多标签分类](./docs/zh_CN/quick_start/quick_start_multilabel_classification.md)
- [环境准备](docs/zh_CN/installation/install_paddleclas.md)
- [PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md)
- [超轻量图像分类快速体验](docs/zh_CN/PULC/PULC_quickstart.md)
- [超轻量图像分类模型库](docs/zh_CN/PULC/PULC_model_list.md)
- [PULC有人/无人分类模型](docs/zh_CN/PULC/PULC_person_exists.md)
- [PULC人体属性识别模型](docs/zh_CN/PULC/PULC_person_attribute.md)
- [PULC佩戴安全帽分类模型](docs/zh_CN/PULC/PULC_safety_helmet.md)
- [PULC交通标志分类模型](docs/zh_CN/PULC/PULC_traffic_sign.md)
- [PULC车辆属性识别模型](docs/zh_CN/PULC/PULC_vehicle_attribute.md)
- [PULC有车/无车分类模型](docs/zh_CN/PULC/PULC_car_exists.md)
- [PULC含文字图像方向分类模型](docs/zh_CN/PULC/PULC_text_image_orientation.md)
- [PULC文本行方向分类模型](docs/zh_CN/PULC/PULC_textline_orientation.md)
- [PULC语种分类模型](docs/zh_CN/PULC/PULC_language_classification.md)
- [模型训练](docs/zh_CN/PULC/PULC_train.md)
- 推理部署
- [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#1)
- [基于C++预测引擎推理](docs/zh_CN/inference_deployment/cpp_deploy.md)
- [服务化部署](docs/zh_CN/inference_deployment/paddle_serving_deploy.md)
- [端侧部署](docs/zh_CN/inference_deployment/paddle_lite_deploy.md)
- [Paddle2ONNX模型转化与预测](deploy/paddle2onnx/readme.md)
- [模型压缩](deploy/slim/README.md)
- [PP-ShiTu图像识别系统介绍](#图像识别系统介绍)
- [图像识别快速体验](docs/zh_CN/quick_start/quick_start_recognition.md)
- 模块介绍
- [主体检测](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md)
- [特征提取](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md)
- [特征提取模型](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md)
- [向量检索](./docs/zh_CN/image_recognition_pipeline/vector_search.md)
- [骨干网络和预训练模型库](./docs/zh_CN/algorithm_introduction/ImageNet_models.md)
- 数据准备
- [图像分类数据集介绍](./docs/zh_CN/data_preparation/classification_dataset.md)
- [图像识别数据集介绍](./docs/zh_CN/data_preparation/recognition_dataset.md)
- 模型训练
- [图像分类任务](./docs/zh_CN/models_training/classification.md)
- [图像识别任务](./docs/zh_CN/models_training/recognition.md)
- [训练参数调整策略](./docs/zh_CN/models_training/train_strategy.md)
- [配置文件说明](./docs/zh_CN/models_training/config_description.md)
- 模型预测部署
- [模型导出](./docs/zh_CN/inference_deployment/export_model.md)
- Python/C++ 预测引擎
- [基于Python预测引擎预测推理](./docs/zh_CN/inference_deployment/python_deploy.md)
- [基于C++分类预测引擎预测推理](./docs/zh_CN/inference_deployment/cpp_deploy.md)[基于C++的PP-ShiTu预测引擎预测推理](deploy/cpp_shitu/readme.md)
- 服务化部署
- [Paddle Serving服务化部署(推荐)](./docs/zh_CN/inference_deployment/paddle_serving_deploy.md)
- [Hub serving服务化部署](./docs/zh_CN/inference_deployment/paddle_hub_serving_deploy.md)
- [端侧部署](./deploy/lite/readme.md)
- [whl包预测](./docs/zh_CN/inference_deployment/whl_deploy.md)
- 算法介绍
- [图像分类任务介绍](./docs/zh_CN/algorithm_introduction/image_classification.md)
- [度量学习介绍](./docs/zh_CN/algorithm_introduction/metric_learning.md)
- 高阶使用
- [数据增广](./docs/zh_CN/advanced_tutorials/DataAugmentation.md)
- [模型量化](./docs/zh_CN/advanced_tutorials/model_prune_quantization.md)
- [知识蒸馏](./docs/zh_CN/advanced_tutorials/knowledge_distillation.md)
- [PaddleClas结构解析](./docs/zh_CN/advanced_tutorials/code_overview.md)
- [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
- [哈希编码](docs/zh_CN/image_recognition_pipeline/)
- [模型训练](docs/zh_CN/models_training/recognition.md)
- 推理部署
- [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#2)
- [基于C++预测引擎推理](deploy/cpp_shitu/readme.md)
- [服务化部署](docs/zh_CN/inference_deployment/paddle_serving_deploy.md)
- [端侧部署](deploy/lite_shitu/README.md)
- PP系列骨干网络模型
- [PP-HGNet](docs/zh_CN/models/PP-HGNet.md)
- [PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)
- [PP-LCNet](docs/zh_CN/models/PP-LCNet.md)
- [SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md)
- 前沿算法
- [骨干网络和预训练模型库](docs/zh_CN/algorithm_introduction/ImageNet_models.md)
- [度量学习](docs/zh_CN/algorithm_introduction/metric_learning.md)
- [模型压缩](docs/zh_CN/algorithm_introduction/model_prune_quantization.md)
- [模型蒸馏](docs/zh_CN/algorithm_introduction/knowledge_distillation.md)
- [数据增强](docs/zh_CN/advanced_tutorials/DataAugmentation.md)
- [产业实用范例库](docs/zh_CN/samples)
- [30分钟快速体验图像分类](docs/zh_CN/quick_start/quick_start_classification_new_user.md)
- FAQ
- [图像识别精选问题](docs/zh_CN/faq_series/faq_2021_s2.md)
- [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md)
- [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md)
- [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md)
- [图像识别精选问题](docs/zh_CN/faq_series/faq_2021_s2.md)
- [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md)
- [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md)
- [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md)
- [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
- [许可证书](#许可证书)
- [贡献代码](#贡献代码)
<a name="PULC超轻量图像分类方案"></a>
## PULC超轻量图像分类方案
<div align="center">
<img src="https://user-images.githubusercontent.com/19523330/173011854-b10fcd7a-b799-4dfd-a1cf-9504952a3c44.png" width = "800" />
</div>
PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。
PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。
<a name="图像识别系统介绍"></a>
## PP-ShiTu图像识别系统介绍
## PP-ShiTu图像识别系统
<div align="center">
<img src="./docs/images/structure.jpg" width = "800" />
......@@ -105,6 +135,11 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick
PP-ShiTu是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化8个方面,采用多种策略,对各个模块的模型进行优化,最终得到在CPU上仅0.2s即可完成10w+库的图像识别的系统。更多细节请参考[PP-ShiTu技术方案](https://arxiv.org/pdf/2111.00775.pdf)
<a name="分类效果展示"></a>
## PULC实用图像分类模型效果展示
<div align="center">
<img src="docs/images/classification.gif">
</div>
<a name="识别效果展示"></a>
## PP-ShiTu图像识别系统效果展示
......
......@@ -4,39 +4,41 @@
## Introduction
PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
**Recent updates**
- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
- 2021.09.17 Add PP-LCNet series model developed by PaddleClas, these models show strong competitiveness on Intel CPUs.
For the introduction of PP-LCNet, please refer to [paper](https://arxiv.org/pdf/2109.15099.pdf) or [PP-LCNet model introduction](docs/en/models/PP-LCNet_en.md). The metrics and pretrained model are available [here](docs/en/ImageNet_models_en.md).
- 2021.06.29 Add Swin-transformer series model,Highest top1 acc on ImageNet1k dataset reaches 87.2%, training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
- [more](./docs/en/update_history_en.md)
<div align="center">
<img src="./docs/images/class_simple_en.gif" width = "600" />
## Features
PULC demo images
</div>
&nbsp;
- A practical image recognition system consist of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
- Rich library of pre-trained models: Provide a total of 164 ImageNet pre-trained models in 35 series, among which 6 selected series of models support fast structural modification.
<div align="center">
<img src="./docs/images/recognition.gif" width = "400" />
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
PP-ShiTu demo images
</div>
- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the Image-Net-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
**Recent updates**
- 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models inference within 3ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenario.
- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment.
- 2021.09.17 Add PP-LCNet series model developed by PaddleClas, these models show strong competitiveness on Intel CPUs.
For the introduction of PP-LCNet, please refer to [paper](https://arxiv.org/pdf/2109.15099.pdf) or [PP-LCNet model introduction](docs/en/models/PP-LCNet_en.md). The metrics and pretrained model are available [here](docs/en/algorithm_introduction/ImageNet_models_en.md).
- 2021.06.29 Add [Swin-transformer](docs/en/models/SwinTransformer_en.md)) series model,Highest top1 acc on ImageNet1k dataset reaches 87.2%, training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/algorithm_introduction/ImageNet_models_en.md#16).
- 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
- [more](./docs/en/others/update_history_en.md)
## Features
PaddleClas release PP-HGNet、PP-LCNetv2、 PP-LCNet and **S**imple **S**emi-supervised **L**abel **D**istillation algorithms, and support plenty of
image classification and image recognition algorithms.
Based on th algorithms above, PaddleClas release PP-ShiTu image recognition system and [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](docs/en/PULC/PULC_quickstart_en.md).
<div align="center">
<img src="./docs/images/recognition_en.gif" width = "400" />
</div>
![](https://user-images.githubusercontent.com/19523330/173539361-68cf7ab1-7e3b-4e5e-b00f-1500719bd2a2.png)
## Welcome to Join the Technical Exchange Group
......@@ -48,41 +50,57 @@ Four sample solutions are provided, including product recognition, vehicle recog
</div>
## Quick Start
Quick experience of image recognition:[Link](./docs/en/tutorials/quick_start_recognition_en.md)
Quick experience of PP-ShiTu image recognition system:[Link](./docs/en/quick_start/quick_start_recognition_en.md)
Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassification models:[Link](docs/en/PULC/PULC_quickstart_en.md)
## Tutorials
- [Quick Installation](./docs/en/tutorials/install_en.md)
- [Quick Start of Recognition](./docs/en/tutorials/quick_start_recognition_en.md)
- [Install Paddle](./docs/en/installation/install_paddle_en.md)
- [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md)
- [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md)
- [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md)
- [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md)
- [PULC Classification Model of Someone or Nobody](docs/en/PULC/PULC_person_exists_en.md)
- [PULC Recognition Model of Person Attribute](docs/en/PULC/PULC_person_attribute_en.md)
- [PULC Classification Model of Wearing or Unwearing Safety Helmet](docs/en/PULC/PULC_safety_helmet_en.md)
- [PULC Classification Model of Traffic Sign](docs/en/PULC/PULC_traffic_sign_en.md)
- [PULC Recognition Model of Vehicle Attribute](docs/en/PULC/PULC_vehicle_attribute_en.md)
- [PULC Classification Model of Containing or Uncontaining Car](docs/en/PULC/PULC_car_exists_en.md)
- [PULC Classification Model of Text Image Orientation](docs/en/PULC/PULC_text_image_orientation_en.md)
- [PULC Classification Model of Textline Orientation](docs/en/PULC/PULC_textline_orientation_en.md)
- [PULC Classification Model of Language](docs/en/PULC/PULC_language_classification_en.md)
- [Quick Start of Recognition](./docs/en/quick_start/quick_start_recognition_en.md)
- [Introduction to Image Recognition Systems](#Introduction_to_Image_Recognition_Systems)
- [Demo images](#Demo_images)
- [Image Recognition Demo images](#Rec_Demo_images)
- [PULC demo images](#Clas_Demo_images)
- Algorithms Introduction
- [Backbone Network and Pre-trained Model Library](./docs/en/ImageNet_models_en.md)
- [Mainbody Detection](./docs/en/application/mainbody_detection_en.md)
- [Image Classification](./docs/en/tutorials/image_classification_en.md)
- [Feature Learning](./docs/en/application/feature_learning_en.md)
- [Product Recognition](./docs/en/application/product_recognition_en.md)
- [Vehicle Recognition](./docs/en/application/vehicle_recognition_en.md)
- [Logo Recognition](./docs/en/application/logo_recognition_en.md)
- [Animation Character Recognition](./docs/en/application/cartoon_character_recognition_en.md)
- [Backbone Network and Pre-trained Model Library](./docs/en/algorithm_introduction/ImageNet_models_en.md)
- [Mainbody Detection](./docs/en/image_recognition_pipeline/mainbody_detection_en.md)
- [Feature Learning](./docs/en/image_recognition_pipeline/feature_extraction_en.md)
- [Vector Search](./deploy/vector_search/README.md)
- Models Training/Evaluation
- [Image Classification](./docs/en/tutorials/getting_started_en.md)
- [Feature Learning](./docs/en/tutorials/getting_started_retrieval_en.md)
- Inference Model Prediction
- [Python Inference](./docs/en/inference.md)
- [Python Inference](./docs/en/inference_deployment/python_deploy_en.md)
- [C++ Classfication Inference](./deploy/cpp/readme_en.md)[C++ PP-ShiTu Inference](deploy/cpp_shitu/readme_en.md)
- Model Deploy (only support classification for now, recognition coming soon)
- [Hub Serving Deployment](./deploy/hubserving/readme_en.md)
- [Mobile Deployment](./deploy/lite/readme_en.md)
- [Inference Using whl](./docs/en/whl_en.md)
- [Inference Using whl](./docs/en/inference_deployment/whl_deploy_en.md)
- Advanced Tutorial
- [Knowledge Distillation](./docs/en/advanced_tutorials/distillation/distillation_en.md)
- [Model Quantization](./docs/en/extension/paddle_quantization_en.md)
- [Data Augmentation](./docs/en/advanced_tutorials/image_augmentation/ImageAugment_en.md)
- [Model Quantization](./docs/en/algorithm_introduction/model_prune_quantization_en.md)
- [Data Augmentation](./docs/en/advanced_tutorials/DataAugmentation_en.md)
- [License](#License)
- [Contribution](#Contribution)
<a name="Introduction_to_PULC"></a>
## Introduction to Practical Ultra Light-weight image Classification solutions
<div align="center">
<img src="https://user-images.githubusercontent.com/19523330/173011854-b10fcd7a-b799-4dfd-a1cf-9504952a3c44.png" width = "800" />
</div>
PULC solutions consists of PP-LCNet light-weight backbone, SSLD pretrained models, Ensemble of Data Augmentation strategy and SKL-UGI knowledge distillation.
PULC models inference within 3ms on CPU devices, with accuracy comparable with SwinTransformer. We also release 9 practical models covering pedestrian, vehicle and OCR.
<a name="Introduction_to_Image_Recognition_Systems"></a>
## Introduction to Image Recognition Systems
......@@ -97,8 +115,14 @@ Image recognition can be divided into three steps:
For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
<a name="Demo_images"></a>
## Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
<a name="Clas_Demo_images"></a>
## PULC demo images
<div align="center">
<img src="docs/images/classification_en.gif">
</div>
<a name="Rec_Demo_images"></a>
## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
- Product recognition
<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769644-51604f80-d2d7-11eb-8290-c53b12a5c1f6.gif" width = "400" />
......
Global:
infer_imgs: "./images/PULC/car_exists/objects365_00001507.jpeg"
inference_model_dir: "./models/car_exists_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: ThreshOutput
ThreshOutput:
threshold: 0.5
label_0: no_car
label_1: contains_car
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/language_classification/word_35404.png"
inference_model_dir: "./models/language_classification_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [160, 80]
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
Topk:
topk: 2
class_id_map_file: "../ppcls/utils/PULC_label_list/language_classification_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/person_attribute/090004.jpg"
inference_model_dir: "./models/person_attribute_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
benchmark: False
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [192, 256]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: PersonAttribute
PersonAttribute:
threshold: 0.5 #default threshold
glasses_threshold: 0.3 #threshold only for glasses
hold_threshold: 0.6 #threshold only for hold
Global:
infer_imgs: "./images/PULC/person_exists/objects365_02035329.jpg"
inference_model_dir: "./models/person_exists_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: ThreshOutput
ThreshOutput:
threshold: 0.5
label_0: nobody
label_1: someone
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/safety_helmet/safety_helmet_test_1.png"
inference_model_dir: "./models/safety_helmet_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: ThreshOutput
ThreshOutput:
threshold: 0.5
label_0: wearing_helmet
label_1: unwearing_helmet
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/text_image_orientation/img_rot0_demo.jpg"
inference_model_dir: "./models/text_image_orientation_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
Topk:
topk: 2
class_id_map_file: "../ppcls/utils/PULC_label_list/text_image_orientation_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/textline_orientation/textline_orientation_test_0_0.png"
inference_model_dir: "./models/textline_orientation_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [160, 80]
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
Topk:
topk: 1
class_id_map_file: "../ppcls/utils/PULC_label_list/textline_orientation_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/traffic_sign/99603_17806.jpg"
inference_model_dir: "./models/traffic_sign_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
benchmark: False
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: Topk
Topk:
topk: 5
class_id_map_file: "../ppcls/utils/PULC_label_list/traffic_sign_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
Global:
infer_imgs: "./images/PULC/vehicle_attribute/0002_c002_00030670_0.jpg"
inference_model_dir: "./models/vehicle_attribute_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
benchmark: False
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [256, 192]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: VehicleAttribute
VehicleAttribute:
color_threshold: 0.5
type_threshold: 0.5
Global:
infer_imgs: "./images/Pedestrain_Attr.jpg"
inference_model_dir: "../inference/"
batch_size: 1
use_gpu: True
enable_mkldnn: False
cpu_num_threads: 10
enable_benchmark: True
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [192, 256]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: PersonAttribute
PersonAttribute:
threshold: 0.5 #default threshold
glasses_threshold: 0.3 #threshold only for glasses
hold_threshold: 0.6 #threshold only for hold
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
Global:
infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg"
infer_imgs: "./images/ImageNet/ILSVRC2012_val_00000010.jpeg"
inference_model_dir: "./models"
batch_size: 1
use_gpu: True
......@@ -32,4 +32,4 @@ PostProcess:
topk: 5
class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
\ No newline at end of file
save_dir: ./pre_label/
Global:
infer_imgs: "./images/ILSVRC2012_val_00000010.jpeg"
infer_imgs: "./images/ImageNet/ILSVRC2012_val_00000010.jpeg"
inference_model_dir: "./models"
batch_size: 1
use_gpu: True
......@@ -32,4 +32,4 @@ PostProcess:
topk: 5
class_id_map_file: "../ppcls/utils/imagenet1k_label_list.txt"
SavePreLabel:
save_dir: ./pre_label/
\ No newline at end of file
save_dir: ./pre_label/
......@@ -5,7 +5,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 1
labe_list:
label_list:
- foreground
# inference engine config
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
# inference engine config
......
......@@ -8,7 +8,7 @@ Global:
image_shape: [3, 640, 640]
threshold: 0.2
max_det_results: 5
labe_list:
label_list:
- foreground
use_gpu: True
......
......@@ -33,106 +33,106 @@ using namespace paddle_infer;
namespace Detection {
// Object Detection Result
struct ObjectResult {
// Rectangle coordinates of detected object: left, right, top, down
std::vector<int> rect;
// Class id of detected object
int class_id;
// Confidence of detected object
float confidence;
};
struct ObjectResult {
// Rectangle coordinates of detected object: left, right, top, down
std::vector<int> rect;
// Class id of detected object
int class_id;
// Confidence of detected object
float confidence;
};
// Generate visualization colormap for each class
std::vector<int> GenerateColorMap(int num_class);
std::vector<int> GenerateColorMap(int num_class);
// Visualiztion Detection Result
cv::Mat VisualizeResult(const cv::Mat &img,
const std::vector <ObjectResult> &results,
const std::vector <std::string> &lables,
const std::vector<int> &colormap, const bool is_rbox);
class ObjectDetector {
public:
explicit ObjectDetector(const YAML::Node &config_file) {
this->use_gpu_ = config_file["Global"]["use_gpu"].as<bool>();
if (config_file["Global"]["gpu_id"].IsDefined())
this->gpu_id_ = config_file["Global"]["gpu_id"].as<int>();
this->gpu_mem_ = config_file["Global"]["gpu_mem"].as<int>();
this->cpu_math_library_num_threads_ =
config_file["Global"]["cpu_num_threads"].as<int>();
this->use_mkldnn_ = config_file["Global"]["enable_mkldnn"].as<bool>();
this->use_tensorrt_ = config_file["Global"]["use_tensorrt"].as<bool>();
this->use_fp16_ = config_file["Global"]["use_fp16"].as<bool>();
this->model_dir_ =
config_file["Global"]["det_inference_model_dir"].as<std::string>();
this->threshold_ = config_file["Global"]["threshold"].as<float>();
this->max_det_results_ = config_file["Global"]["max_det_results"].as<int>();
this->image_shape_ =
config_file["Global"]["image_shape"].as < std::vector < int >> ();
this->label_list_ =
config_file["Global"]["labe_list"].as < std::vector < std::string >> ();
this->ir_optim_ = config_file["Global"]["ir_optim"].as<bool>();
this->batch_size_ = config_file["Global"]["batch_size"].as<int>();
preprocessor_.Init(config_file["DetPreProcess"]["transform_ops"]);
LoadModel(model_dir_, batch_size_, run_mode);
}
// Load Paddle inference model
void LoadModel(const std::string &model_dir, const int batch_size = 1,
const std::string &run_mode = "fluid");
// Run predictor
void Predict(const std::vector <cv::Mat> imgs, const int warmup = 0,
const int repeats = 1,
std::vector <ObjectResult> *result = nullptr,
std::vector<int> *bbox_num = nullptr,
std::vector<double> *times = nullptr);
const std::vector <std::string> &GetLabelList() const {
return this->label_list_;
}
const float &GetThreshold() const { return this->threshold_; }
private:
bool use_gpu_ = true;
int gpu_id_ = 0;
int gpu_mem_ = 800;
int cpu_math_library_num_threads_ = 6;
std::string run_mode = "fluid";
bool use_mkldnn_ = false;
bool use_tensorrt_ = false;
bool batch_size_ = 1;
bool use_fp16_ = false;
std::string model_dir_;
float threshold_ = 0.5;
float max_det_results_ = 5;
std::vector<int> image_shape_ = {3, 640, 640};
std::vector <std::string> label_list_;
bool ir_optim_ = true;
bool det_permute_ = true;
bool det_postprocess_ = true;
int min_subgraph_size_ = 30;
bool use_dynamic_shape_ = false;
int trt_min_shape_ = 1;
int trt_max_shape_ = 1280;
int trt_opt_shape_ = 640;
bool trt_calib_mode_ = false;
// Preprocess image and copy data to input buffer
void Preprocess(const cv::Mat &image_mat);
// Postprocess result
void Postprocess(const std::vector <cv::Mat> mats,
std::vector <ObjectResult> *result, std::vector<int> bbox_num,
bool is_rbox);
std::shared_ptr <Predictor> predictor_;
Preprocessor preprocessor_;
ImageBlob inputs_;
std::vector<float> output_data_;
std::vector<int> out_bbox_num_data_;
};
cv::Mat VisualizeResult(const cv::Mat &img,
const std::vector<ObjectResult> &results,
const std::vector<std::string> &lables,
const std::vector<int> &colormap, const bool is_rbox);
class ObjectDetector {
public:
explicit ObjectDetector(const YAML::Node &config_file) {
this->use_gpu_ = config_file["Global"]["use_gpu"].as<bool>();
if (config_file["Global"]["gpu_id"].IsDefined())
this->gpu_id_ = config_file["Global"]["gpu_id"].as<int>();
this->gpu_mem_ = config_file["Global"]["gpu_mem"].as<int>();
this->cpu_math_library_num_threads_ =
config_file["Global"]["cpu_num_threads"].as<int>();
this->use_mkldnn_ = config_file["Global"]["enable_mkldnn"].as<bool>();
this->use_tensorrt_ = config_file["Global"]["use_tensorrt"].as<bool>();
this->use_fp16_ = config_file["Global"]["use_fp16"].as<bool>();
this->model_dir_ =
config_file["Global"]["det_inference_model_dir"].as<std::string>();
this->threshold_ = config_file["Global"]["threshold"].as<float>();
this->max_det_results_ = config_file["Global"]["max_det_results"].as<int>();
this->image_shape_ =
config_file["Global"]["image_shape"].as<std::vector<int>>();
this->label_list_ =
config_file["Global"]["label_list"].as<std::vector<std::string>>();
this->ir_optim_ = config_file["Global"]["ir_optim"].as<bool>();
this->batch_size_ = config_file["Global"]["batch_size"].as<int>();
preprocessor_.Init(config_file["DetPreProcess"]["transform_ops"]);
LoadModel(model_dir_, batch_size_, run_mode);
}
// Load Paddle inference model
void LoadModel(const std::string &model_dir, const int batch_size = 1,
const std::string &run_mode = "fluid");
// Run predictor
void Predict(const std::vector<cv::Mat> imgs, const int warmup = 0,
const int repeats = 1,
std::vector<ObjectResult> *result = nullptr,
std::vector<int> *bbox_num = nullptr,
std::vector<double> *times = nullptr);
const std::vector<std::string> &GetLabelList() const {
return this->label_list_;
}
const float &GetThreshold() const { return this->threshold_; }
private:
bool use_gpu_ = true;
int gpu_id_ = 0;
int gpu_mem_ = 800;
int cpu_math_library_num_threads_ = 6;
std::string run_mode = "fluid";
bool use_mkldnn_ = false;
bool use_tensorrt_ = false;
bool batch_size_ = 1;
bool use_fp16_ = false;
std::string model_dir_;
float threshold_ = 0.5;
float max_det_results_ = 5;
std::vector<int> image_shape_ = {3, 640, 640};
std::vector<std::string> label_list_;
bool ir_optim_ = true;
bool det_permute_ = true;
bool det_postprocess_ = true;
int min_subgraph_size_ = 30;
bool use_dynamic_shape_ = false;
int trt_min_shape_ = 1;
int trt_max_shape_ = 1280;
int trt_opt_shape_ = 640;
bool trt_calib_mode_ = false;
// Preprocess image and copy data to input buffer
void Preprocess(const cv::Mat &image_mat);
// Postprocess result
void Postprocess(const std::vector<cv::Mat> mats,
std::vector<ObjectResult> *result, std::vector<int> bbox_num,
bool is_rbox);
std::shared_ptr<Predictor> predictor_;
Preprocessor preprocessor_;
ImageBlob inputs_;
std::vector<float> output_data_;
std::vector<int> out_bbox_num_data_;
};
} // namespace Detection
......@@ -95,7 +95,7 @@ def main():
config_json["Global"]["det_model_path"] = args.det_model_path
config_json["Global"]["rec_model_path"] = args.rec_model_path
config_json["Global"]["rec_label_path"] = args.rec_label_path
config_json["Global"]["label_list"] = config_yaml["Global"]["labe_list"]
config_json["Global"]["label_list"] = config_yaml["Global"]["label_list"]
config_json["Global"]["rec_nms_thresold"] = config_yaml["Global"][
"rec_nms_thresold"]
config_json["Global"]["max_det_results"] = config_yaml["Global"][
......
# paddle2onnx 模型转化与预测
本章节介绍 ResNet50_vd 模型如何转化为 ONNX 模型,并基于 ONNX 引擎预测。
## 目录
- [paddle2onnx 模型转化与预测](#paddle2onnx-模型转化与预测)
- [1. 环境准备](#1-环境准备)
- [2. 模型转换](#2-模型转换)
- [3. onnx 预测](#3-onnx-预测)
## 1. 环境准备
需要准备 Paddle2ONNX 模型转化环境,和 ONNX 模型预测环境。
Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式,算子目前稳定支持导出 ONNX Opset 9~11,部分Paddle算子支持更低的ONNX Opset转换
更多细节可参考 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
Paddle2ONNX 支持将 PaddlePaddle inference 模型格式转化到 ONNX 模型格式,算子目前稳定支持导出 ONNX Opset 9~11
更多细节可参考 [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX#paddle2onnx)
- 安装 Paddle2ONNX
```
python3.7 -m pip install paddle2onnx
```
```shell
python3.7 -m pip install paddle2onnx
```
- 安装 ONNX 运行时
```
python3.7 -m pip install onnxruntime
```
- 安装 ONNX 推理引擎
```shell
python3.7 -m pip install onnxruntime
```
下面以 ResNet50_vd 为例,介绍如何将 PaddlePaddle inference 模型转换为 ONNX 模型,并基于 ONNX 引擎预测。
## 2. 模型转换
- ResNet50_vd inference模型下载
```
cd deploy
mkdir models && cd models
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
cd ..
```
```shell
cd deploy
mkdir models && cd models
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
cd ..
```
- 模型转换
使用 Paddle2ONNX 将 Paddle 静态图模型转换为 ONNX 模型格式:
```
paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
--model_filename=inference.pdmodel \
--params_filename=inference.pdiparams \
--save_file=./models/ResNet50_vd_infer/inference.onnx \
--opset_version=10 \
--enable_onnx_checker=True
```
使用 Paddle2ONNX 将 Paddle 静态图模型转换为 ONNX 模型格式:
```shell
paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
--model_filename=inference.pdmodel \
--params_filename=inference.pdiparams \
--save_file=./models/ResNet50_vd_infer/inference.onnx \
--opset_version=10 \
--enable_onnx_checker=True
```
执行完毕后,ONNX 模型 `inference.onnx` 会被保存在 `./models/ResNet50_vd_infer/` 路径下
转换完毕后,生成的ONNX 模型 `inference.onnx` 会被保存在 `./models/ResNet50_vd_infer/` 路径下
## 3. onnx 预测
执行如下命令:
```
```shell
python3.7 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.use_onnx=True \
......
# Paddle2ONNX: Converting To ONNX and Deployment
This section introduce that how to convert the Paddle Inference Model ResNet50_vd to ONNX model and deployment based on ONNX engine.
## 1. Installation
First, you need to install Paddle2ONNX and onnxruntime. Paddle2ONNX is a toolkit to convert Paddle Inference Model to ONNX model. Please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_en.md) for more information.
- Paddle2ONNX Installation
```
python3.7 -m pip install paddle2onnx
```
- ONNX Installation
```
python3.7 -m pip install onnxruntime
```
## 2. Converting to ONNX
Download the Paddle Inference Model ResNet50_vd:
```
cd deploy
mkdir models && cd models
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
cd ..
```
Converting to ONNX model:
```
paddle2onnx --model_dir=./models/ResNet50_vd_infer/ \
--model_filename=inference.pdmodel \
--params_filename=inference.pdiparams \
--save_file=./models/ResNet50_vd_infer/inference.onnx \
--opset_version=10 \
--enable_onnx_checker=True
```
After running the above command, the ONNX model file converted would be save in `./models/ResNet50_vd_infer/`.
## 3. Deployment
Deployment with ONNX model, command is as shown below.
```
python3.7 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.use_onnx=True \
-o Global.use_gpu=False \
-o Global.inference_model_dir=./models/ResNet50_vd_infer
```
The prediction results:
```
ILSVRC2012_val_00000010.jpeg: class id(s): [153, 204, 229, 332, 155], score(s): [0.69, 0.10, 0.02, 0.01, 0.01], label_name(s): ['Maltese dog, Maltese terrier, Maltese', 'Lhasa, Lhasa apso', 'Old English sheepdog, bobtail', 'Angora, Angora rabbit', 'Shih-Tzu']
```
# 使用镜像:
# registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda10.1-cudnn7-gcc82
# 编译Serving Server:
# client和app可以直接使用release版本
# server因为加入了自定义OP,需要重新编译
# 默认编译时的${PWD}=PaddleClas/deploy/paddleserving/
python_name=${1:-'python'}
apt-get update
apt install -y libcurl4-openssl-dev libbz2-dev
wget -nc https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar
tar xf centos_ssl.tar
rm -rf centos_ssl.tar
mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k
mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k
ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10
ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10
ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so
ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so
# 安装go依赖
rm -rf /usr/local/go
wget -qO- https://paddle-ci.cdn.bcebos.com/go1.17.2.linux-amd64.tar.gz | tar -xz -C /usr/local
export GOROOT=/usr/local/go
export GOPATH=/root/gopath
export PATH=$PATH:$GOPATH/bin:$GOROOT/bin
go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.cn,direct
go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2
go install github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2
go install github.com/golang/protobuf/protoc-gen-go@v1.4.3
go install google.golang.org/grpc@v1.33.0
go env -w GO111MODULE=auto
# 下载opencv库
wget https://paddle-qa.bj.bcebos.com/PaddleServing/opencv3.tar.gz
tar -xvf opencv3.tar.gz
rm -rf opencv3.tar.gz
export OPENCV_DIR=$PWD/opencv3
# clone Serving
git clone https://github.com/PaddlePaddle/Serving.git -b develop --depth=1
cd Serving # PaddleClas/deploy/paddleserving/Serving
export Serving_repo_path=$PWD
git submodule update --init --recursive
${python_name} -m pip install -r python/requirements.txt
# set env
export PYTHON_INCLUDE_DIR=$(${python_name} -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())")
export PYTHON_LIBRARIES=$(${python_name} -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
export PYTHON_EXECUTABLE=`which ${python_name}`
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY='/usr/local/cuda/lib64/'
export TENSORRT_LIBRARY_PATH='/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/'
# cp 自定义OP代码
\cp ../preprocess/general_clas_op.* ${Serving_repo_path}/core/general-server/op
\cp ../preprocess/preprocess_op.* ${Serving_repo_path}/core/predictor/tools/pp_shitu_tools
# 编译Server
mkdir server-build-gpu-opencv
cd server-build-gpu-opencv
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
-DOPENCV_DIR=${OPENCV_DIR} \
-DWITH_OPENCV=ON \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j32
${python_name} -m pip install python/dist/paddle*
# export SERVING_BIN
export SERVING_BIN=$PWD/core/general-server/serving
cd ../../
\ No newline at end of file
......@@ -30,4 +30,4 @@ op:
client_type: local_predictor
#Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["prediction"]
fetch_list: ["prediction"]
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "core/general-server/op/general_clas_op.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
#include "core/predictor/framework/resource.h"
#include "core/util/include/timer.h"
#include <algorithm>
#include <iostream>
#include <memory>
#include <sstream>
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::Timer;
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Response;
using baidu::paddle_serving::predictor::general_model::Request;
using baidu::paddle_serving::predictor::InferManager;
using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralClasOp::inference() {
VLOG(2) << "Going to run inference";
const std::vector<std::string> pre_node_names = pre_names();
if (pre_node_names.size() != 1) {
LOG(ERROR) << "This op(" << op_name()
<< ") can only have one predecessor op, but received "
<< pre_node_names.size();
return -1;
}
const std::string pre_name = pre_node_names[0];
const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
if (!input_blob) {
LOG(ERROR) << "input_blob is nullptr,error";
return -1;
}
uint64_t log_id = input_blob->GetLogId();
VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name;
GeneralBlob *output_blob = mutable_data<GeneralBlob>();
if (!output_blob) {
LOG(ERROR) << "output_blob is nullptr,error";
return -1;
}
output_blob->SetLogId(log_id);
if (!input_blob) {
LOG(ERROR) << "(logid=" << log_id
<< ") Failed mutable depended argument, op:" << pre_name;
return -1;
}
const TensorVector *in = &input_blob->tensor_vector;
TensorVector *out = &output_blob->tensor_vector;
int batch_size = input_blob->_batch_size;
output_blob->_batch_size = batch_size;
VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
Timer timeline;
int64_t start = timeline.TimeStampUS();
timeline.Start();
// only support string type
char *total_input_ptr = static_cast<char *>(in->at(0).data.data());
std::string base64str = total_input_ptr;
cv::Mat img = Base2Mat(base64str);
// RGB2BGR
cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
// Resize
cv::Mat resize_img;
resize_op_.Run(img, resize_img, resize_short_size_);
// CenterCrop
crop_op_.Run(resize_img, crop_size_);
// Normalize
normalize_op_.Run(&resize_img, mean_, scale_, is_scale_);
// Permute
std::vector<float> input(1 * 3 * resize_img.rows * resize_img.cols, 0.0f);
permute_op_.Run(&resize_img, input.data());
float maxValue = *max_element(input.begin(), input.end());
float minValue = *min_element(input.begin(), input.end());
TensorVector *real_in = new TensorVector();
if (!real_in) {
LOG(ERROR) << "real_in is nullptr,error";
return -1;
}
std::vector<int> input_shape;
int in_num = 0;
void *databuf_data = NULL;
char *databuf_char = NULL;
size_t databuf_size = 0;
input_shape = {1, 3, resize_img.rows, resize_img.cols};
in_num = std::accumulate(input_shape.begin(), input_shape.end(), 1,
std::multiplies<int>());
databuf_size = in_num * sizeof(float);
databuf_data = MempoolWrapper::instance().malloc(databuf_size);
if (!databuf_data) {
LOG(ERROR) << "Malloc failed, size: " << databuf_size;
return -1;
}
memcpy(databuf_data, input.data(), databuf_size);
databuf_char = reinterpret_cast<char *>(databuf_data);
paddle::PaddleBuf paddleBuf(databuf_char, databuf_size);
paddle::PaddleTensor tensor_in;
tensor_in.name = in->at(0).name;
tensor_in.dtype = paddle::PaddleDType::FLOAT32;
tensor_in.shape = {1, 3, resize_img.rows, resize_img.cols};
tensor_in.lod = in->at(0).lod;
tensor_in.data = paddleBuf;
real_in->push_back(tensor_in);
if (InferManager::instance().infer(engine_name().c_str(), real_in, out,
batch_size)) {
LOG(ERROR) << "(logid=" << log_id
<< ") Failed do infer in fluid model: " << engine_name().c_str();
return -1;
}
int64_t end = timeline.TimeStampUS();
CopyBlobInfo(input_blob, output_blob);
AddBlobInfo(output_blob, start);
AddBlobInfo(output_blob, end);
return 0;
}
cv::Mat GeneralClasOp::Base2Mat(std::string &base64_data) {
cv::Mat img;
std::string s_mat;
s_mat = base64Decode(base64_data.data(), base64_data.size());
std::vector<char> base64_img(s_mat.begin(), s_mat.end());
img = cv::imdecode(base64_img, cv::IMREAD_COLOR); // CV_LOAD_IMAGE_COLOR
return img;
}
std::string GeneralClasOp::base64Decode(const char *Data, int DataByte) {
const char DecodeTable[] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
62, // '+'
0, 0, 0,
63, // '/'
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // '0'-'9'
0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 'A'-'Z'
0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, // 'a'-'z'
};
std::string strDecode;
int nValue;
int i = 0;
while (i < DataByte) {
if (*Data != '\r' && *Data != '\n') {
nValue = DecodeTable[*Data++] << 18;
nValue += DecodeTable[*Data++] << 12;
strDecode += (nValue & 0x00FF0000) >> 16;
if (*Data != '=') {
nValue += DecodeTable[*Data++] << 6;
strDecode += (nValue & 0x0000FF00) >> 8;
if (*Data != '=') {
nValue += DecodeTable[*Data++];
strDecode += nValue & 0x000000FF;
}
}
i += 4;
} else // 回车换行,跳过
{
Data++;
i++;
}
}
return strDecode;
}
DEFINE_OP(GeneralClasOp);
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "core/general-server/general_model_service.pb.h"
#include "core/general-server/op/general_infer_helper.h"
#include "core/predictor/tools/pp_shitu_tools/preprocess_op.h"
#include "paddle_inference_api.h" // NOLINT
#include <string>
#include <vector>
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
namespace baidu {
namespace paddle_serving {
namespace serving {
class GeneralClasOp
: public baidu::paddle_serving::predictor::OpWithChannel<GeneralBlob> {
public:
typedef std::vector<paddle::PaddleTensor> TensorVector;
DECLARE_OP(GeneralClasOp);
int inference();
private:
// clas preprocess
std::vector<float> mean_ = {0.485f, 0.456f, 0.406f};
std::vector<float> scale_ = {0.229f, 0.224f, 0.225f};
bool is_scale_ = true;
int resize_short_size_ = 256;
int crop_size_ = 224;
PaddleClas::ResizeImg resize_op_;
PaddleClas::Normalize normalize_op_;
PaddleClas::Permute permute_op_;
PaddleClas::CenterCropImg crop_op_;
// read pics
cv::Mat Base2Mat(std::string &base64_data);
std::string base64Decode(const char *Data, int DataByte);
};
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include "paddle_api.h"
#include "paddle_inference_api.h"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <math.h>
#include <numeric>
#include "preprocess_op.h"
namespace Feature {
void Permute::Run(const cv::Mat *im, float *data) {
int rh = im->rows;
int rw = im->cols;
int rc = im->channels();
for (int i = 0; i < rc; ++i) {
cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i);
}
}
void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &std, float scale) {
(*im).convertTo(*im, CV_32FC3, scale);
for (int h = 0; h < im->rows; h++) {
for (int w = 0; w < im->cols; w++) {
im->at<cv::Vec3f>(h, w)[0] =
(im->at<cv::Vec3f>(h, w)[0] - mean[0]) / std[0];
im->at<cv::Vec3f>(h, w)[1] =
(im->at<cv::Vec3f>(h, w)[1] - mean[1]) / std[1];
im->at<cv::Vec3f>(h, w)[2] =
(im->at<cv::Vec3f>(h, w)[2] - mean[2]) / std[2];
}
}
}
void CenterCropImg::Run(cv::Mat &img, const int crop_size) {
int resize_w = img.cols;
int resize_h = img.rows;
int w_start = int((resize_w - crop_size) / 2);
int h_start = int((resize_h - crop_size) / 2);
cv::Rect rect(w_start, h_start, crop_size, crop_size);
img = img(rect);
}
void ResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
int resize_short_size, int size) {
int resize_h = 0;
int resize_w = 0;
if (size > 0) {
resize_h = size;
resize_w = size;
} else {
int w = img.cols;
int h = img.rows;
float ratio = 1.f;
if (h < w) {
ratio = float(resize_short_size) / float(h);
} else {
ratio = float(resize_short_size) / float(w);
}
resize_h = round(float(h) * ratio);
resize_w = round(float(w) * ratio);
}
cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
}
} // namespace Feature
namespace PaddleClas {
void Permute::Run(const cv::Mat *im, float *data) {
int rh = im->rows;
int rw = im->cols;
int rc = im->channels();
for (int i = 0; i < rc; ++i) {
cv::extractChannel(*im, cv::Mat(rh, rw, CV_32FC1, data + i * rh * rw), i);
}
}
void Normalize::Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &scale, const bool is_scale) {
double e = 1.0;
if (is_scale) {
e /= 255.0;
}
(*im).convertTo(*im, CV_32FC3, e);
for (int h = 0; h < im->rows; h++) {
for (int w = 0; w < im->cols; w++) {
im->at<cv::Vec3f>(h, w)[0] =
(im->at<cv::Vec3f>(h, w)[0] - mean[0]) / scale[0];
im->at<cv::Vec3f>(h, w)[1] =
(im->at<cv::Vec3f>(h, w)[1] - mean[1]) / scale[1];
im->at<cv::Vec3f>(h, w)[2] =
(im->at<cv::Vec3f>(h, w)[2] - mean[2]) / scale[2];
}
}
}
void CenterCropImg::Run(cv::Mat &img, const int crop_size) {
int resize_w = img.cols;
int resize_h = img.rows;
int w_start = int((resize_w - crop_size) / 2);
int h_start = int((resize_h - crop_size) / 2);
cv::Rect rect(w_start, h_start, crop_size, crop_size);
img = img(rect);
}
void ResizeImg::Run(const cv::Mat &img, cv::Mat &resize_img,
int resize_short_size) {
int w = img.cols;
int h = img.rows;
float ratio = 1.f;
if (h < w) {
ratio = float(resize_short_size) / float(h);
} else {
ratio = float(resize_short_size) / float(w);
}
int resize_h = round(float(h) * ratio);
int resize_w = round(float(w) * ratio);
cv::resize(img, resize_img, cv::Size(resize_w, resize_h));
}
} // namespace PaddleClas
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "opencv2/core.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <chrono>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <vector>
#include <cstring>
#include <fstream>
#include <numeric>
namespace Feature {
class Normalize {
public:
virtual void Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &std, float scale);
};
// RGB -> CHW
class Permute {
public:
virtual void Run(const cv::Mat *im, float *data);
};
class CenterCropImg {
public:
virtual void Run(cv::Mat &im, const int crop_size = 224);
};
class ResizeImg {
public:
virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len,
int size = 0);
};
} // namespace Feature
namespace PaddleClas {
class Normalize {
public:
virtual void Run(cv::Mat *im, const std::vector<float> &mean,
const std::vector<float> &scale, const bool is_scale = true);
};
// RGB -> CHW
class Permute {
public:
virtual void Run(const cv::Mat *im, float *data);
};
class CenterCropImg {
public:
virtual void Run(cv::Mat &im, const int crop_size = 224);
};
class ResizeImg {
public:
virtual void Run(const cv::Mat &img, cv::Mat &resize_img, int max_size_len);
};
} // namespace PaddleClas
......@@ -31,7 +31,7 @@ op:
#Fetch结果列表,以client_config中fetch_var的alias_name为准
fetch_list: ["features"]
det:
concurrency: 1
local_service_conf:
......
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
feed_var {
name: "boxes"
alias_name: "boxes"
is_lod_tensor: false
feed_type: 1
shape: 6
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: false
fetch_type: 1
shape: 512
}
fetch_var {
name: "boxes"
alias_name: "boxes"
is_lod_tensor: false
fetch_type: 1
shape: 6
}
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
feed_var {
name: "boxes"
alias_name: "boxes"
is_lod_tensor: false
feed_type: 1
shape: 6
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: false
fetch_type: 1
shape: 512
}
fetch_var {
name: "boxes"
alias_name: "boxes"
is_lod_tensor: false
fetch_type: 1
shape: 6
}
feed_var {
name: "im_shape"
alias_name: "im_shape"
is_lod_tensor: false
feed_type: 1
shape: 2
}
feed_var {
name: "image"
alias_name: "image"
is_lod_tensor: false
feed_type: 7
shape: -1
shape: -1
shape: 3
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "save_infer_model/scale_0.tmp_1"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
fetch_var {
name: "save_infer_model/scale_1.tmp_1"
alias_name: "save_infer_model/scale_1.tmp_1"
is_lod_tensor: false
fetch_type: 2
}
feed_var {
name: "im_shape"
alias_name: "im_shape"
is_lod_tensor: false
feed_type: 1
shape: 2
}
feed_var {
name: "image"
alias_name: "image"
is_lod_tensor: false
feed_type: 7
shape: -1
shape: -1
shape: 3
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "save_infer_model/scale_0.tmp_1"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
fetch_var {
name: "save_infer_model/scale_1.tmp_1"
alias_name: "save_infer_model/scale_1.tmp_1"
is_lod_tensor: false
fetch_type: 2
}
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import numpy as np
from paddle_serving_client import Client
......@@ -22,181 +21,101 @@ import faiss
import os
import pickle
class MainbodyDetect():
"""
pp-shitu mainbody detect.
include preprocess, process, postprocess
return detect results
Attention: Postprocess include num limit and box filter; no nms
"""
def __init__(self):
self.preprocess = DetectionSequential([
DetectionFile2Image(), DetectionNormalize(
[0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True),
DetectionResize(
(640, 640), False, interpolation=2), DetectionTranspose(
(2, 0, 1))
])
self.client = Client()
self.client.load_client_config(
"../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/serving_client_conf.prototxt"
)
self.client.connect(['127.0.0.1:9293'])
self.max_det_result = 5
self.conf_threshold = 0.2
def predict(self, imgpath):
im, im_info = self.preprocess(imgpath)
im_shape = np.array(im.shape[1:]).reshape(-1)
scale_factor = np.array(list(im_info['scale_factor'])).reshape(-1)
fetch_map = self.client.predict(
feed={
"image": im,
"im_shape": im_shape,
"scale_factor": scale_factor,
},
fetch=["save_infer_model/scale_0.tmp_1"],
batch=False)
return self.postprocess(fetch_map, imgpath)
def postprocess(self, fetch_map, imgpath):
#1. get top max_det_result
det_results = fetch_map["save_infer_model/scale_0.tmp_1"]
if len(det_results) > self.max_det_result:
boxes_reserved = fetch_map[
"save_infer_model/scale_0.tmp_1"][:self.max_det_result]
else:
boxes_reserved = det_results
#2. do conf threshold
boxes_list = []
for i in range(boxes_reserved.shape[0]):
if (boxes_reserved[i, 1]) > self.conf_threshold:
boxes_list.append(boxes_reserved[i, :])
#3. add origin image box
origin_img = cv2.imread(imgpath)
boxes_list.append(
np.array([0, 1.0, 0, 0, origin_img.shape[1], origin_img.shape[0]]))
return np.array(boxes_list)
class ObjectRecognition():
"""
pp-shitu object recognion for all objects detected by MainbodyDetect.
include preprocess, process, postprocess
preprocess include preprocess for each image and batching.
Batch process
postprocess include retrieval and nms
"""
def __init__(self):
self.client = Client()
self.client.load_client_config(
"../../models/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt"
)
self.client.connect(["127.0.0.1:9294"])
self.seq = Sequential([
BGR2RGB(), Resize((224, 224)), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
False), Transpose((2, 0, 1))
])
self.searcher, self.id_map = self.init_index()
self.rec_nms_thresold = 0.05
self.rec_score_thres = 0.5
self.feature_normalize = True
self.return_k = 1
def init_index(self):
index_dir = "../../drink_dataset_v1.0/index"
assert os.path.exists(os.path.join(
index_dir, "vector.index")), "vector.index not found ..."
assert os.path.exists(os.path.join(
index_dir, "id_map.pkl")), "id_map.pkl not found ... "
searcher = faiss.read_index(os.path.join(index_dir, "vector.index"))
with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
id_map = pickle.load(fd)
return searcher, id_map
def predict(self, det_boxes, imgpath):
#1. preprocess
batch_imgs = []
origin_img = cv2.imread(imgpath)
for i in range(det_boxes.shape[0]):
box = det_boxes[i]
x1, y1, x2, y2 = [int(x) for x in box[2:]]
cropped_img = origin_img[y1:y2, x1:x2, :].copy()
tmp = self.seq(cropped_img)
batch_imgs.append(tmp)
batch_imgs = np.array(batch_imgs)
#2. process
fetch_map = self.client.predict(
feed={"x": batch_imgs}, fetch=["features"], batch=True)
batch_features = fetch_map["features"]
#3. postprocess
if self.feature_normalize:
feas_norm = np.sqrt(
np.sum(np.square(batch_features), axis=1, keepdims=True))
batch_features = np.divide(batch_features, feas_norm)
scores, docs = self.searcher.search(batch_features, self.return_k)
results = []
for i in range(scores.shape[0]):
pred = {}
if scores[i][0] >= self.rec_score_thres:
pred["bbox"] = [int(x) for x in det_boxes[i, 2:]]
pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
pred["rec_scores"] = scores[i][0]
results.append(pred)
return self.nms_to_rec_results(results)
def nms_to_rec_results(self, results):
filtered_results = []
x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
scores = np.array([r["rec_scores"] for r in results])
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
while order.size > 0:
i = order[0]
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= self.rec_nms_thresold)[0]
order = order[inds + 1]
filtered_results.append(results[i])
return filtered_results
rec_nms_thresold = 0.05
rec_score_thres = 0.5
feature_normalize = True
return_k = 1
index_dir = "../../drink_dataset_v1.0/index"
def init_index(index_dir):
assert os.path.exists(os.path.join(
index_dir, "vector.index")), "vector.index not found ..."
assert os.path.exists(os.path.join(
index_dir, "id_map.pkl")), "id_map.pkl not found ... "
searcher = faiss.read_index(os.path.join(index_dir, "vector.index"))
with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
id_map = pickle.load(fd)
return searcher, id_map
#get box
def nms_to_rec_results(results, thresh=0.1):
filtered_results = []
x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
scores = np.array([r["rec_scores"] for r in results])
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
while order.size > 0:
i = order[0]
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
filtered_results.append(results[i])
return filtered_results
def postprocess(fetch_dict, feature_normalize, det_boxes, searcher, id_map,
return_k, rec_score_thres, rec_nms_thresold):
batch_features = fetch_dict["features"]
#do feature norm
if feature_normalize:
feas_norm = np.sqrt(
np.sum(np.square(batch_features), axis=1, keepdims=True))
batch_features = np.divide(batch_features, feas_norm)
scores, docs = searcher.search(batch_features, return_k)
results = []
for i in range(scores.shape[0]):
pred = {}
if scores[i][0] >= rec_score_thres:
pred["bbox"] = [int(x) for x in det_boxes[i, 2:]]
pred["rec_docs"] = id_map[docs[i][0]].split()[1]
pred["rec_scores"] = scores[i][0]
results.append(pred)
#do nms
results = nms_to_rec_results(results, rec_nms_thresold)
return results
#do client
if __name__ == "__main__":
det = MainbodyDetect()
rec = ObjectRecognition()
#1. get det_results
imgpath = "../../drink_dataset_v1.0/test_images/001.jpeg"
det_results = det.predict(imgpath)
#2. get rec_results
rec_results = rec.predict(det_results, imgpath)
print(rec_results)
client = Client()
client.load_client_config([
"../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client",
"../../models/general_PPLCNet_x2_5_lite_v1.0_client"
])
client.connect(['127.0.0.1:9400'])
im = cv2.imread("../../drink_dataset_v1.0/test_images/001.jpeg")
im_shape = np.array(im.shape[:2]).reshape(-1)
fetch_map = client.predict(
feed={"image": im,
"im_shape": im_shape},
fetch=["features", "boxes"],
batch=False)
#add retrieval procedure
det_boxes = fetch_map["boxes"]
searcher, id_map = init_index(index_dir)
results = postprocess(fetch_map, feature_normalize, det_boxes, searcher,
id_map, return_k, rec_score_thres, rec_nms_thresold)
print(results)
......@@ -12,16 +12,20 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import base64
import time
from paddle_serving_client import Client
#app
from paddle_serving_app.reader import Sequential, URL2Image, Resize
from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
import time
def bytes_to_base64(image: bytes) -> str:
"""encode bytes into base64 string
"""
return base64.b64encode(image).decode('utf8')
client = Client()
client.load_client_config("./ResNet50_vd_serving/serving_server_conf.prototxt")
client.load_client_config("./ResNet50_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
label_dict = {}
......@@ -31,22 +35,17 @@ with open("imagenet.label") as fin:
label_dict[label_idx] = line.strip()
label_idx += 1
#preprocess
seq = Sequential([
URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
start = time.time()
image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"
image_file = "./daisy.jpg"
for i in range(1):
img = seq(image_file)
fetch_map = client.predict(
feed={"inputs": img}, fetch=["prediction"], batch=False)
prob = max(fetch_map["prediction"][0])
label = label_dict[fetch_map["prediction"][0].tolist().index(prob)].strip(
).replace(",", "")
print("prediction: {}, probability: {}".format(label, prob))
end = time.time()
print(end - start)
start = time.time()
with open(image_file, 'rb') as img_file:
image_data = img_file.read()
image = bytes_to_base64(image_data)
fetch_dict = client.predict(
feed={"inputs": image}, fetch=["prediction"], batch=False)
prob = max(fetch_dict["prediction"][0])
label = label_dict[fetch_dict["prediction"][0].tolist().index(
prob)].strip().replace(",", "")
print("prediction: {}, probability: {}".format(label, prob))
end = time.time()
print(end - start)
......@@ -53,6 +53,34 @@ class PostProcesser(object):
return rtn
class ThreshOutput(object):
def __init__(self, threshold, label_0="0", label_1="1"):
self.threshold = threshold
self.label_0 = label_0
self.label_1 = label_1
def __call__(self, x, file_names=None):
y = []
for idx, probs in enumerate(x):
score = probs[1]
if score < self.threshold:
result = {
"class_ids": [0],
"scores": [1 - score],
"label_names": [self.label_0]
}
else:
result = {
"class_ids": [1],
"scores": [score],
"label_names": [self.label_1]
}
if file_names is not None:
result["file_name"] = file_names[idx]
y.append(result)
return y
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
......@@ -159,3 +187,136 @@ class Binarize(object):
byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)
return byte
class PersonAttribute(object):
def __init__(self,
threshold=0.5,
glasses_threshold=0.3,
hold_threshold=0.6):
self.threshold = threshold
self.glasses_threshold = glasses_threshold
self.hold_threshold = hold_threshold
def __call__(self, batch_preds, file_names=None):
# postprocess output of predictor
age_list = ['AgeLess18', 'Age18-60', 'AgeOver60']
direct_list = ['Front', 'Side', 'Back']
bag_list = ['HandBag', 'ShoulderBag', 'Backpack']
upper_list = ['UpperStride', 'UpperLogo', 'UpperPlaid', 'UpperSplice']
lower_list = [
'LowerStripe', 'LowerPattern', 'LongCoat', 'Trousers', 'Shorts',
'Skirt&Dress'
]
batch_res = []
for res in batch_preds:
res = res.tolist()
label_res = []
# gender
gender = 'Female' if res[22] > self.threshold else 'Male'
label_res.append(gender)
# age
age = age_list[np.argmax(res[19:22])]
label_res.append(age)
# direction
direction = direct_list[np.argmax(res[23:])]
label_res.append(direction)
# glasses
glasses = 'Glasses: '
if res[1] > self.glasses_threshold:
glasses += 'True'
else:
glasses += 'False'
label_res.append(glasses)
# hat
hat = 'Hat: '
if res[0] > self.threshold:
hat += 'True'
else:
hat += 'False'
label_res.append(hat)
# hold obj
hold_obj = 'HoldObjectsInFront: '
if res[18] > self.hold_threshold:
hold_obj += 'True'
else:
hold_obj += 'False'
label_res.append(hold_obj)
# bag
bag = bag_list[np.argmax(res[15:18])]
bag_score = res[15 + np.argmax(res[15:18])]
bag_label = bag if bag_score > self.threshold else 'No bag'
label_res.append(bag_label)
# upper
upper_res = res[4:8]
upper_label = 'Upper:'
sleeve = 'LongSleeve' if res[3] > res[2] else 'ShortSleeve'
upper_label += ' {}'.format(sleeve)
for i, r in enumerate(upper_res):
if r > self.threshold:
upper_label += ' {}'.format(upper_list[i])
label_res.append(upper_label)
# lower
lower_res = res[8:14]
lower_label = 'Lower: '
has_lower = False
for i, l in enumerate(lower_res):
if l > self.threshold:
lower_label += ' {}'.format(lower_list[i])
has_lower = True
if not has_lower:
lower_label += ' {}'.format(lower_list[np.argmax(lower_res)])
label_res.append(lower_label)
# shoe
shoe = 'Boots' if res[14] > self.threshold else 'No boots'
label_res.append(shoe)
threshold_list = [0.5] * len(res)
threshold_list[1] = self.glasses_threshold
threshold_list[18] = self.hold_threshold
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append({"attributes": label_res, "output": pred_res})
return batch_res
class VehicleAttribute(object):
def __init__(self, color_threshold=0.5, type_threshold=0.5):
self.color_threshold = color_threshold
self.type_threshold = type_threshold
self.color_list = [
"yellow", "orange", "green", "gray", "red", "blue", "white",
"golden", "brown", "black"
]
self.type_list = [
"sedan", "suv", "van", "hatchback", "mpv", "pickup", "bus",
"truck", "estate"
]
def __call__(self, batch_preds, file_names=None):
# postprocess output of predictor
batch_res = []
for res in batch_preds:
res = res.tolist()
label_res = []
color_idx = np.argmax(res[:10])
type_idx = np.argmax(res[10:])
if res[color_idx] >= self.color_threshold:
color_info = f"Color: ({self.color_list[color_idx]}, prob: {res[color_idx]})"
else:
color_info = "Color unknown"
if res[type_idx + 10] >= self.type_threshold:
type_info = f"Type: ({self.type_list[type_idx]}, prob: {res[type_idx + 10]})"
else:
type_info = "Type unknown"
label_res = f"{color_info}, {type_info}"
threshold_list = [self.color_threshold
] * 10 + [self.type_threshold] * 9
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append({"attributes": label_res, "output": pred_res})
return batch_res
......@@ -49,10 +49,15 @@ class ClsPredictor(Predictor):
pid = os.getpid()
size = config["PreProcess"]["transform_ops"][1]["CropImage"][
"size"]
if config["Global"].get("use_int8", False):
precision = "int8"
elif config["Global"].get("use_fp16", False):
precision = "fp16"
else:
precision = "fp32"
self.auto_logger = auto_log.AutoLogger(
model_name=config["Global"].get("model_name", "cls"),
model_precision='fp16'
if config["Global"]["use_fp16"] else 'fp32',
model_precision=precision,
batch_size=config["Global"].get("batch_size", 1),
data_shape=[3, size, size],
save_path=config["Global"].get("save_log_path",
......@@ -133,13 +138,20 @@ def main(config):
continue
batch_results = cls_predictor.predict(batch_imgs)
for number, result_dict in enumerate(batch_results):
filename = batch_names[number]
clas_ids = result_dict["class_ids"]
scores_str = "[{}]".format(", ".join("{:.2f}".format(
r) for r in result_dict["scores"]))
label_names = result_dict["label_names"]
print("{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
format(filename, clas_ids, scores_str, label_names))
if "PersonAttribute" in config[
"PostProcess"] or "VehicleAttribute" in config[
"PostProcess"]:
filename = batch_names[number]
print("{}:\t {}".format(filename, result_dict))
else:
filename = batch_names[number]
clas_ids = result_dict["class_ids"]
scores_str = "[{}]".format(", ".join("{:.2f}".format(
r) for r in result_dict["scores"]))
label_names = result_dict["label_names"]
print(
"{}:\tclass id(s): {}, score(s): {}, label_name(s): {}".
format(filename, clas_ids, scores_str, label_names))
batch_imgs = []
batch_names = []
if cls_predictor.benchmark:
......
......@@ -128,13 +128,10 @@ class DetPredictor(Predictor):
results = []
if reduce(lambda x, y: x * y, np_boxes.shape) < 6:
print('[WARNNING] No object detected.')
results = np.array([])
else:
results = np_boxes
results = self.parse_det_results(results,
self.config["Global"]["threshold"],
self.config["Global"]["labe_list"])
results = self.parse_det_results(
np_boxes, self.config["Global"]["threshold"],
self.config["Global"]["label_list"])
return results
......
......@@ -27,6 +27,7 @@ import cv2
import numpy as np
import importlib
from PIL import Image
from paddle.vision.transforms import ToTensor, Normalize
from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
......@@ -53,13 +54,14 @@ def create_operators(params):
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
def __init__(self, interpolation=None, backend="cv2", return_numpy=True):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
'lanczos': cv2.INTER_LANCZOS4,
'random': (cv2.INTER_LINEAR, cv2.INTER_CUBIC)
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
......@@ -67,13 +69,26 @@ class UnifiedResize(object):
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
'hamming': Image.HAMMING,
'random': (Image.BILINEAR, Image.BICUBIC)
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
def _cv2_resize(src, size, resample):
if isinstance(resample, tuple):
resample = random.choice(resample)
return cv2.resize(src, size, interpolation=resample)
def _pil_resize(src, size, resample, return_numpy=True):
if isinstance(resample, tuple):
resample = random.choice(resample)
if isinstance(src, np.ndarray):
pil_img = Image.fromarray(src)
else:
pil_img = src
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if return_numpy:
return np.asarray(pil_img)
return pil_img
if backend.lower() == "cv2":
if isinstance(interpolation, str):
......@@ -81,11 +96,12 @@ class UnifiedResize(object):
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
self.resize_func = partial(_cv2_resize, resample=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
self.resize_func = partial(
_pil_resize, resample=interpolation, return_numpy=return_numpy)
else:
logger.warning(
f"The backend of Resize only support \"cv2\" or \"PIL\". \"f{backend}\" is unavailable. Use \"cv2\" instead."
......@@ -93,6 +109,8 @@ class UnifiedResize(object):
self.resize_func = cv2.resize
def __call__(self, src, size):
if isinstance(size, list):
size = tuple(size)
return self.resize_func(src, size)
......@@ -137,7 +155,8 @@ class ResizeImage(object):
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
backend="cv2",
return_numpy=True):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
......@@ -151,10 +170,18 @@ class ResizeImage(object):
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
interpolation=interpolation,
backend=backend,
return_numpy=return_numpy)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if isinstance(img, np.ndarray):
# numpy input
img_h, img_w = img.shape[:2]
else:
# PIL image input
img_w, img_h = img.size
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
......
......@@ -41,8 +41,11 @@ def main():
'inference.pdmodel')) and os.path.exists(
os.path.join(config["Global"]["save_inference_dir"],
'inference.pdiparams'))
if "Query" in config["DataLoader"]["Eval"]:
config["DataLoader"]["Eval"] = config["DataLoader"]["Eval"]["Query"]
config["DataLoader"]["Eval"]["sampler"]["batch_size"] = 1
config["DataLoader"]["Eval"]["loader"]["num_workers"] = 0
init_logger()
device = paddle.set_device("cpu")
train_dataloader = build_dataloader(config["DataLoader"], "Eval", device,
......@@ -67,6 +70,7 @@ def main():
quantize_model_path=os.path.join(
config["Global"]["save_inference_dir"], "quant_post_static_model"),
sample_generator=sample_generator(train_dataloader),
batch_size=config["DataLoader"]["Eval"]["sampler"]["batch_size"],
batch_nums=10)
......
......@@ -42,8 +42,22 @@ class Predictor(object):
def create_paddle_predictor(self, args, inference_model_dir=None):
if inference_model_dir is None:
inference_model_dir = args.inference_model_dir
params_file = os.path.join(inference_model_dir, "inference.pdiparams")
model_file = os.path.join(inference_model_dir, "inference.pdmodel")
if "inference_int8.pdiparams" in os.listdir(inference_model_dir):
params_file = os.path.join(inference_model_dir,
"inference_int8.pdiparams")
model_file = os.path.join(inference_model_dir,
"inference_int8.pdmodel")
assert args.get(
"use_fp16", False
) is False, "fp16 mode is not supported for int8 model inference, please set use_fp16 as False during inference."
else:
params_file = os.path.join(inference_model_dir,
"inference.pdiparams")
model_file = os.path.join(inference_model_dir, "inference.pdmodel")
assert args.get(
"use_int8", False
) is False, "int8 mode is not supported for fp32 model inference, please set use_int8 as False during inference."
config = Config(model_file, params_file)
if args.use_gpu:
......@@ -63,12 +77,18 @@ class Predictor(object):
config.disable_glog_info()
config.switch_ir_optim(args.ir_optim) # default true
if args.use_tensorrt:
precision = Config.Precision.Float32
if args.get("use_int8", False):
precision = Config.Precision.Int8
elif args.get("use_fp16", False):
precision = Config.Precision.Half
config.enable_tensorrt_engine(
precision_mode=Config.Precision.Half
if args.use_fp16 else Config.Precision.Float32,
precision_mode=precision,
max_batch_size=args.batch_size,
workspace_size=1 << 30,
min_subgraph_size=30)
min_subgraph_size=30,
use_calib_mode=False)
config.enable_memory_optim()
# use zero copy
......
此差异已折叠。
# PULC Classification Model of Language
------
## Catalogue
- [1. Introduction](#1)
- [2. Quick Start](#2)
- [2.1 PaddlePaddle Installation](#2.1)
- [2.2 PaddleClas Installation](#2.2)
- [2.3 Prediction](#2.3)
- [3. Training, Evaluation and Inference](#3)
- [3.1 Installation](#3.1)
- [3.2 Dataset](#3.2)
- [3.2.1 Dataset Introduction](#3.2.1)
- [3.2.2 Getting Dataset](#3.2.2)
- [3.3 Training](#3.3)
- [3.4 Evaluation](#3.4)
- [3.5 Inference](#3.5)
- [4. Model Compression](#4)
- [4.1 SKL-UGI Knowledge Distillation](#4.1)
- [4.1.1 Teacher Model Training](#4.1.1)
- [4.1.2 Knowledge Distillation Training](#4.1.2)
- [5. SHAS](#5)
- [6. Inference Deployment](#6)
- [6.1 Getting Paddle Inference Model](#6.1)
- [6.1.1 Exporting Paddle Inference Model](#6.1.1)
- [6.1.2 Downloading Inference Model](#6.1.2)
- [6.2 Prediction with Python](#6.2)
- [6.2.1 Image Prediction](#6.2.1)
- [6.2.2 Images Prediction](#6.2.2)
- [6.3 Deployment with C++](#6.3)
- [6.4 Deployment as Service](#6.4)
- [6.5 Deployment on Mobile](#6.5)
- [6.6 Converting To ONNX and Deployment](#6.6)
<a name="1"></a>
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of language in the image using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in various scenarios involving multilingual OCR processing, such as finance and government affairs.
The following table lists the relevant indicators of the model. The first two lines means that using SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone to training. The third to sixth lines means that the backbone is replaced by PPLCNet, additional use of EDA strategy and additional use of EDA strategy and SKL-UGI knowledge distillation strategy. When replacing the backbone with PPLCNet_x1_0, the input shape of model is changed to [192, 48], and the stride of the network is changed to [2, [2, 1], [2, 1], [2, 1]].
| Backbone | Top1-Acc(%) | Latency(ms) | Size(M)| Training Strategy |
| ----------------------- | --------- | -------- | ------- | ---------------------------------------------- |
| SwinTranformer_tiny | 98.12 | 89.09 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 95.92 | 2.98 | 3.7 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.35 | 2.58 | 7.1 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.7 | 2.58 | 7.1 | using SSLD pretrained model |
| PPLCNet_x1_0 | 99.12 | 2.58 | 7.1 | using SSLD pretrained model + EDA strategy |
| **PPLCNet_x1_0** | **99.26** | **2.58** | **7.1** | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that high accuracy can be getted when backbone is SwinTranformer_tiny, but the speed is slow. Replacing backbone with the lightweight model MobileNetV3_small_x0_35, the speed can be greatly improved, but the accuracy will be greatly reduced. Replacing backbone with faster backbone PPLCNet_x1_0 and changing the input shape and stride of network, the accuracy is higher more 2.43 percentage points than MobileNetv3_small_x0_35. At the same time, the speed can be more than 20% faster. After additional using the SSLD pretrained model, the accuracy can be improved by about 0.35 percentage points without affecting the inference speed. Further, additional using the EDA strategy, the accuracy can be increased by 0.42 percentage points. Finally, after additional using the SKL-UGI knowledge distillation, the accuracy can be further improved by 0.14 percentage points. At this point, the accuracy is higher than that of SwinTranformer_tiny, but the speed is more faster. The training method and deployment instructions of PULC will be introduced in detail below.
**Note**:
* The Latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. The MKLDNN is enabled and the number of threads is 10.
* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
<a name="2"></a>
## 2. Quick Start
<a name="2.1"></a>
### 2.1 PaddlePaddle Installation
- Run the following command to install if CUDA9 or CUDA10 is available.
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- Run the following command to install if GPU device is unavailable.
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for examples other versions.
<a name="2.2"></a>
### 2.2 PaddleClas wheel Installation
The command of PaddleClas installation as bellow:
```bash
pip3 install paddleclas
```
<a name="2.3"></a>
### 2.3 Prediction
First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip to get the test demo images.
* Prediction with CLI
```bash
paddleclas --model_name=language_classification --infer_imgs=pulc_demo_imgs/language_classification/word_35404.png
```
Results:
```
>>> result
class_ids: [4, 6], scores: [0.88672, 0.01434], label_names: ['japan', 'korean'], filename: pulc_demo_imgs/language_classification/word_35404.png
Predict complete!
```
**Note**: If you want to test other images, only need to specify the `--infer_imgs` argument, and the directory containing images is also supported.
* Prediction in Python
```python
import paddleclas
model = paddleclas.PaddleClas(model_name="language_classification")
result = model.predict(input_data="pulc_demo_imgs/language_classification/word_35404.png")
print(next(result))
```
**Note**: The `result` returned by `model.predict()` is a generator, so you need to use the `next()` function to call it or `for` loop to loop it. And it will predict with `batch_size` size batch and return the prediction results when called. The default `batch_size` is 1, and you also specify the `batch_size` when instantiating, such as `model = paddleclas.PaddleClas(model_name="language_classification", batch_size=2)`. The result of demo above:
```
>>> result
[{'class_ids': [4, 6], 'scores': [0.88672, 0.01434], 'label_names': ['japan', 'korean'], 'filename': 'pulc_demo_imgs/language_classification/word_35404.png'}]
```
<a name="3"></a>
## 3. Training, Evaluation and Inference
<a name="3.1"></a>
### 3.1 Installation
Please refer to [Installation](../installation/install_paddleclas_en.md) to get the description about installation.
<a name="3.2"></a>
### 3.2 Dataset
<a name="3.2.1"></a>
#### 3.2.1 Dataset Introduction
The models wo provided are trained with internal data, which is not open source yet. So it is suggested that constructing dataset based on open source dataset [Multi-lingual scene text detection and recognition](https://rrc.cvc.uab.es/?ch=15&com=downloads) to experience the this case.
Some image of the processed dataset is as follows:
![](../../images/PULC/docs/language_classification_original_data.png)
<a name="3.2.2"></a>
#### 3.2.2 Getting Dataset
The models provided support to classcify 10 languages, which as shown in the following list:
`0` : means Arabic
`1` : means chinese_cht
`2` : means cyrillic
`3` : means devanagari
`4` : means Japanese
`5` : means ka
`6` : means Korean
`7` : means ta
`8` : means te
`9` : means Latin
In the `Multi-lingual scene text detection and recognition`, only Arabic, Japanese, Korean and Latin data are included. 1600 images from each of the four languages are taken as the training data of this case, 300 images as the evaluation data, and 400 images as the supplementary data is used for the `SKL-UGI Knowledge Distillation`.
Therefore, for the demo dataset in this case, the language categories are shown in following list:
`0` : means arabic
`4` : means japan
`6` : means korean
`9` : means latin
**Note**: The images used in this task should be cropped by text from original image. Only the text line part is used as the image data.
If you want to create your own dataset, you can collect and sort out the data of the required languages in your task as required. And you can also download the data processed directly.
```
cd path_to_PaddleClas
```
Enter the `dataset/` directory, download and unzip the dataset.
```shell
cd dataset
wget https://paddleclas.bj.bcebos.com/data/PULC/language_classification.tar
tar -xf language_classification.tar
cd ../
```
The datas under `language_classification` directory:
```
├── img
│ ├── word_1.png
│ ├── word_2.png
...
├── train_list.txt
├── train_list_for_distill.txt
├── test_list.txt
└── label_list.txt
```
Where `img/` is the directory including 9200 images in 4 languages. The `train_list.txt` and `test_list.txt` are label files of training data and validation data respectively. `label_list.txt` is the mapping file corresponding to the four languages. `train_list_for_distill.txt` is the label list of images used for `SKL-UGI Knowledge Distillation`.
**Note**:
* About the contents format of `train_list.txt` and `val_list.txt`, please refer to [Description about Classification Dataset in PaddleClas](../data_preparation/classification_dataset_en.md).
* About the `train_list_for_distill.txt`, please refer to [Knowledge Distillation Label](../advanced_tutorials/distillation/distillation_en.md).
<a name="3.3"></a>
### 3.3 Training
The details of training config in `ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml`. The command about training as follows:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
-o Arch.class_num=4
```
**Note**: Because the class num of demo dataset is 4, the argument `-o Arch.class_num=4` should be specifed to change the prediction class num of model to 4.
<a name="3.4"></a>
### 3.4 Evaluation
After training, you can use the following commands to evaluate the model.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
-o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \
-o Arch.class_num=4
```
Among the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specify the path of the best model weight file. You can specify other path if needed.
<a name="3.5"></a>
### 3.5 Inference
After training, you can use the model that trained to infer. Command is as follow:
```bash
python3 tools/infer.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
-o Global.pretrained_model="output/PPLCNet_x1_0/best_model" \
-o Arch.class_num=4
```
The results:
```
[{'class_ids': [4, 9], 'scores': [0.96809, 0.01001], 'file_name': 'deploy/images/PULC/language_classification/word_35404.png', 'label_names': ['japan', 'latin']}]
```
**Note**:
* Among the above command, argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specify the path of the best model weight file. You can specify other path if needed.
* The default test image is `deploy/images/PULC/person_exists/objects365_02035329.jpg`. And you can test other image, only need to specify the argument `-o Infer.infer_imgs=path_to_test_image`.
* Among the prediction results, `japan` means japanese and `korean` means korean.
<a name="4"></a>
## 4. Model Compression
<a name="4.1"></a>
### 4.1 SKL-UGI Knowledge Distillation
SKL-UGI is a simple but effective knowledge distillation algrithem proposed by PaddleClas.
<!-- todo -->
<!-- Please refer to [SKL-UGI](../advanced_tutorials/distillation/distillation_en.md) for more details. -->
<a name="4.1.1"></a>
#### 4.1.1 Teacher Model Training
Training the teacher model with hyperparameters specified in `ppcls/configs/PULC/language_classification/PPLCNet/PPLCNet_x1_0.yaml`. The command is as follow:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
-o Arch.name=ResNet101_vd \
-o Arch.class_num=4
```
The best teacher model weight would be saved in file `output/ResNet101_vd/best_model.pdparams`.
**Note**: Training the ResNet101_vd model requires more GPU memory. If the memory is not enough, you can reduce the learning rate and batch size in the same proportion.
<a name="4.1.2"></a>
#### 4.1.2 Knowledge Distillation Training
The training strategy, specified in training config file `ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml`, the teacher model is `ResNet101_vd`, the student model is `PPLCNet_x1_0`. The command is as follow:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0_distillation.yaml \
-o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model \
-o Arch.class_num=4
```
The best student model weight would be saved in file `output/DistillationModel/best_model_student.pdparams`.
<a name="5"></a>
## 5. Hyperparameters Searching
The hyperparameters used by [3.2 section](#3.2) and [4.1 section](#4.1) are according by `Hyperparameters Searching` in PaddleClas. If you want to get better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to get better hyperparameters.
**Note**: This section is optional. Because the search process will take a long time, you can selectively run according to your specific. If not replace the dataset, you can ignore this section.
<a name="6"></a>
## 6. Inference Deployment
<a name="6.1"></a>
### 6.1 Getting Paddle Inference Model
Paddle Inference is the original Inference Library of the PaddlePaddle, provides high-performance inference for server deployment. And compared with directly based on the pretrained model, Paddle Inference can use tools to accelerate prediction, so as to achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
Paddle Inference need Paddle Inference Model to predict. Two process provided to get Paddle Inference Model. If want to use the provided by PaddleClas, you can download directly, click [Downloading Inference Model](#6.1.2).
<a name="6.1.1"></a>
### 6.1.1 Exporting Paddle Inference Model
The command about exporting Paddle Inference Model is as follow:
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/language_classification/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/DistillationModel/best_model_student \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_language_classification_infer
```
After running above command, the inference model files would be saved in `deploy/models/PPLCNet_x1_0_language_classification_infer`, as shown below:
```
├── PPLCNet_x1_0_language_classification_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
**Note**: The best model is from knowledge distillation training. If knowledge distillation training is not used, the best model would be saved in `output/PPLCNet_x1_0/best_model.pdparams`.
<a name="6.1.2"></a>
### 6.1.2 Downloading Inference Model
You can also download directly.
```
cd deploy/models
# download the inference model and decompression
wget https://paddleclas.bj.bcebos.com/models/PULC/language_classification_infer.tar && tar -xf language_classification_infer.tar
```
After decompression, the directory `models` should be shown below.
```
├── language_classification_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
<a name="6.2"></a>
### 6.2 Prediction with Python
<a name="6.2.1"></a>
#### 6.2.1 Image Prediction
Return the directory `deploy`:
```
cd ../
```
Run the following command to classify language about the image `./images/PULC/language_classification/word_35404.png`.
```shell
# Use the following command to predict with GPU.
python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml
# Use the following command to predict with CPU.
python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml -o Global.use_gpu=False
```
The prediction results:
```
word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean']
```
**Note**: Among the prediction results, `japan` means japanese and `korean` means korean.
<a name="6.2.2"></a>
#### 6.2.2 Images Prediction
If you want to predict images in directory, please specify the argument `Global.infer_imgs` as directory path by `-o Global.infer_imgs`. The command is as follow.
```shell
# Use the following command to predict with GPU. If want to replace with CPU, you can add argument -o Global.use_gpu=False
python3.7 python/predict_cls.py -c configs/PULC/language_classification/inference_language_classification.yaml -o Global.infer_imgs="./images/PULC/language_classification/"
```
All prediction results will be printed, as shown below.
```
word_17.png: class id(s): [9, 4], score(s): [0.80, 0.09], label_name(s): ['latin', 'japan']
word_20.png: class id(s): [0, 4], score(s): [0.91, 0.02], label_name(s): ['arabic', 'japan']
word_35404.png: class id(s): [4, 6], score(s): [0.89, 0.01], label_name(s): ['japan', 'korean']
```
Among the prediction results above, `japan` means japanese, `latin` means latin, `arabic` means arabic and `korean` means korean.
<a name="6.3"></a>
### 6.3 Deployment with C++
PaddleClas provides an example about how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
<a name="6.4"></a>
### 6.4 Deployment as Service
Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
<a name="6.5"></a>
### 6.5 Deployment on Mobile
Paddle-Lite is an open source deep learning framework that designed to make easy to perform inference on mobile, embeded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
PaddleClas provides an example of how to deploy on mobile by Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
<a name="6.6"></a>
### 6.6 Converting To ONNX and Deployment
Paddle2ONNX support convert Paddle Inference model to ONNX model. And you can deploy with ONNX model on different inference engine, such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. About Paddle2ONNX details, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
PaddleClas provides an example of how to convert Paddle Inference model to ONNX model by paddle2onnx toolkit and predict by ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
# PULC Model Zoo
------
The PULC model zoo is provided here, mainly providing indicators, model storage size, and download links of the model. The pre-trained model can be used for fine-tuning training, and the inference model can be directly used for prediction and deployment.
|Model name| Model Description | Metrics |Storage Size| Latency| Download Address|
| --- | --- | --- | --- | --- | --- |
| person_exists |[Human Exists Classification](PULC_person_exists_en.md)| 96.23 |7.0M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams)|
| person_attribute |[Pedestrian Attribute Classification](PULC_person_attribute_en.md)| 78.59 |7.2M|2.01ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_attribute_pretrained.pdparams)|
| safety_helmet |[Classification of Wheather Wearing Safety Helmet](PULC_safety_helmet_en.md)| 99.38 |7.1M|2.03ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/safety_helmet_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/safety_helmet_pretrained.pdparams)|
| traffic_sign |[Traffic Sign Classification](PULC_traffic_sign_en.md)| 98.35 |8.2M|2.10ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/traffic_sign_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/traffic_sign_pretrained.pdparams)|
| vehicle_attribute |[Vehicle Attribute Classification](PULC_vehicle_attribute_en.md)| 90.81 |7.2M|2.36ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/vehicle_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/vehicle_attribute_pretrained.pdparams)|
| car_exists |[Car Exists Classification](PULC_car_exists_en.md) | 95.92 | 7.1M | 2.38ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/car_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/car_exists_pretrained.pdparams)|
| text_image_orientation |[Text Image Orientation Classification](PULC_text_image_orientation_en.md)| 99.06 | 7.1M | 2.16ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/text_image_orientation_pretrained.pdparams)|
| textline_orientation |[Text-line Orientation Classification](PULC_textline_orientation_en.md)| 96.01 |7.0M|2.72ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/textline_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/textline_orientation_pretrained.pdparams)|
| language_classification |[Language Classification](PULC_language_classification_en.md)| 99.26 |7.1M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/language_classification_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/language_classification_pretrained.pdparams)|
**Note:**
* The backbone of all the above models is PPLCNet_x1_0. The different sizes of some models are caused by the different output sizes of the classification layer. The inference time is tested on the Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. During the test process, the MKLDNN acceleration strategy is turned on, and the number of threads is 10. There will be slight fluctuations during the speed test process.
* The evaluation indicators of person_exists, safety_helmet, and car_exists are TprAtFpr. The evaluation indicators of person_attribute and vehicle_attribute are ma. The evaluation indicators of traffic_sign, text_image_orientation, textline_orientation and language_classification are Top-1 Acc.
# PULC Recognition Model of Person Attribute
------
## Catalogue
- [1. Introduction](#1)
- [2. Quick Start](#2)
- [2.1 PaddlePaddle Installation](#2.1)
- [2.2 PaddleClas Installation](#2.2)
- [2.3 Prediction](#2.3)
- [3. Training, Evaluation and Inference](#3)
- [3.1 Installation](#3.1)
- [3.2 Dataset](#3.2)
- [3.2.1 Dataset Introduction](#3.2.1)
- [3.2.2 Getting Dataset](#3.2.2)
- [3.3 Training](#3.3)
- [3.4 Evaluation](#3.4)
- [3.5 Inference](#3.5)
- [4. Model Compression](#4)
- [4.1 SKL-UGI Knowledge Distillation](#4.1)
- [4.1.1 Teacher Model Training](#4.1.1)
- [4.1.2 Knowledge Distillation Training](#4.1.2)
- [5. SHAS](#5)
- [6. Inference Deployment](#6)
- [6.1 Getting Paddle Inference Model](#6.1)
- [6.1.1 Exporting Paddle Inference Model](#6.1.1)
- [6.1.2 Downloading Inference Model](#6.1.2)
- [6.2 Prediction with Python](#6.2)
- [6.2.1 Image Prediction](#6.2.1)
- [6.2.2 Images Prediction](#6.2.2)
- [6.3 Deployment with C++](#6.3)
- [6.4 Deployment as Service](#6.4)
- [6.5 Deployment on Mobile](#6.5)
- [6.6 Converting To ONNX and Deployment](#6.6)
<a name="1"></a>
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of person attribute using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in
Pedestrian analysis scenarios, pedestrian tracking scenarios, etc.
The following table lists the relevant indicators of the model. The first three lines means that using Res2Net200_vd_26w_4s, SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone to training. The fourth to seventh lines means that the backbone is replaced by PPLCNet, additional use of EDA strategy and additional use of EDA strategy and SKL-UGI knowledge distillation strategy.
| Backbone | ma(%) | Latency(ms) | Size(M) | Training Strategy |
|-------|-----------|----------|---------------|---------------|
| Res2Net200_vd_26w_4s | 81.25 | 77.51 | 293 | using ImageNet pretrained |
| SwinTransformer_tiny | 80.17 | 89.51 | 111 | using ImageNet pretrained |
| MobileNetV3_small_x0_35 | 70.79 | 2.90 | 1.7 | using ImageNet pretrained |
| PPLCNet_x1_0 | 76.31 | 2.01 | 7.1 | using ImageNet pretrained |
| PPLCNet_x1_0 | 77.31 | 2.01 | 7.1 | using SSLD pretrained |
| PPLCNet_x1_0 | 77.71 | 2.01 | 7.1 | using SSLD pretrained + EDA strategy|
| <b>PPLCNet_x1_0<b> | <b>78.59<b> | <b>2.01<b> | <b>7.1<b> | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that high ma metric can be getted when backbone are Res2Net200_vd_26w_4s and SwinTranformer_tiny, but the speed is slow. Replacing backbone with the lightweight model MobileNetV3_small_x0_35, the speed can be greatly improved, but the ma metric will be greatly reduced. Replacing backbone with faster backbone PPLCNet_x1_0, the ma metric is higher more 5.5 percentage points higher than MobileNetv3_small_x0_35. At the same time, the speed can be more than 20% faster. After additional using the SSLD pretrained model, the ma metric can be improved by about 1 percentage points without affecting the inference speed. Further, additional using the EDA strategy, the ma metric can be increased by 0.4 percentage points. Finally, after additional using the SKL-UGI knowledge distillation, the ma matric can be further improved by 0.88 percentage points. At this time, the ma metric of PPLCNet_x1_0 is only 1.58% different from SwinTransformer_tiny, but the speed is more than 44 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
**Note**:
* The Latency is tested on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. The MKLDNN is enabled and the number of threads is 10.
* About PP-LCNet, please refer to [PP-LCNet Introduction](../models/PP-LCNet_en.md) and [PP-LCNet Paper](https://arxiv.org/abs/2109.15099).
<a name="2"></a>
## 2. Quick Start
<a name="2.1"></a>
### 2.1 PaddlePaddle Installation
- Run the following command to install if CUDA9 or CUDA10 is available.
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- Run the following command to install if GPU device is unavailable.
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for examples other versions.
<a name="2.2"></a>
### 2.2 PaddleClas wheel Installation
The command of PaddleClas installation as bellow:
```bash
pip3 install paddleclas
```
<a name="2.3"></a>
### 2.3 Prediction
First, please click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download and unzip to get the test demo images.
* Prediction with CLI
```bash
paddleclas --model_name=person_attribute --infer_imgs=pulc_demo_imgs/person_attribute/090004.jpg
```
Results:
```
>>> result
attributes: ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], output: [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], filename: pulc_demo_imgs/person_attribute/090004.jpg
Predict complete!
```
**Note**: If you want to test other images, only need to specify the `--infer_imgs` argument, and the directory containing images is also supported.
* Prediction in Python
```python
import paddleclas
model = paddleclas.PaddleClas(model_name="person_attribute")
result = model.predict(input_data="pulc_demo_imgs/person_attribute/090004.jpg")
print(next(result))
```
**Note**: The `result` returned by `model.predict()` is a generator, so you need to use the `next()` function to call it or `for` loop to loop it. And it will predict with `batch_size` size batch and return the prediction results when called. The default `batch_size` is 1, and you also specify the `batch_size` when instantiating, such as `model = paddleclas.PaddleClas(model_name="person_attribute", batch_size=2)`. The result of demo above:
```
>>> result
[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], 'filename': 'pulc_demo_imgs/person_attribute/090004.jpg'}]
```
<a name="3"></a>
## 3. Training, Evaluation and Inference
<a name="3.1"></a>
### 3.1 Installation
Please refer to [Installation](../installation/install_paddleclas_en.md) to get the description about installation.
<a name="3.2"></a>
### 3.2 Dataset
<a name="3.2.1"></a>
#### 3.2.1 Dataset Introduction
The data used in this case is the [pa100k dataset](https://www.v7labs.com/open-datasets/pa-100k).
<a name="3.2.2"></a>
#### 3.2.2 Getting Dataset
Some image of the processed dataset is as follows:
![](../../images/PULC/docs/person_attribute_data_demo.png)
We converted the data into a PaddleClas multi-label readable data format that can be downloaded directly.
```
cd path_to_PaddleClas
```
Enter the `dataset/` directory, download and unzip the dataset.
```shell
cd dataset
wget https://paddleclas.bj.bcebos.com/data/PULC/pa100k.tar
tar -xf pa100k.tar
cd ../
```
The datas under `pa100k` directory:
```
pa100k
├── train
│   ├── 000001.jpg
│   ├── 000002.jpg
...
├── val
│   ├── 080001.jpg
│   ├── 080002.jpg
...
├── test
│   ├── 090001.jpg
│   ├── 090002.jpg
...
...
├── train_list.txt
├── train_val_list.txt
├── val_list.txt
├── test_list.txt
```
Where `train/`, `val/`, `test/` are training set, validation set and test set respectively. `train_list.txt`, `val_list.txt`, `test_list.txt` are the label files of the training set, validation set, and test set, respectively. In this example, `test_list.txt` is not used for now.
<a name="3.3"></a>
### 3.3 Training
The details of training config in ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`. The command about training as follows:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
The best metric for the validation set is around `77.71%` (the dataset is small and generally fluctuates around 0.3%).
<a name="3.4"></a>
### 3.4 Evaluation
After training, you can use the following commands to evaluate the model.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"
```
Among the above command, the argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specify the path of the best model weight file. You can specify other path if needed.
<a name="3.5"></a>
### 3.5 Inference
After training, you can use the model that trained to infer. Command is as follow:
```python
python3 tools/infer.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model
```
The results:
```
[{'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}]
```
**Note**:
* Among the above command, argument `-o Global.pretrained_model="output/PPLCNet_x1_0/best_model"` specify the path of the best model weight file. You can specify other path if needed.
* The default test image is `deploy/images/PULC/person_attribute/090004.jpg`. And you can test other image, only need to specify the argument `-o Infer.infer_imgs=path_to_test_image`.
<a name="4"></a>
## 4. Model Compression
<a name="4.1"></a>
### 4.1 SKL-UGI Knowledge Distillation
SKL-UGI is a simple but effective knowledge distillation algrithem proposed by PaddleClas.
<!-- todo -->
<!-- Please refer to [SKL-UGI](../advanced_tutorials/distillation/distillation_en.md) for more details. -->
<a name="4.1.1"></a>
#### 4.1.1 Teacher Model Training
Training the teacher model with hyperparameters specified in `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`. The command is as follow:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Arch.name=ResNet101_vd
```
The best metric for the validation set is around `80.10%`. The best teacher model weight would be saved in file `output/ResNet101_vd/best_model.pdparams`.
<a name="4.1.2"></a>
#### 4.1.2 Knowledge Distillation Training
The training strategy, specified in training config file `ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml`, the teacher model is `ResNet101_vd`, the student model is `PPLCNet_x1_0`. The command is as follow:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0_Distillation.yaml \
-o Arch.models.0.Teacher.pretrained=output/ResNet101_vd/best_model
```
The best metric for the validation set is around `78.5%`. The best student model weight would be saved in file `output/DistillationModel/best_model_student.pdparams`.
<a name="5"></a>
## 5. Hyperparameters Searching
The hyperparameters used by [3.2 section](#3.2) and [4.1 section](#4.1) are according by `Hyperparameters Searching` in PaddleClas. If you want to get better results on your own dataset, you can refer to [Hyperparameters Searching](PULC_train_en.md#4) to get better hyperparameters.
**Note**: This section is optional. Because the search process will take a long time, you can selectively run according to your specific. If not replace the dataset, you can ignore this section.
<a name="6"></a>
## 6. Inference Deployment
<a name="6.1"></a>
### 6.1 Getting Paddle Inference Model
Paddle Inference is the original Inference Library of the PaddlePaddle, provides high-performance inference for server deployment. And compared with directly based on the pretrained model, Paddle Inference can use tools to accelerate prediction, so as to achieve better inference performance. Please refer to [Paddle Inference](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html) for more information.
Paddle Inference need Paddle Inference Model to predict. Two process provided to get Paddle Inference Model. If want to use the provided by PaddleClas, you can download directly, click [Downloading Inference Model](#6.1.2).
<a name="6.1.1"></a>
### 6.1.1 Exporting Paddle Inference Model
The command about exporting Paddle Inference Model is as follow:
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/DistillationModel/best_model_student \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
After running above command, the inference model files would be saved in `PPLCNet_x1_0_person_attribute_infer`, as shown below:
```
├── PPLCNet_x1_0_person_attribute_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
**Note**: The best model is from knowledge distillation training. If knowledge distillation training is not used, the best model would be saved in `output/PPLCNet_x1_0/best_model.pdparams`.
<a name="6.1.2"></a>
### 6.1.2 Downloading Inference Model
You can also download directly.
```
cd deploy/models
# download the inference model and decompression
wget https://paddleclas.bj.bcebos.com/models/PULC/person_attribute_infer.tar && tar -xf person_attribute_infer.tar
```
After decompression, the directory `models` should be shown below.
```
├── person_attribute_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
<a name="6.2"></a>
### 6.2 Prediction with Python
<a name="6.2.1"></a>
#### 6.2.1 Image Prediction
Return the directory `deploy`:
```
cd ../
```
Run the following command to classify whether there are human in the image `./images/PULC/person_attribute/090004.jpg`.
```shell
# Use the following command to predict with GPU.
python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=True
# Use the following command to predict with CPU.
python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.use_gpu=False
```
The prediction results:
```
090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
```
<a name="6.2.2"></a>
#### 6.2.2 Images Prediction
If you want to predict images in directory, please specify the argument `Global.infer_imgs` as directory path by `-o Global.infer_imgs`. The command is as follow.
```shell
# Use the following command to predict with GPU. If want to replace with CPU, you can add argument -o Global.use_gpu=False
python3.7 python/predict_cls.py -c configs/PULC/person_attribute/inference_person_attribute.yaml -o Global.infer_imgs="./images/PULC/person_attribute/"
```
All prediction results will be printed, as shown below.
```
090004.jpg: {'attributes': ['Male', 'Age18-60', 'Back', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'Backpack', 'Upper: LongSleeve UpperPlaid', 'Lower: Trousers', 'No boots'], 'output': [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]}
090007.jpg: {'attributes': ['Female', 'Age18-60', 'Side', 'Glasses: False', 'Hat: False', 'HoldObjectsInFront: False', 'No bag', 'Upper: ShortSleeve', 'Lower: Skirt&Dress', 'No boots'], 'output': [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]}
```
Among the prediction results above, `someone` means that there is a human in the image, `nobody` means that there is no human in the image.
<a name="6.3"></a>
### 6.3 Deployment with C++
PaddleClas provides an example about how to deploy with C++. Please refer to [Deployment with C++](../inference_deployment/cpp_deploy_en.md).
<a name="6.4"></a>
### 6.4 Deployment as Service
Paddle Serving is a flexible, high-performance carrier for machine learning models, and supports different protocol, such as RESTful, gRPC, bRPC and so on, which provides different deployment solutions for a variety of heterogeneous hardware and operating system environments. Please refer [Paddle Serving](https://github.com/PaddlePaddle/Serving) for more information.
PaddleClas provides an example about how to deploy as service by Paddle Serving. Please refer to [Paddle Serving Deployment](../inference_deployment/paddle_serving_deploy_en.md).
<a name="6.5"></a>
### 6.5 Deployment on Mobile
Paddle-Lite is an open source deep learning framework that designed to make easy to perform inference on mobile, embeded, and IoT devices. Please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for more information.
PaddleClas provides an example of how to deploy on mobile by Paddle-Lite. Please refer to [Paddle-Lite deployment](../inference_deployment/paddle_lite_deploy_en.md).
<a name="6.6"></a>
### 6.6 Converting To ONNX and Deployment
Paddle2ONNX support convert Paddle Inference model to ONNX model. And you can deploy with ONNX model on different inference engine, such as TensorRT, OpenVINO, MNN/TNN, NCNN and so on. About Paddle2ONNX details, please refer to [Paddle2ONNX](https://github.com/PaddlePaddle/Paddle2ONNX).
PaddleClas provides an example of how to convert Paddle Inference model to ONNX model by paddle2onnx toolkit and predict by ONNX model. You can refer to [paddle2onnx](../../../deploy/paddle2onnx/readme_en.md) for deployment details.
此差异已折叠。
# PULC Quick Start
------
This document introduces the prediction using PULC series model based on PaddleClas wheel.
## Catalogue
- [1. Installation](#1)
- [1.1 PaddlePaddle Installation](#11)
- [1.2 PaddleClas wheel Installation](#12)
- [2. Quick Start](#2)
- [2.1 Predicion with Command Line](#2.1)
- [2.2 Predicion with Python](#2.2)
- [2.3 Supported Model List](#2.3)
- [3. Summary](#3)
<a name="1"></a>
## 1. Installation
<a name="1.1"></a>
### 1.1 PaddlePaddle Installation
- Run the following command to install if CUDA9 or CUDA10 is available.
```bash
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
```
- Run the following command to install if GPU device is unavailable.
```bash
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for examples other versions.
<a name="1.2"></a>
### 1.2 PaddleClas wheel Installation
```bash
pip3 install paddleclas
```
<a name="2"></a>
## 2. Quick Start
PaddleClas provides a series of test cases, which contain demos of different scenes about people, cars, OCR, etc. Click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download the data.
<a name="2.1"></a>
### 2.1 Predicion with Command Line
```
cd /path/to/pulc_demo_imgs
```
The prediction command:
```bash
paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
```
Result:
```
>>> result
class_ids: [0], scores: [0.9955421453341842], label_names: ['nobody'], filename: pulc_demo_imgs/person_exists/objects365_01780782.jpg
Predict complete!
```
`Nobody` means there is no one in the image, `someone` means there is someone in the image. Therefore, the prediction result indicates that there is no one in the figure.
**Note**: The "--infer_imgs" argument specify the image(s) to be predict, and you can also specify a directoy contains images. If use other model, you can specify the `--model_name` argument. Please refer to [2.3 Supported Model List](#2.3) for the supported model list.
<a name="2.2"></a>
### 2.2 Predicion with Python
You can also use in Python:
```python
import paddleclas
model = paddleclas.PaddleClas(model_name="person_exists")
result = model.predict(input_data="pulc_demo_imgs/person_exists/objects365_01780782.jpg")
print(next(result))
```
The printed result information:
```
>>> result
[{'class_ids': [0], 'scores': [0.9955421453341842], 'label_names': ['nobody'], 'filename': 'pulc_demo_imgs/person_exists/objects365_01780782.jpg'}]
```
**Note**: `model.predict()` is a generator, so `next()` or `for` is needed to call it. This would to predict by batch that length is `batch_size`, default by 1. You can specify the argument `batch_size` and `model_name` when instantiating PaddleClas object, for example: `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`. Please refer to [2.3 Supported Model List](#2.3) for the supported model list.
<a name="2.3"></a>
### 2.3 Supported Model List
The name of PULC series models are as follows:
| Name | Intro |
| --- | --- |
| person_exists | Human Exists Classification |
| person_attribute | Pedestrian Attribute Classification |
| safety_helmet | Classification of Wheather Wearing Safety Helmet |
| traffic_sign | Traffic Sign Classification |
| vehicle_attribute | Vehicle Attribute Classification |
| car_exists | Car Exists Classification |
| text_image_orientation | Text Image Orientation Classification |
| textline_orientation | Text-line Orientation Classification |
| language_classification | Language Classification |
<a name="3"></a>
## 3. Summary
The PULC series models have been verified to be effective in different scenarios about people, vehicles, OCR, etc. The ultra lightweight model can achieve the accuracy close to SwinTransformer model, and the speed is increased by 40+ times. And PULC also provides the whole process of dataset getting, model training, model compression and deployment. Please refer to [Human Exists Classification](PULC_person_exists_en.md)[Pedestrian Attribute Classification](PULC_person_attribute_en.md)[Classification of Wheather Wearing Safety Helmet](PULC_safety_helmet_en.md)[Traffic Sign Classification](PULC_traffic_sign_en.md)[Vehicle Attribute Classification](PULC_vehicle_attribute_en.md)[Car Exists Classification](PULC_car_exists_en.md)[Text Image Orientation Classification](PULC_text_image_orientation_en.md)[Text-line Orientation Classification](PULC_textline_orientation_en.md)[Language Classification](PULC_language_classification_en.md) for more information about different scenarios.
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
## Practical Ultra Lightweight Classification scheme PULC
------
## Catalogue
- [1. Introduction of PULC solution](#1)
- [2. Data preparation](#2)
- [2.1 Dataset format description](#2.1)
- [2.2 Annotation file generation method](#2.2)
- [3. Training with standard classification configuration](#3)
- [3.1 PP-LCNet as backbone](#3.1)
- [3.2 SSLD pretrained model](#3.2)
- [3.3 EDA strategy](#3.3)
- [3.4 SKL-UGI knowledge distillation](#3.4)
- [3.5 Summary](#3.5)
- [4. Hyperparameters Searching](#4)
- [4.1 Search based on default configuration](#4.1)
- [4.2 Custom search configuration](#4.2)
<a name="1"></a>
### 1. Introduction of PULC solution
Image classification is one of the basic algorithms of computer vision, and it is also the most common algorithm in enterprise applications, and further, it is also an important part of many CV applications. In recent years, the backbone network model has developed rapidly, and the accuracy record of ImageNet has been continuously refreshed. However, the performance of these models in practical scenarios is sometimes unsatisfactory. On the one hand, models with high precision tend to have large storage and slow inference speed, which are often difficult to meet actual deployment requirements; on the other hand, after selecting a suitable model, experienced engineers are often required to adjust parameters, which is time-consuming and labor-intensive. In order to solve the problems of enterprise application and make the training and parameter adjustment of classification models easier, PaddleClas summarized and launched a Practical Ultra Lightweight Classification (PULC) solution. PULC integrates various state-of-the-art algorithms such as backbone network, data augmentation and distillation, etc., and finally can automatically obtain a lightweight and high-precision image classification model.
The PULC solution has been verified to be effective in many scenarios, such as human-related scenarios, car-related scenarios, and OCR-related scenarios. With an ultra-lightweight model, the accuracy close to SwinTransformer can be achieved, and the inference speed can be 40+ times faster.
<div align="center">
<img src="https://user-images.githubusercontent.com/19523330/173011854-b10fcd7a-b799-4dfd-a1cf-9504952a3c44.png" width = "800" />
</div>
The solution mainly includes 4 parts, namely: PP-LCNet lightweight backbone network, SSLD pre-trained model, Ensemble Data Augmentation (EDA) and SKL-UGI knowledge distillation algorithm. In addition, we also adopt the method of hyperparameters searching to efficiently optimize the hyperparameters in training. Below, we take the person exists or not scene as an example to illustrate the solution.
**Note**:For some specific scenarios, we provide basic training documents for reference, such as [person exists or not classification model](PULC_person_exists_en.md), etc. You can find these documents [here](./PULC_model_list_en.md). If the methods in these documents do not meet your needs, or if you need a custom training task, you can refer to this document.
<a name="2"></a>
### 2. Data preparation
<a name="2.1"></a>
#### 2.1 Dataset format description
PaddleClas uses the `txt` format file to specify the training set and validation set. Take the person exists or not scene as an example, you need to specify `train_list.txt` and `val_list.txt` as the data labels of the training set and validation set. The format is in the form of as follows:
```
# Each line uses "space" to separate the image path and label
train/1.jpg 0
train/10.jpg 1
...
```
If you want to get more information about common classification datasets, you can refer to the document [PaddleClas Classification Dataset Format Description](../data_preparation/classification_dataset_en.md).
<a name="2.2"></a>
#### 2.2 Annotation file generation method
If you already have the data in the actual scene, you can label it according to the format in the previous section. Here, we provide a script to quickly generate annotation files. You only need to put different categories of data in folders and run the script to generate annotation files.
First, assume that the path where you store the data is `./train`, `train/` contains the data of each category, the category number starts from 0, and the folder of each category contains specific image data.
```shell
train
├── 0
│   ├── 0.jpg
│   ├── 1.jpg
│   └── ...
└── 1
├── 0.jpg
├── 1.jpg
└── ...
└── ...
```
```shell
tree -r -i -f train | grep -E "jpg|JPG|jpeg|JPEG|png|PNG" | awk -F "/" '{print $0" "$2}' > train_list.txt
```
Among them, if more image name suffixes are involved, the content after `grep -E` can be added, and the `2` in `$2` is the level of the category number folder.
**Note:** The above is an introduction to the method of dataset acquisition and generation. Here you can directly download the person exists or not scene data to quickly start the experience.
Go to the PaddleClas directory.
```
cd path_to_PaddleClas
```
Go to the `dataset/` directory, download and unzip the data.
```shell
cd dataset
wget https://paddleclas.bj.bcebos.com/data/PULC/person_exists.tar
tar -xf person_exists.tar
cd ../
```
<a name="3"></a>
### 3. Training with standard classification configuration
<a name="3.1"></a>
#### 3.1 PP-LCNet as backbone
PULC adopts the lightweight backbone network PP-LCNet, which is 50% faster than other networks with the same accuracy. You can view the detailed introduction of the backbone network in [PP-LCNet Introduction](../models/PP-LCNet_en.md).
The command to train with PP-LCNet is:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0_search.yaml
```
For performance comparison, we also provide configuration files for the large model SwinTransformer_tiny and the lightweight model MobileNetV3_small_x0_35, which you can train with the command:
SwinTransformer_tiny:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_exists/SwinTransformer_tiny_patch4_window7_224.yaml
```
MobileNetV3_small_x0_35:
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_exists/MobileNetV3_small_x0_35.yaml
```
The accuracy of the trained models is compared in the following table.
| Model | Tpr(%) | Latency(ms) | Storage Size(M) | Strategy |
|-------|-----------|----------|---------------|---------------|
| SwinTranformer_tiny | 95.69 | 95.30 | 107 | Use ImageNet pretrained model|
| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | Use ImageNet pretrained model |
| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | Use ImageNet pretrained model |
It can be seen that PP-LCNet is much faster than SwinTransformer, but the accuracy is also slightly lower. Below we improve the accuracy of the PP-LCNet model through a series of optimizations.
<a name="3.2"></a>
#### 3.2 SSLD pretrained model
SSLD is a semi-supervised distillation algorithm developed by Baidu. On the ImageNet dataset, the model accuracy can be improved by 3-7 points. You can find a detailed introduction in [SSLD introduction](../advanced_tutorials/distillation/distillation_en.md). We found that using SSLD pre-trained weights can effectively improve the accuracy of the applied classification model. In addition, using a smaller resolution in training can effectively improve model accuracy. At the same time, we also optimize the learning rate.
Based on the above three improvements, the accuracy of our trained model is 92.1%, an increase of 2.6%.
<a name="3.3"></a>
#### 3.3 EDA strategy
Data augmentation is a commonly used optimization strategy in vision algorithms, which can significantly improve the accuracy of the model. In addition to the traditional RandomCrop, RandomFlip, etc. methods, we also apply RandomAugment and RandomErasing. You can find a detailed introduction at [Data Augmentation Introduction](../advanced_tutorials/DataAugmentation_en.md).
Since these two kinds of data augmentation greatly modify the picture, making the classification task more difficult, it may lead to under-fitting of the model on some datasets. We will set the probability of enabling these two methods in advance.
Based on the above improvements, we obtained a model accuracy of 93.43%, an increase of 1.3%.
<a name="3.4"></a>
#### 3.4 SKL-UGI knowledge distillation
Knowledge distillation is a method that can effectively improve the accuracy of small models. You can find a detailed introduction in [Introduction to Knowledge Distillation](../advanced_tutorials/distillation/distillation_en.md). We choose ResNet101_vd as the teacher model for distillation. In order to adapt to the distillation process, we also adjust the learning rate of different stages of the network here. Based on the above improvements, we trained the model to get a model accuracy of 95.6%, an increase of 1.4%.
<a name="3.5"></a>
#### 3.5 Summary
After the optimization of the above methods, the final accuracy of PP-LCNet reaches 95.6%, reaching the accuracy level of the large model. We summarize the experimental results in the following table:
| Model | Tpr(%) | Latency(ms) | Storage Size(M) | Strategy |
|-------|-----------|----------|---------------|---------------|
| SwinTranformer_tiny | 95.69 | 95.30 | 107 | Use ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | Use ImageNet pretrained model |
| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | Use ImageNet pretrained model |
| PPLCNet_x1_0 | 92.10 | 2.12 | 6.5 | Use SSLD pretrained model |
| PPLCNet_x1_0 | 93.43 | 2.12 | 6.5 | Use SSLD pretrained model + EDA Strategy|
| <b>PPLCNet_x1_0<b> | <b>95.60<b> | <b>2.12<b> | <b>6.5<b> | Use SSLD pretrained model + EDA Strategy + SKL-UGI knowledge distillation |
We also used the same optimization strategy in the other 8 scenarios and got the following results:
| scenarios | large model | large model metrics(%) | small model | small model metrics(%) |
|----------|----------|----------|----------|----------|
| Pedestrian Attribute Classification | Res2Net200_vd | 81.25 | PPLCNet_x1_0 | 78.59 |
| Classification of Wheather Wearing Safety Helmet | Res2Net200_vd| 98.92 | PPLCNet_x1_0 |99.38 |
| Traffic Sign Classification | SwinTransformer_tiny | 98.11 | PPLCNet_x1_0 | 98.35 |
| Vehicle Attribute Classification | Res2Net200_vd_26w_4s | 91.36 | PPLCNet_x1_0 | 90.81 |
| Car Exists Classification | SwinTransformer_tiny | 97.71 | PPLCNet_x1_0 | 95.92 |
| Text Image Orientation Classification | SwinTransformer_tiny |99.12 | PPLCNet_x1_0 | 99.06 |
| Text-line Orientation Classification | SwinTransformer_tiny | 93.61 | PPLCNet_x1_0 | 96.01 |
| Language Classification | SwinTransformer_tiny | 98.12 | PPLCNet_x1_0 | 99.26 |
It can be seen from the results that the PULC scheme can improve the model accuracy in multiple application scenarios. Using the PULC scheme can greatly reduce the workload of model optimization and quickly obtain models with higher accuracy.
<a name="4"></a>
### 4. Hyperparameters Searching
In the above training process, we adjusted parameters such as learning rate, data augmentation probability, and stage learning rate mult list. The optimal values of these parameters may not be the same in different scenarios. We provide a quick hyperparameters searching script to automate the process of hyperparameter tuning. This script traverses the parameters in the search value list to replace the parameters in the default configuration, then trains in sequence, and finally selects the parameters corresponding to the model with the highest accuracy as the search result.
<a name="4.1"></a>
#### 4.1 Search based on default configuration
The configuration file [search.yaml](../../../ppcls/configs/PULC/person_exists/search.yaml) defines the configuration of hyperparameters searching in person exists or not scenarios. Use the following commands to complete hyperparameters searching.
```bash
python3 tools/search_strategy.py -c ppcls/configs/PULC/person_exists/search.yaml
```
**Note**:Regarding the search part, we are also constantly improving, so stay tuned.
<a name="4.2"></a>
#### 4.2 Custom search configuration
You can also modify the configuration of hyperparameters searching based on training results or your parameter tuning experience.
Modify the `search_values` field in `lrs` to modify the list of learning rate search values;
Modify the `search_values` field in `resolutions` to modify the search value list of resolutions;
Modify the `search_values` field in `ra_probs` to modify the search value list of RandAugment activation probability;
Modify the `search_values` field in `re_probs` to modify the search value list of RnadomErasing on probability;
Modify the `search_values` field in `lr_mult_list` to modify the lr_mult search value list;
Modify the `search_values` field in `teacher` to modify the search list of the teacher model.
After the search is completed, the final results will be generated in `output/search_person_exists`, where, except for `search_res`, the directories in `output/search_person_exists` are the weights and training log files of the results of the corresponding hyperparameters of each search training, ` search_res` corresponds to the result of knowledge distillation, that is, the final model. The weights of the model are stored in `output/output_dir/search_person_exists/DistillationModel/best_model_student.pdparams`.
此差异已折叠。
......@@ -541,9 +541,9 @@ The accuracy and speed indicators of MobileViT series models are shown in the fo
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(M) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| MobileViT_XXS | 0.6867 | 0.8878 | - | - | - | 1849.35 | 5.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileViT_XXS_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileViT_XXS_infer.tar) |
| MobileViT_XXS | 0.6867 | 0.8878 | - | - | - | 337.24 | 1.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileViT_XXS_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileViT_XXS_infer.tar) |
| MobileViT_XS | 0.7454 | 0.9227 | - | - | - | 930.75 | 2.33 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileViT_XS_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileViT_XS_infer.tar) |
| MobileViT_S | 0.7814 | 0.9413 | - | - | - | 337.24 | 1.28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileViT_S_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileViT_S_infer.tar) |
| MobileViT_S | 0.7814 | 0.9413 | - | - | - | 1849.35 | 5.59 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileViT_S_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileViT_S_infer.tar) |
<a name="26"></a>
......
此差异已折叠。
# PaddleClas FAQ Summary - 2022 Season 1
## Before You Read
- We collect some frequently asked questions in issues and user groups since PaddleClas is open-sourced and provide brief answers, aiming to give some reference for the majority to save you from twists and turns.
- There are many talents in the field of image classification, recognition and retrieval with quickly updated models and papers, and the answers here mainly rely on our limited project practice, so it is not possible to cover all facets. We sincerely hope that the man of insight will help to supplement and correct the content, thanks a lot.
## Catalogue
- [1. Theory](#1-theory)
- [2. Practice](#2-actual-combat)
- [2.1 Common problems of training and evaluation](#21-common-problems-of-training-and-evaluation)
- [Q2.1.1 How to freeze the parameters of some layers during training?](#q211-how-to-freeze-the-parameters-of-some-layers-during-training)
<a name="1"></a>
## 1. Theory
<a name="2"></a>
## 2. Practice
<a name="2.1"></a>
### 2.1 Common problems of training and evaluation
#### Q2.1.1 How to freeze the parameters of some layers during training?
**A**: There are currently three methods available
1. Manually modify the model code, use `paddle.ParamAttr(learning_rate=0.0)`, and set the learning rate of the frozen layer to 0.0. For details, see [paddle.ParamAttr documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/api/paddle/ParamAttr_en.html#paramattr). The following code can set the learning rate of the weight parameter of the self.conv layer to 0.0.
```python
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(learning_rate=0.0), # <-- set here
bias_attr=False,
data_format=data_format)
```
2. Manually set stop_gradient=True for the frozen layer, please refer to [this link](https://github.com/RainFrost1/PaddleClas/blob/24e968b8d9f7d9e2309e713cbf2afe8fda9deacd/ppcls/engine/train/train_idml.py#L40-L66). When using this method, after the gradient is returned to the layer which set strop_gradient=True, the gradient backward is stopped, that is, the weight of the previous layer will be fixed.
3. After loss.backward() and before optimizer.step(), use the clear_gradients() method in nn.Layer. For the layer to be fixed, call this method without affecting the loss return. The following code can clear the gradient of a layer or the gradient of a parameter of a layer
```python
import paddle
linear = paddle.nn.Linear(3, 4)
x = paddle.randn([4, 3])
y = linear(x)
loss = y.sum().backward()
print(linear.weight.grad)
print(linear.bias.grad)
linear.clear_gradients() # clear the gradients of the entire layer
# linear.weight.clear_grad() # Only clear the gradient of the weight parameter of the Linear layer
print(linear.weight.grad)
print(linear.bias.grad)
```
......@@ -18,6 +18,6 @@ MobileViT is a lightweight visual Transformer network that can be used as a gene
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(M) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| MobileViT_XXS | 0.6867 | 0.8878 | 0.690 | - | 1849.35 | 5.59 |
| MobileViT_XXS | 0.6867 | 0.8878 | 0.690 | - | 337.24 | 1.28 |
| MobileViT_XS | 0.7454 | 0.9227 | 0.747 | - | 930.75 | 2.33 |
| MobileViT_S | 0.7814 | 0.9413 | 0.783 | - | 337.24 | 1.28 |
| MobileViT_S | 0.7814 | 0.9413 | 0.783 | - | 1849.35 | 5.59 |
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册