提交 0b9f6226 编写于 作者: cuicheng01 提交者: GitHub

Merge branch 'develop' into fix_multilabel

README_en.md README_ch.md
\ No newline at end of file
@@ -4,64 +4,85 @@

## 简介

飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。

<div align="center">
<img src="./docs/images/shituv2.gif" width = "450" />
<p>PP-ShiTuV2图像识别系统效果展示</p>
</div>

&nbsp;

<div align="center">
<img src="./docs/images/class_simple.gif" width = "600" />
<p>PULC实用图像分类模型效果展示</p>
</div>

## 近期更新

- 🔥️ 发布[PP-ShiTuV2](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md),recall1精度提升8个点,覆盖[20+识别场景](./docs/zh_CN/introduction/ppshitu_application_scenarios.md),新增[库管理工具](./deploy/shitu_index_manager/)和[Android Demo](./docs/zh_CN/quick_start/quick_start_recognition.md)全新体验。
- 2022.9.4 新增[生鲜产品自主结算范例库](./docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md),具体内容可以在 AI Studio 上体验。
- 2022.6.15 发布[PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md),CPU推理3ms,精度比肩SwinTransformer,覆盖人、车、OCR场景九大常见任务。
- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Studio 上体验。
- 2022.5.20 上线[PP-HGNet](./docs/zh_CN/models/PP-HGNet.md)、[PP-LCNetv2](./docs/zh_CN/models/PP-LCNetV2.md)
- [more](./docs/zh_CN/others/update_history.md)

## 特性

PaddleClas发布了[PP-HGNet](docs/zh_CN/models/PP-HGNet.md)、[PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)、[PP-LCNet](docs/zh_CN/models/PP-LCNet.md)和[SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md)等算法,并支持多种图像分类、识别相关算法,在此基础上打造[PULC超轻量图像分类方案](docs/zh_CN/PULC/PULC_quickstart.md)和[PP-ShiTu图像识别系统](./docs/zh_CN/quick_start/quick_start_recognition.md)。

![](https://user-images.githubusercontent.com/11568925/189267545-7a6eefa0-b4fc-4ed0-ae9d-7c6d53f59798.png)

## 欢迎加入技术交流群

* 欢迎加入PaddleClas微信用户群(扫码填写问卷即可入群)

<div align="center">
<img src="https://user-images.githubusercontent.com/45199522/173483779-2332f990-4941-4f8d-baee-69b62035fc31.png" width = "200" height = "200"/>
</div>

## 快速体验

PULC超轻量图像分类方案快速体验:[点击这里](docs/zh_CN/PULC/PULC_quickstart.md)

PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md)
PP-ShiTuV2 Android Demo APP,可扫描如下二维码,下载体验
<div align="center">
<img src="./docs/images/quick_start/android_demo/PPShiTu_qrcode.png" width = "240" height = "240" />
<p>PP-ShiTuV2 Android Demo</p>
</div>
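
也可以通过 paddleclas whl 包在命令行中快速体验 PULC 分类模型。下面给出一个参考示例(安装方式、模型名与图片路径仅作示意,实际用法请以上面的快速体验文档为准):

```shell
# 安装 paddleclas whl 包
pip3 install paddleclas

# 使用 PULC 有人/无人分类模型对示例图片进行预测
# model_name 与 infer_imgs 仅为示例,可替换为其他 PULC 任务与本地图片路径
paddleclas --model_name=person_exists --infer_imgs=pulc_demo_imgs/person_exists/objects365_01780782.jpg
```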
## 产业实践范例库
- 基于PP-ShiTu v2的生鲜品自助结算: [点击这里](./docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md)
- 基于PULC人员出入视频管理: [点击这里](./docs/zh_CN/samples/Personnel_Access/README.md)
- 基于 PP-ShiTu 的智慧商超商品识别:[点击这里](./docs/zh_CN/Goods_Recognition/README.md)
- 基于PP-ShiTu电梯内电瓶车入室识别:[点击这里](./docs/zh_CN/samples//Electromobile_In_Elevator_Detection/README.md)
## 文档教程

- [环境准备](docs/zh_CN/installation/install_paddleclas.md)
- [PP-ShiTuV2图像识别系统介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)
- [图像识别快速体验](docs/zh_CN/quick_start/quick_start_recognition.md)
- [20+应用场景库](docs/zh_CN/introduction/ppshitu_application_scenarios.md)
- 子模块算法介绍及模型训练
- [主体检测](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md)
- [特征提取模型](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md)
- [向量检索](./docs/zh_CN/image_recognition_pipeline/vector_search.md)
- [哈希编码](docs/zh_CN/image_recognition_pipeline/deep_hashing.md)
- PipeLine 推理部署
- [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#2)
- [基于C++预测引擎推理](deploy/cpp_shitu/readme.md)
- [服务化部署](docs/zh_CN/inference_deployment/recognition_serving_deploy.md)
- [端侧部署](docs/zh_CN/inference_deployment/lite_shitu.md)
- [库管理工具](docs/zh_CN/inference_deployment/shitu_gallery_manager.md)
- [PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md)
  - [超轻量图像分类快速体验](docs/zh_CN/PULC/PULC_quickstart.md)
  - [超轻量图像分类模型库](docs/zh_CN/PULC/PULC_model_list.md)

@@ -82,19 +103,6 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick
  - [端侧部署](docs/zh_CN/inference_deployment/paddle_lite_deploy.md)
  - [Paddle2ONNX模型转化与预测](deploy/paddle2onnx/readme.md)
  - [模型压缩](deploy/slim/README.md)
- PP系列骨干网络模型
  - [PP-HGNet](docs/zh_CN/models/PP-HGNet.md)
  - [PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)
@@ -103,6 +111,7 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick

- 前沿算法
  - [骨干网络和预训练模型库](docs/zh_CN/algorithm_introduction/ImageNet_models.md)
  - [度量学习](docs/zh_CN/algorithm_introduction/metric_learning.md)
  - [ReID](./docs/zh_CN/algorithm_introduction/reid.md)
  - [模型压缩](docs/zh_CN/algorithm_introduction/model_prune_quantization.md)
  - [模型蒸馏](docs/zh_CN/algorithm_introduction/knowledge_distillation.md)
  - [数据增强](docs/zh_CN/advanced_tutorials/DataAugmentation.md)

@@ -113,63 +122,80 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick

  - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md)
  - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md)
  - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md)
  - [图像分类FAQ第三季](docs/zh_CN/faq_series/faq_2022_s1.md)
- [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
- [许可证书](#许可证书)
- [贡献代码](#贡献代码)
<a name="PULC超轻量图像分类方案"></a>
## PULC超轻量图像分类方案
<div align="center">
<img src="https://user-images.githubusercontent.com/19523330/173011854-b10fcd7a-b799-4dfd-a1cf-9504952a3c44.png" width = "800" />
</div>
PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。
PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。
<a name="图像识别系统介绍"></a> <a name="图像识别系统介绍"></a>
## PP-ShiTu图像识别系统
## PP-ShiTuV2图像识别系统
<div align="center"> <div align="center">
<img src="./docs/images/structure.jpg" width = "800" /> <img src="./docs/images/structure.jpg" width = "800" />
</div> </div>
PP-ShiTu是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化8个方面,采用多种策略,对各个模块的模型进行优化,最终得到在CPU上仅0.2s即可完成10w+库的图像识别的系统。更多细节请参考[PP-ShiTu技术方案](https://arxiv.org/pdf/2111.00775.pdf)
<a name="分类效果展示"></a> PP-ShiTuV2是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面,采用多种策略,对各个模块的模型进行优化,PP-ShiTuV2相比V1,Recall1提升近8个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)
<a name="识别效果展示"></a>

## PP-ShiTuV2图像识别系统效果展示
- 瓶装饮料识别

<div align="center">
<img src="docs/images/drink_demo.gif">
</div>

- 商品识别

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769644-51604f80-d2d7-11eb-8290-c53b12a5c1f6.gif" width = "400" />
</div>

- 动漫人物识别

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769746-6b019700-d2d7-11eb-86df-f1d710999ba6.gif" width = "400" />
</div>

- logo识别

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769837-7fde2a80-d2d7-11eb-9b69-04140e9d785f.gif" width = "400" />
</div>

- 车辆识别

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769916-8ec4dd00-d2d7-11eb-8c60-42d89e25030c.gif" width = "400" />
</div>
<a name="PULC超轻量图像分类方案"></a>
## PULC超轻量图像分类方案
<div align="center">
<img src="https://user-images.githubusercontent.com/19523330/173011854-b10fcd7a-b799-4dfd-a1cf-9504952a3c44.png" width = "800" />
</div>
PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。
PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。
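
以 PULC 有人/无人分类任务为例,模型训练的参考命令大致如下(配置文件路径与数据准备方式仅作示意,完整流程请以 PULC 训练文档为准):

```shell
# 多卡训练 PULC person_exists 模型(配置文件路径仅为示例)
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
    -c ./ppcls/configs/PULC/person_exists/PPLCNet_x1_0.yaml
```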
<a name="分类效果展示"></a>
## PULC实用图像分类模型效果展示
<div align="center">
<img src="docs/images/classification.gif">
</div>
<a name="许可证书"></a> <a name="许可证书"></a>
## 许可证书 ## 许可证书
......
@@ -7,20 +7,23 @@

PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.

<div align="center">
<img src="./docs/images/shituv2.gif" width = "450" />
<p>PP-ShiTuV2 demo images</p>
</div>

&nbsp;

<div align="center">
<img src="./docs/images/class_simple_en.gif" width = "600" />
<p>PULC demo images</p>
</div>
**Recent updates**

- 🔥️ Release [PP-ShiTuV2](./docs/en/PPShiTu/PPShiTuV2_introduction.md), recall1 is improved by nearly 8 points, covering 20+ recognition scenarios, with an [index management tool](./deploy/shitu_index_manager) and an [Android Demo](./docs/en/quick_start/quick_start_recognition_en.md) for a better experience.
- 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models run inference within 3ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenarios.
- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).

@@ -38,7 +41,7 @@ image classification and image recognition algorithms.

Based on the algorithms above, PaddleClas releases the PP-ShiTu image recognition system and [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](docs/en/PULC/PULC_quickstart_en.md).

![](https://user-images.githubusercontent.com/11568925/189268878-43d9d35b-90cf-425a-859e-767f8d94c5f7.png)

## Welcome to Join the Technical Exchange Group

@@ -52,12 +55,31 @@ Based on th algorithms above, PaddleClas release PP-ShiTu image recognition syst

## Quick Start

Quick experience of PP-ShiTu image recognition system: [Link](./docs/en/quick_start/quick_start_recognition_en.md)
<div align="center">
<img src="./docs/images/quick_start/android_demo/PPShiTu_qrcode.png" width = "40%" />
<p>PP-ShiTuV2 Android Demo</p>
</div>
Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassification models: [Link](docs/en/PULC/PULC_quickstart_en.md)

## Tutorials

- [Install Paddle](./docs/en/installation/install_paddle_en.md)
- [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md)
- [PP-ShiTuV2 Image Recognition Systems Introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md)
- [Image Recognition Quick Start](docs/en/quick_start/quick_start_recognition_en.md)
- [20+ application scenarios](docs/zh_CN/introduction/ppshitu_application_scenarios.md)
- Submodule Introduction and Model Training
- [Mainbody Detection](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md)
- [Feature Extraction](./docs/en/image_recognition_pipeline/feature_extraction_en.md)
- [Vector Search](./docs/en/image_recognition_pipeline/vector_search_en.md)
- [Hash Encoding](./docs/zh_CN/image_recognition_pipeline/deep_hashing.md)
- PipeLine Inference and Deployment
- [Python Inference](docs/en/inference_deployment/python_deploy_en.md)
- [C++ Inference](deploy/cpp_shitu/readme_en.md)
- [Serving Deployment](docs/en/inference_deployment/recognition_serving_deploy_en.md)
- [Lite Deployment](docs/en/inference_deployment/paddle_lite_deploy_en.md)
- [Shitu Gallery Manager Tool](docs/zh_CN/inference_deployment/shitu_gallery_manager.md)
- [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md)
  - [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md)
  - [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md)

@@ -108,41 +130,55 @@ PULC models inference within 3ms on CPU devices, with accuracy comparable with S

<div align="center">
<img src="./docs/images/structure.jpg" width = "800" />
</div>
PP-ShiTuV2 is a practical lightweight general image recognition system, mainly composed of three modules: a mainbody detection model, a feature extraction model and a vector search tool. The system adopts a variety of strategies covering the backbone network, loss function, data augmentation, optimal hyperparameters, pre-training model, and model pruning and quantization. Compared with V1, Recall1 of PP-ShiTuV2 is improved by nearly 8 points. For more details, please refer to the [PP-ShiTuV2 introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md).

For a new unknown category, there is no need to retrain the model; just prepare images of the new category, extract their features and update the retrieval database, and the category can then be recognised.
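
A minimal sketch of how such an index update might look with the gallery-building script shipped under `deploy` (the config path and label file are placeholders; the `append` operation is the one listed among the `IndexProcess.index_operation` options used elsewhere in this repository):

```shell
# Hypothetical sketch: add a new category to an existing gallery index without retraining
cd deploy
python3.7 python/build_gallery.py -c configs/inference_drink.yaml \
    -o IndexProcess.index_operation=append \
    -o IndexProcess.data_file=./drink_dataset_v2.0/gallery/new_category_label.txt
```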
<a name="Clas_Demo_images"></a> <a name="Rec_Demo_images"></a>
## PULC demo images ## PP-ShiTuV2 Demo images
- Drinks recognition
<div align="center"> <div align="center">
<img src="docs/images/classification_en.gif"> <img src="docs/images/drink_demo.gif">
</div> </div>
<a name="Rec_Demo_images"></a>
## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
- Product recognition

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769644-51604f80-d2d7-11eb-8290-c53b12a5c1f6.gif" width = "400" />
</div>

- Cartoon character recognition

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769746-6b019700-d2d7-11eb-86df-f1d710999ba6.gif" width = "400" />
</div>

- Logo recognition

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769837-7fde2a80-d2d7-11eb-9b69-04140e9d785f.gif" width = "400" />
</div>

- Car recognition

<div align="center">
<img src="https://user-images.githubusercontent.com/18028216/122769916-8ec4dd00-d2d7-11eb-8c60-42d89e25030c.gif" width = "400" />
</div>
<a name="Clas_Demo_images"></a>
## PULC demo images
<div align="center">
<img src="docs/images/classification_en.gif">
</div>
<a name="License"></a> <a name="License"></a>
## License ## License
PaddleClas is released under the Apache 2.0 license <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a> PaddleClas is released under the Apache 2.0 license <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>
......
Global:
infer_imgs: "images/PULC/table_attribute/val_3610.jpg"
inference_model_dir: "./models/table_attribute_infer"
batch_size: 1
use_gpu: True
enable_mkldnn: True
cpu_num_threads: 10
benchmark: False
use_fp16: False
ir_optim: True
use_tensorrt: False
gpu_mem: 8000
enable_profile: False
PreProcess:
transform_ops:
- ResizeImage:
size: [224, 224]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
channel_num: 3
- ToCHWImage:
PostProcess:
main_indicator: TableAttribute
TableAttribute:
source_threshold: 0.5
number_threshold: 0.5
color_threshold: 0.5
    clarity_threshold: 0.5
obstruction_threshold: 0.5
angle_threshold: 0.5
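
A minimal sketch of how this config might be used with the Python classification inference script under `deploy` (the exact config file path is an assumption; adjust it to wherever this file lives in your checkout):

```shell
# Hypothetical usage: run PULC table-attribute inference with the config above
cd deploy
python3.7 python/predict_cls.py -c configs/PULC/table_attribute/inference_table_attribute.yaml \
    -o Global.use_gpu=False
```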
@@ -22,7 +22,7 @@ PreProcess:

        scale: 0.00392157
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ""
        channel_num: 3
    - ToCHWImage:
......
Global:
  infer_imgs: "./drink_dataset_v2.0/test_images/100.jpeg"
  det_inference_model_dir: "./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer"
  rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer"
  rec_nms_thresold: 0.05
  batch_size: 1

@@ -43,7 +43,7 @@ RecPreProcess:

        scale: 0.00392157
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ""
    - ToCHWImage:

RecPostProcess: null

@@ -51,11 +51,11 @@ RecPostProcess: null

# indexing engine config
IndexProcess:
  index_method: "HNSW32" # supported: HNSW32, IVF, Flat
  image_root: "./drink_dataset_v2.0/gallery"
  index_dir: "./drink_dataset_v2.0/index"
  data_file: "./drink_dataset_v2.0/gallery/drink_label.txt"
  index_operation: "new" # supported: "append", "remove", "new"
  delimiter: "\t"
  dist_type: "IP"
  embedding_size: 512
  batch_size: 32
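
A hypothetical command-line sketch for running the Python recognition pipeline with this config, overriding individual fields with `-o` (the override keys mirror the fields above; the script path follows the `deploy/python` layout):

```shell
cd deploy
# Run recognition over a directory of test images without editing the yaml file
python3.7 python/predict_system.py -c configs/inference_drink.yaml \
    -o Global.infer_imgs=./drink_dataset_v2.0/test_images/ \
    -o Global.batch_size=1
```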
......
Global:
  infer_imgs: "./drink_dataset_v2.0/test_images/100.jpeg"
  det_inference_model_dir: "./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer"
  rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer"
  rec_nms_thresold: 0.05
  batch_size: 1

@@ -38,12 +38,15 @@ DetPostProcess: {}

RecPreProcess:
  transform_ops:
    - ResizeImage:
        size: [224, 224]
        return_numpy: False
        interpolation: bilinear
        backend: cv2
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: hwc
    - ToCHWImage:

RecPostProcess: null

@@ -51,9 +54,9 @@ RecPostProcess: null

# indexing engine config
IndexProcess:
  index_method: "HNSW32" # supported: HNSW32, IVF, Flat
  image_root: "./drink_dataset_v2.0/gallery/"
  index_dir: "./drink_dataset_v2.0/index"
  data_file: "./drink_dataset_v2.0/gallery/drink_label.txt"
  index_operation: "new" # supported: "append", "remove", "new"
  delimiter: "\t"
  dist_type: "IP"
......
Global:
  infer_imgs: "./images/wangzai.jpg"
  rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer"
  batch_size: 1
  use_gpu: False
  enable_mkldnn: True

@@ -15,14 +15,15 @@ Global:

RecPreProcess:
  transform_ops:
    - ResizeImage:
        size: [224, 224]
        return_numpy: False
        interpolation: bilinear
        backend: cv2
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: hwc
    - ToCHWImage:

RecPostProcess: null
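
A hypothetical sketch of running feature extraction alone with a recognition-only config like the one above (script and config names follow the `deploy/python` and `deploy/configs` layout and are assumptions):

```shell
cd deploy
# Extract a feature vector for a single image with the recognition model configured above
python3.7 python/predict_rec.py -c configs/inference_rec.yaml \
    -o Global.infer_imgs=./images/wangzai.jpg
```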
# 服务器端C++预测

本教程将介绍在服务器端部署PP-ShiTu的详细步骤。

## 目录

@@ -30,39 +30,39 @@
- 下载最新版本cmake

```shell
# 当前最新版本为3.22.0,可根据实际情况自行下载,建议使用最新版本
wget https://github.com/Kitware/CMake/releases/download/v3.22.0/cmake-3.22.0.tar.gz
tar -xf cmake-3.22.0.tar.gz
```

最终可以在当前目录下看到`cmake-3.22.0/`的文件夹。

- 编译cmake,首先设置cmake源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的cmake源码路径,`install_path`为cmake的安装路径。在本例中,源码路径即为当前目录下的`cmake-3.22.0/`。

```shell
cd ./cmake-3.22.0
export root_path=$PWD
export install_path=${root_path}/cmake
```

- 然后在cmake源码路径下,执行以下命令进行编译:

```shell
./bootstrap --prefix=${install_path}
make -j
make install
```

- 编译安装cmake完成后,设置cmake的环境变量供后续程序使用:

```shell
export PATH=${install_path}/bin:$PATH
# 检查是否正常使用
cmake --version
```

此时cmake就可以正常使用了。

<a name="1.2"></a>

@@ -70,29 +70,33 @@ cmake --version
* 首先需要从opencv官网上下载在Linux环境下源码编译的包,以3.4.7版本为例,下载及解压缩命令如下:

```shell
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz
tar -xvf opencv-3.4.7.tar.gz
```

最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。
* 编译opencv,首先设置opencv源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径。在本例中,源码路径即为当前目录下的`opencv-3.4.7/`。

```shell
# 进入deploy/cpp_shitu目录
cd deploy/cpp_shitu

# 安装opencv
cd ./opencv-3.4.7
export root_path=$PWD
export install_path=${root_path}/opencv3
```

* 然后在opencv源码路径下,按照下面的方式进行编译。
```shell
rm -rf build
mkdir build
cd build

cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \

@@ -110,21 +114,22 @@ cmake .. \

-DWITH_TIFF=ON \
-DBUILD_TIFF=ON

make -j
make install
```
* `make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的PaddleClas代码编译。

以opencv3.4.7版本为例,最终在安装路径下的文件结构如下所示。**注意**:不同的opencv版本,下述的文件结构可能不同。

```log
opencv3/
├── bin
├── include
├── lib
├── lib64
└── share
```

<a name="1.3"></a>
@@ -139,18 +144,21 @@ opencv3/

* 如果希望获取最新预测库特性,可以从Paddle github上克隆最新代码,源码编译预测库。
* 可以参考[Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)的说明,从github上获取Paddle代码,然后进行编译,生成最新的预测库。使用git获取代码的方法如下。

```shell
# 进入deploy/cpp_shitu目录
cd deploy/cpp_shitu

git clone https://github.com/PaddlePaddle/Paddle.git
```

* 进入Paddle目录后,使用如下方法编译。
```shell
rm -rf build
mkdir build
cd build

cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \

@@ -159,24 +167,25 @@ cmake .. \

-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON

make -j
make inference_lib_dist
```

更多编译参数选项可以参考[Paddle C++预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)。

* 编译完成之后,可以在`build/paddle_inference_install_dir/`文件夹下看到生成了以下文件及文件夹。

```log
build/paddle_inference_install_dir/
├── CMakeCache.txt
├── paddle
├── third_party
└── version.txt
```

其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。
<a name="1.3.2"></a> <a name="1.3.2"></a>
...@@ -187,33 +196,41 @@ build/paddle_inference_install_dir/ ...@@ -187,33 +196,41 @@ build/paddle_inference_install_dir/
`https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz``develop`版本为例,使用下述命令下载并解压: `https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz``develop`版本为例,使用下述命令下载并解压:
```shell ```shell
wget https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz # 进入deploy/cpp_shitu目录
cd deploy/cpp_shitu
tar -xvf paddle_inference.tgz wget https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz
```
tar -xvf paddle_inference.tgz
```
最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。 最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。
<a name="1.4"></a> <a name="1.4"></a>
### 1.4 安装faiss库 ### 1.4 安装faiss库
在安装`faiss`前,请安装`openblas``ubuntu`系统中安装命令如下:
```shell ```shell
# 下载 faiss apt-get install libopenblas-dev
git clone https://github.com/facebookresearch/faiss.git
cd faiss
export faiss_install_path=$PWD/faiss_install
cmake -B build . -DFAISS_ENABLE_PYTHON=OFF -DCMAKE_INSTALL_PREFIX=${faiss_install_path}
make -C build -j faiss
make -C build install
``` ```
在安装`faiss`前,请安装`openblas``ubuntu`系统中安装命令如下: 然后按照以下命令编译并安装faiss
```shell ```shell
apt-get install libopenblas-dev # 进入deploy/cpp_shitu目录
cd deploy/cpp_shitu
# 下载 faiss
git clone https://github.com/facebookresearch/faiss.git
cd faiss
export faiss_install_path=$PWD/faiss_install
cmake -B build . -DFAISS_ENABLE_PYTHON=OFF -DCMAKE_INSTALL_PREFIX=${faiss_install_path}
make -C build -j faiss
make -C build install
``` ```
注意本教程以安装faiss cpu版本为例,安装时请参考[faiss](https://github.com/facebookresearch/faiss)官网文档,根据需求自行安装。 注意本教程以安装faiss cpu版本为例,安装时请参考[faiss](https://github.com/facebookresearch/faiss)官网文档,根据需求自行安装。
@@ -224,12 +241,14 @@ apt-get install libopenblas-dev

编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。同时,编译过程中需要下载编译`yaml-cpp`等C++库,请保持联网环境。

```shell
# 进入deploy/cpp_shitu目录
cd deploy/cpp_shitu

sh tools/build.sh
```

具体地,`tools/build.sh`中内容如下,请根据具体路径和配置情况进行修改。

```shell
OPENCV_DIR=${opencv_install_dir}

@@ -261,14 +280,13 @@ cd ..
上述命令中,

* `OPENCV_DIR`:opencv编译安装的地址(本例中为`opencv-3.4.7/opencv3`文件夹的路径);
* `LIB_DIR`:下载的Paddle预测库(`paddle_inference`文件夹),或编译生成的Paddle预测库(`build/paddle_inference_install_dir`文件夹)的路径;
* `CUDA_LIB_DIR`:cuda库文件地址,在docker中为`/usr/local/cuda/lib64`;
* `CUDNN_LIB_DIR`:cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`;
* `TENSORRT_DIR`:tensorrt库文件地址,在docker中为`/usr/local/TensorRT6-cuda10.0-cudnn7/`,TensorRT需要结合GPU使用;
* `FAISS_DIR`:faiss的安装地址;
* `FAISS_WITH_MKL`:指在编译faiss的过程中是否使用mkldnn,本文档中编译faiss时没有使用mkldnn而使用了openblas,故设置为`OFF`;若使用了mkldnn,则设置为`ON`。
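
下面给出一组假设性的取值示例(路径仅作示意,请替换为本机实际安装路径):

```shell
# tools/build.sh 中各路径变量的示例取值(仅供参考)
OPENCV_DIR=/work/opencv-3.4.7/opencv3
LIB_DIR=/work/paddle_inference
CUDA_LIB_DIR=/usr/local/cuda/lib64
CUDNN_LIB_DIR=/usr/lib/x86_64-linux-gnu/
TENSORRT_DIR=/usr/local/TensorRT6-cuda10.0-cudnn7/
FAISS_DIR=/work/faiss/faiss_install
FAISS_WITH_MKL=OFF
```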
执行上述命令并编译完成之后,会在当前路径下生成`build`文件夹,其中包含一个名为`pp_shitu`的可执行文件。
@@ -276,60 +294,68 @@ cd ..

## 3. 运行demo

- 按照如下命令下载好相应的轻量级通用主体检测模型、轻量级通用识别模型及瓶装饮料测试数据并解压。

```shell
# 进入deploy目录
cd deploy/

mkdir models
cd models

# 下载并解压主体检测模型
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar

# 下载并解压特征提取模型
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar

cd ..
mkdir data
cd data
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar
tar -xf drink_dataset_v2.0.tar
cd ..
```
- 将相应的yaml文件拷到当前文件夹下

```shell
cp ../configs/inference_drink.yaml ./
```

- 将`inference_drink.yaml`中的相对路径,改成基于 `deploy/cpp_shitu` 目录的相对路径或者绝对路径。涉及到的参数有:
  - `Global.infer_imgs`:此参数可以是具体的图像地址,也可以是图像集所在的目录;
  - `Global.det_inference_model_dir`:检测模型存储目录;
  - `Global.rec_inference_model_dir`:识别模型存储目录;
  - `IndexProcess.index_dir`:检索库的存储目录,在示例中,检索库在下载的demo数据中。
- 标签文件转换

  由于python检索库的字典是使用`pickle`序列化存储的,C++不方便直接读取,因此需要先转换成普通的文本文件。

```shell
python3.7 tools/transform_id_map.py -c inference_drink.yaml
```

  转换成功后,在`IndexProcess.index_dir`目录下生成`id_map.txt`,以便在C++推理时读取。
- 执行程序

```shell
./build/pp_shitu -c inference_drink.yaml
```

  以 `drink_dataset_v2.0/test_images/nongfu_spring.jpeg` 作为输入图像,执行上述推理命令可以得到如下结果:

```log
../../deploy/drink_dataset_v2.0/test_images/nongfu_spring.jpeg:
result0: bbox[0, 0, 729, 1094], score: 0.688691, label: 农夫山泉-饮用天然水
```

  由于python和C++的opencv实现存在部分差异,python推理和C++推理的结果可能会有微小不同,但基本不影响最终的检索结果。
<a name="4"></a> <a name="4"></a>
......
# PP-ShiTu在Paddle-Lite端侧部署
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleClas PP-ShiTu模型的详细步骤。
Paddle Lite是飞桨轻量化推理引擎,为手机、IoT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。
## 1. 准备环境
### 运行准备
- 电脑(编译Paddle Lite)
- 安卓手机(armv7或armv8)
### 1.1 准备交叉编译环境
交叉编译环境用于编译 Paddle Lite 和 PaddleClas 的PP-ShiTu Lite demo。
支持多种开发环境,不同开发环境的编译流程请参考对应文档,请确保安装完成Java jdk、Android NDK(R17以上)。
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
```shell
# 配置完成交叉编译环境后,更新环境变量
# for docker、Linux
source ~/.bashrc
# for Mac OS
source ~/.bash_profile
```
### 1.2 准备预测库
预测库有两种获取方式:
1. [**建议**]直接下载,预测库下载链接如下:
|平台| 架构 | 预测库下载链接|
|-|-|-|
|Android| arm7 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv7.clang.c++_static.with_extra.with_cv.tar.gz) |
| Android | arm8 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv.tar.gz) |
| Android | arm8(FP16) | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip) |
**注意**:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo)
2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
```shell
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
# 如果使用编译方式,建议使用develop分支编译预测库
git checkout develop
# FP32
./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON
# FP16
./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON
```
**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_andriod.html#id2)
直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。
预测库的文件目录如下:
```
inference_lite_lib.android.armv8/
|-- cxx C++ 预测库和头文件
| |-- include C++ 头文件
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++预测库
| |-- libpaddle_api_light_bundled.a C++静态库
| `-- libpaddle_light_api_shared.so C++动态库
|-- java Java预测库
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++和Java示例代码
| |-- cxx C++ 预测库demo
| `-- java Java 预测库demo
```
## 2 模型准备
### 2.1 模型准备
PaddleClas 提供了转换并优化后的推理模型,可以直接参考下方 2.1.1 小节进行下载。如果需要使用其他模型,请参考后续 2.1.2 小节自行转换并优化模型。
#### 2.1.1 使用PaddleClas提供的推理模型
```shell
# 进入lite_shitu目录
cd $PaddleClas/deploy/lite_shitu
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/lite/ppshitu_lite_models_v1.2.tar
tar -xf ppshitu_lite_models_v1.2.tar
rm -f ppshitu_lite_models_v1.2.tar
```
#### 2.1.2 使用其他模型
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。
**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。
##### 2.1.2.1 安装paddle_lite_opt工具
安装`paddle_lite_opt`工具有如下两种方法:
1. [**建议**]pip安装paddlelite并进行转换
```shell
pip install paddlelite==2.10rc
```
2. 源码编译Paddle-Lite生成`paddle_lite_opt`工具
模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下:
```shell
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout develop
# 启动编译
./lite/tools/build.sh build_optimize_tool
```
编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式;
```shell
cd build.opt/lite/api/
./opt
```
`opt`的使用方式与参数与上面的`paddle_lite_opt`完全一致。
之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下:
|选项|说明|
|-|-|
|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径|
|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径|
|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer|
|--optimize_out|优化模型的输出路径|
|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm|
更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html)
`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。
##### 2.1.2.2 转换示例
下面介绍使用`paddle_lite_opt`完成主体检测模型和识别模型的预训练模型,转成inference模型,最终转换成Paddle-Lite的优化模型的过程。
1. 转换主体检测模型
```shell
# 当前目录为 $PaddleClas/deploy/lite_shitu
# $code_path需替换成相应的运行目录,可以根据需要,将$code_path设置成需要的目录
export code_path=~
cd $code_path
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# 进入PaddleDetection根目录
cd PaddleDetection
# 将预训练模型导出为inference模型
python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams export_post_process=False --output_dir=inference
# 将inference模型转化为Paddle-Lite优化模型
paddle_lite_opt --model_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdmodel --param_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdiparams --optimize_out=inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det
# 将转好的模型复制到lite_shitu目录下
cd $PaddleClas/deploy/lite_shitu
mkdir models
cp $code_path/PaddleDetection/inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det.nb $PaddleClas/deploy/lite_shitu/models
```
2. 转换识别模型
```shell
# 转换为Paddle-Lite模型
paddle_lite_opt --model_file=inference/inference.pdmodel --param_file=inference/inference.pdiparams --optimize_out=inference/rec
# 将模型文件拷贝到lite_shitu下
cp inference/rec.nb deploy/lite_shitu/models/
cd deploy/lite_shitu
```
**注意**`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb``--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。
### 2.2 生成新的检索库
由于lite 版本的检索库用的是`faiss1.5.3`版本,与新版本不兼容,因此需要重新生成index库
#### 2.2.1 数据及环境配置
```shell
# 进入上级目录
cd ..
# 下载瓶装饮料数据集
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
rm -rf drink_dataset_v1.0.tar
rm -rf drink_dataset_v1.0/index
# 安装1.5.3版本的faiss
pip install faiss-cpu==1.5.3
# 下载通用识别模型,可替换成自己的inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
rm -rf general_PPLCNet_x2_5_lite_v1.0_infer.tar
```
#### 2.2.2 生成新的index文件
```shell
# 生成新的index库,注意指定好识别模型的路径,同时将index_method修改成Flat,HNSW32和IVF在此版本中可能存在bug,请慎重使用。
# 如果使用自己的识别模型,对应的修改inference model的目录
python python/build_gallery.py -c configs/inference_drink.yaml -o Global.rec_inference_model_dir=general_PPLCNet_x2_5_lite_v1.0_infer -o IndexProcess.index_method=Flat
# 进入到lite_shitu目录
cd lite_shitu
mv ../drink_dataset_v1.0 .
```
### 2.3 将yaml文件转换成json文件
```shell
# 如果测试单张图像
python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_path images/demo.jpeg
# or
# 如果测试多张图像
python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_dir images
# 执行完成后,会在lit_shitu下生成shitu_config.json配置文件
```
### 2.4 index字典转换
由于python的检索库字典,使用`pickle`进行的序列化存储,导致C++不方便读取,因此需要进行转换
```shell
# 转化id_map.pkl为id_map.txt
python transform_id_map.py -c ../configs/inference_drink.yaml
```
转换成功后,会在`IndexProcess.index_dir`目录下生成`id_map.txt`
### 2.5 与手机联调
首先需要进行一些准备工作。
1. 准备一台arm8的安卓手机,如果编译的预测库是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI=arm7`
2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下:
2.1. MAC电脑安装ADB:
```shell
brew cask install android-platform-tools
```
2.2. Linux安装ADB
```shell
sudo apt update
sudo apt install -y wget adb
```
2.3. Window安装ADB
win上安装需要去谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio)
3. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入:
```shell
adb devices
```
如果有device输出,则表示安装成功,如下所示:
```
List of devices attached
744be294 device
```
4. 编译lite部署代码生成移动端可执行文件
```shell
cd $PaddleClas/deploy/lite_shitu
# ${lite prediction library path}下载的Paddle-Lite库路径
inference_lite_path=${lite prediction library path}/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv/
mkdir $inference_lite_path/demo/cxx/ppshitu_lite
cp -r * $inference_lite_path/demo/cxx/ppshitu_lite
cd $inference_lite_path/demo/cxx/ppshitu_lite
# 执行编译,等待完成后得到可执行文件pp_shitu
make ARM_ABI=arm8
#如果是arm7,则执行 make ARM_ABI = arm7 (或者在Makefile中修改该项)
```
5. 准备优化后的模型、预测库文件、测试图像。
```shell
mkdir deploy
mv ppshitu_lite_models_v1.2 deploy/
mv drink_dataset_v1.0 deploy/
mv images deploy/
mv shitu_config.json deploy/
cp pp_shitu deploy/
# 将C++预测动态库so文件复制到deploy文件夹中
cp ../../../cxx/lib/libpaddle_light_api_shared.so deploy/
```
执行完成后,deploy文件夹下将有如下文件格式:
```shell
deploy/
|-- ppshitu_lite_models_v1.2/
| |--mainbody_PPLCNet_x2_5_640_v1.2_lite.nb 优化后的主体检测模型文件
| |--general_PPLCNet_x2_5_lite_v1.2_infer.nb 优化后的识别模型文件
|-- images/
| |--demo.jpg 图片文件
|-- drink_dataset_v1.0/ 瓶装饮料demo数据
| |--index 检索index目录
|-- pp_shitu 生成的移动端执行文件
|-- shitu_config.json 执行时参数配置文件
|-- libpaddle_light_api_shared.so Paddle-Lite库文件
```
**注意:**
* `shitu_config.json` 包含了目标检测的超参数,请按需进行修改
6. 启动调试,上述步骤完成后就可以使用ADB将文件夹 `deploy/` push到手机上运行,步骤如下:
```shell
# 将上述deploy文件夹push到手机上
adb push deploy /data/local/tmp/
adb shell
cd /data/local/tmp/deploy
export LD_LIBRARY_PATH=/data/local/tmp/deploy:$LD_LIBRARY_PATH
# 修改权限为可执行
chmod 777 pp_shitu
# 执行程序
./pp_shitu shitu_config.json
```
如果对代码做了修改,则需要重新编译并push到手机上。
运行效果如下:
```
images/demo.jpeg:
result0: bbox[344, 98, 527, 593], score: 0.811656, label: 红牛-强化型
result1: bbox[0, 0, 600, 600], score: 0.729664, label: 红牛-强化型
```
## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。
Q2:换一个图测试怎么做?
A2:替换 deploy 下的测试图像为你想要测试的图像,并重新生成json配置文件(或者直接修改图像路径),使用 ADB 再次 push 到手机上即可。
../../docs/zh_CN/inference_deployment/lite_shitu.md
\ No newline at end of file
@@ -9,15 +9,15 @@

# 默认编译时的${PWD}=PaddleClas/deploy/paddleserving/
export python_name=${1:-'python'}

apt-get update
apt install -y libcurl4-openssl-dev libbz2-dev
wget -nc https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar
tar xf centos_ssl.tar
rm -rf centos_ssl.tar
\mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k
\mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k
ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10
ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10
ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so
......
@@ -16,9 +16,8 @@ op:

    #当op配置没有server_endpoints时,从local_service_conf读取本地服务配置
    local_service_conf:
      #uci模型路径
      model_config: ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving
      #计算硬件类型: 空缺时由devices决定(CPU/GPU),0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
      device_type: 1

@@ -37,7 +36,7 @@ op:

    local_service_conf:
      client_type: local_predictor
      device_type: 1
      devices: "0"
      fetch_list:
        - save_infer_model/scale_0.tmp_1
      model_config: ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import json
import os

import requests

image_path = "../../drink_dataset_v2.0/test_images/100.jpeg"


def bytes_to_base64(image_bytes: bytes) -> str:
    """encode bytes using base64 algorithm

    Args:
        image_bytes (bytes): bytes object to be encoded

    Returns:
        str: base64-encoded string
    """
    return base64.b64encode(image_bytes).decode('utf8')


if __name__ == "__main__":
    url = "http://127.0.0.1:18081/recognition/prediction"

    with open(os.path.join(".", image_path), 'rb') as file:
        image_bytes = file.read()

    image_base64 = bytes_to_base64(image_bytes)
    data = {"key": ["image"], "value": [image_base64]}

    for i in range(1):
        r = requests.post(url=url, data=json.dumps(data))
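
A plausible way to exercise this HTTP client end to end (the service script name and port follow the other files in this directory and are assumptions; adjust to your layout):

```shell
# Start the recognition pipeline service, then send a request with the client above
python3.7 recognition_web_service.py &
python3.7 pipeline_http_client.py
```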
......
@@ -15,20 +15,33 @@ try:

    from paddle_serving_server_gpu.pipeline import PipelineClient
except ImportError:
    from paddle_serving_server.pipeline import PipelineClient

import base64
import os

client = PipelineClient()
client.connect(['127.0.0.1:9994'])

image_path = "../../drink_dataset_v2.0/test_images/100.jpeg"


def bytes_to_base64(image_bytes: bytes) -> str:
    """encode bytes using base64 algorithm

    Args:
        image_bytes (bytes): bytes to be encoded

    Returns:
        str: base64-encoded string
    """
    return base64.b64encode(image_bytes).decode('utf8')


if __name__ == "__main__":
    with open(os.path.join(".", image_path), 'rb') as file:
        image_bytes = file.read()
    image_base64 = bytes_to_base64(image_bytes)

    for i in range(1):
        ret = client.predict(
            feed_dict={"image": image_base64}, fetch=["result"])
        print(ret)
@@ -15,7 +15,7 @@ feed_var {

  shape: 6
}
fetch_var {
  name: "batch_norm_25.tmp_2"
  alias_name: "features"
  is_lod_tensor: false
  fetch_type: 1
......
@@ -15,7 +15,7 @@ feed_var {

  shape: 6
}
fetch_var {
  name: "batch_norm_25.tmp_2"
  alias_name: "features"
  is_lod_tensor: false
  fetch_type: 1
......
@@ -11,17 +11,24 @@

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import json
import logging
import os
import pickle
import sys

import cv2
import faiss
import numpy as np
from paddle_serving_app.reader import BGR2RGB
from paddle_serving_app.reader import Div
from paddle_serving_app.reader import Normalize
from paddle_serving_app.reader import RCNNPostprocess
from paddle_serving_app.reader import Resize
from paddle_serving_app.reader import Sequential
from paddle_serving_app.reader import Transpose
from paddle_serving_server.web_service import Op, WebService
class DetOp(Op): class DetOp(Op):
...@@ -101,11 +108,11 @@ class RecOp(Op): ...@@ -101,11 +108,11 @@ class RecOp(Op):
def init_op(self): def init_op(self):
self.seq = Sequential([ self.seq = Sequential([
BGR2RGB(), Resize((224, 224)), Div(255), BGR2RGB(), Resize((224, 224)), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
False), Transpose((2, 0, 1)) Transpose((2, 0, 1))
]) ])
index_dir = "../../drink_dataset_v1.0/index" index_dir = "../../drink_dataset_v2.0/index"
assert os.path.exists(os.path.join( assert os.path.exists(os.path.join(
index_dir, "vector.index")), "vector.index not found ..." index_dir, "vector.index")), "vector.index not found ..."
assert os.path.exists(os.path.join( assert os.path.exists(os.path.join(
...@@ -136,7 +143,7 @@ class RecOp(Op): ...@@ -136,7 +143,7 @@ class RecOp(Op):
}) })
self.det_boxes = boxes self.det_boxes = boxes
#construct batch images for rec # construct batch images for rec
imgs = [] imgs = []
for box in boxes: for box in boxes:
box = [int(x) for x in box["bbox"]] box = [int(x) for x in box["bbox"]]
...@@ -192,7 +199,7 @@ class RecOp(Op): ...@@ -192,7 +199,7 @@ class RecOp(Op):
pred["rec_scores"] = scores[i][0] pred["rec_scores"] = scores[i][0]
results.append(pred) results.append(pred)
#do nms # do NMS
results = self.nms_to_rec_results(results, self.rec_nms_thresold) results = self.nms_to_rec_results(results, self.rec_nms_thresold)
return {"result": str(results)}, None, "" return {"result": str(results)}, None, ""
......
...@@ -3,12 +3,12 @@ gpu_id=$1
# PP-ShiTu CPP serving script
if [[ -n "${gpu_id}" ]]; then
    nohup python3.7 -m paddle_serving_server.serve \
    --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving \
    --op GeneralPicodetOp GeneralFeatureExtractOp \
    --port 9400 --gpu_id="${gpu_id}" > log_PPShiTu.txt 2>&1 &
else
    nohup python3.7 -m paddle_serving_server.serve \
    --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving \
    --op GeneralPicodetOp GeneralFeatureExtractOp \
    --port 9400 > log_PPShiTu.txt 2>&1 &
fi
...@@ -12,20 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pickle

import cv2
import faiss
import numpy as np
from paddle_serving_client import Client

rec_nms_thresold = 0.05
rec_score_thres = 0.5
feature_normalize = True
return_k = 1
index_dir = "../../drink_dataset_v2.0/index"


def init_index(index_dir):
...@@ -41,7 +40,7 @@ def init_index(index_dir):
    return searcher, id_map


# get box
def nms_to_rec_results(results, thresh=0.1):
    filtered_results = []
...@@ -91,21 +90,21 @@ def postprocess(fetch_dict, feature_normalize, det_boxes, searcher, id_map,
        pred["rec_scores"] = scores[i][0]
        results.append(pred)

    # do NMS
    results = nms_to_rec_results(results, rec_nms_thresold)
    return results


# do client
if __name__ == "__main__":
    client = Client()
    client.load_client_config([
        "../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client",
        "../../models/general_PPLCNetV2_base_pretrained_v1.0_client"
    ])
    client.connect(['127.0.0.1:9400'])

    im = cv2.imread("../../drink_dataset_v2.0/test_images/100.jpeg")
    im_shape = np.array(im.shape[:2]).reshape(-1)
    fetch_map = client.predict(
        feed={"image": im,
...@@ -113,7 +112,7 @@ if __name__ == "__main__":
        fetch=["features", "boxes"],
        batch=False)

    # add retrieval procedure
    det_boxes = fetch_map["boxes"]
    searcher, id_map = init_index(index_dir)
    results = postprocess(fetch_map, feature_normalize, det_boxes, searcher,
......
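# The body of init_index() is elided by the hunk above; a minimal sketch of
# what it presumably loads (non-hamming case), based on the vector.index and
# id_map.pkl files that build_gallery.py writes; the helper name is ours.
import os
import pickle

import faiss


def load_drink_index(index_dir="../../drink_dataset_v2.0/index"):
    searcher = faiss.read_index(os.path.join(index_dir, "vector.index"))
    with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
        id_map = pickle.load(fd)
    return searcher, id_map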
...@@ -12,16 +12,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pickle

import cv2
import faiss
import numpy as np
from paddleclas.deploy.python.predict_rec import RecPredictor
from paddleclas.deploy.utils import config, logger
from tqdm import tqdm


def split_datafile(data_file, image_root, delimiter="\t"):
...@@ -52,6 +50,7 @@ class GalleryBuilder(object):
        self.config = config
        self.rec_predictor = RecPredictor(config)
        assert 'IndexProcess' in config.keys(), "Index config not found ... "
        self.android_demo = config["Global"].get("android_demo", False)
        self.build(config['IndexProcess'])

    def build(self, config):
...@@ -70,10 +69,86 @@ class GalleryBuilder(object):
            "new", "remove", "append"
        ], "Only append, remove and new operation are supported"

        if self.android_demo:
            self._create_index_for_android_demo(config, gallery_features, gallery_docs)
            return

        # vector.index: faiss index file
        # id_map.pkl: use this file to map id to image_doc
        index, ids = None, None
        if operation_method in ["remove", "append"]:
            # if remove or append, load vector.index and id_map.pkl
            index, ids = self._load_index(config)
            index_method = config.get("index_method", "HNSW32")
        else:
            index_method, index, ids = self._create_index(config)
        if index_method == "HNSW32":
            logger.warning(
                "The HNSW32 method dose not support 'remove' operation")

        if operation_method != "remove":
            # calculate id for new data
            index, ids = self._add_gallery(index, ids, gallery_features, gallery_docs, config, operation_method)
        else:
            if index_method == "HNSW32":
                raise RuntimeError(
                    "The index_method: HNSW32 dose not support 'remove' operation"
                )
            # remove ids in id_map, remove index data in faiss index
            index, ids = self._rm_id_in_galllery(index, ids, gallery_docs)

        # store faiss index file and id_map file
        self._save_gallery(config, index, ids)
def _create_index_for_android_demo(self, config, gallery_features, gallery_docs):
if not os.path.exists(config["index_dir"]):
os.makedirs(config["index_dir"], exist_ok=True)
#build index
index = faiss.IndexFlatIP(config["embedding_size"])
index.add(gallery_features)
# calculate id for data
ids_now = (np.arange(0, len(gallery_docs))).astype(np.int64)
ids = {}
for i, d in zip(list(ids_now), gallery_docs):
ids[i] = d
self._save_gallery(config, index, ids)
def _extract_features(self, gallery_images, config):
# extract gallery features
if config["dist_type"] == "hamming":
gallery_features = np.zeros(
[len(gallery_images), config['embedding_size'] // 8],
dtype=np.uint8)
else:
gallery_features = np.zeros(
[len(gallery_images), config['embedding_size']],
dtype=np.float32)
#construct batch imgs and do inference
batch_size = config.get("batch_size", 32)
batch_img = []
for i, image_file in enumerate(tqdm(gallery_images)):
img = cv2.imread(image_file)
if img is None:
logger.error("img empty, please check {}".format(image_file))
exit()
img = img[:, :, ::-1]
batch_img.append(img)
if (i + 1) % batch_size == 0:
rec_feat = self.rec_predictor.predict(batch_img)
gallery_features[i - batch_size + 1:i + 1, :] = rec_feat
batch_img = []
if len(batch_img) > 0:
rec_feat = self.rec_predictor.predict(batch_img)
gallery_features[-len(batch_img):, :] = rec_feat
batch_img = []
return gallery_features
    def _load_index(self, config):
        assert os.path.join(
            config["index_dir"], "vector.index"
        ), "The vector.index dose not exist in {} when 'index_operation' is not None".format(
...@@ -89,7 +164,9 @@ class GalleryBuilder(object):
            ids = pickle.load(fd)
            assert index.ntotal == len(ids.keys(
            )), "data number in index is not equal in in id_map"
        return index, ids

    def _create_index(self, config):
        if not os.path.exists(config["index_dir"]):
            os.makedirs(config["index_dir"], exist_ok=True)
        index_method = config.get("index_method", "HNSW32")
...@@ -116,16 +193,12 @@ class GalleryBuilder(object):
                index_method, dist_type)
            index = faiss.IndexIDMap2(index)
            ids = {}
        return index_method, index, ids

    def _add_gallery(self, index, ids, gallery_features, gallery_docs, config, operation_method):
        start_id = max(ids.keys()) + 1 if ids else 0
        ids_now = (
            np.arange(0, len(gallery_docs)) + start_id).astype(np.int64)

        # only train when new index file
        if operation_method == "new":
...@@ -139,12 +212,9 @@ class GalleryBuilder(object):
        for i, d in zip(list(ids_now), gallery_docs):
            ids[i] = d
        return index, ids

    def _rm_id_in_galllery(self, index, ids, gallery_docs):
        remove_ids = list(
            filter(lambda k: ids.get(k) in gallery_docs, ids.keys()))
        remove_ids = np.asarray(remove_ids)
...@@ -152,7 +222,9 @@ class GalleryBuilder(object):
        for k in remove_ids:
            del ids[k]
        return index, ids

    def _save_gallery(self, config, index, ids):
        if config["dist_type"] == "hamming":
            faiss.write_index_binary(
                index, os.path.join(config["index_dir"], "vector.index"))
...@@ -163,40 +235,6 @@ class GalleryBuilder(object):
        with open(os.path.join(config["index_dir"], "id_map.pkl"), 'wb') as fd:
            pickle.dump(ids, fd)
def main(config):
    GalleryBuilder(config)
......
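# A compact, standalone sketch of the faiss flow GalleryBuilder wraps
# (inner-product metric plus an ID-mapped index); the 512-d random features
# are placeholders, only the API calls mirror build_gallery.py.
import faiss
import numpy as np

embedding_size = 512
features = np.random.rand(16, embedding_size).astype(np.float32)
index = faiss.IndexIDMap2(
    faiss.index_factory(embedding_size, "HNSW32", faiss.METRIC_INNER_PRODUCT))
ids = np.arange(0, features.shape[0]).astype(np.int64)
index.add_with_ids(features, ids)
scores, retrieved_ids = index.search(features[:1], 5)
print(retrieved_ids[0], scores[0])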
...@@ -364,3 +364,49 @@ class VehicleAttribute(object):
                        ).astype(np.int8).tolist()
            batch_res.append({"attributes": label_res, "output": pred_res})
        return batch_res
class TableAttribute(object):
    def __init__(
            self,
            source_threshold=0.5,
            number_threshold=0.5,
            color_threshold=0.5,
            clarity_threshold=0.5,
            obstruction_threshold=0.5,
            angle_threshold=0.5, ):
        self.source_threshold = source_threshold
        self.number_threshold = number_threshold
        self.color_threshold = color_threshold
        self.clarity_threshold = clarity_threshold
        self.obstruction_threshold = obstruction_threshold
        self.angle_threshold = angle_threshold

    def __call__(self, batch_preds, file_names=None):
        # postprocess output of predictor
        batch_res = []
        for res in batch_preds:
            res = res.tolist()
            label_res = []
            source = 'Scanned' if res[0] > self.source_threshold else 'Photo'
            number = 'Little' if res[1] > self.number_threshold else 'Numerous'
            color = 'Black-and-White' if res[
                2] > self.color_threshold else 'Multicolor'
            clarity = 'Clear' if res[3] > self.clarity_threshold else 'Blurry'
            # use the per-attribute thresholds so label_res stays consistent
            # with the thresholded pred_res computed below
            obstruction = 'Without-Obstacles' if res[
                4] > self.obstruction_threshold else 'With-Obstacles'
            angle = 'Horizontal' if res[
                5] > self.angle_threshold else 'Tilted'

            label_res = [source, number, color, clarity, obstruction, angle]

            threshold_list = [
                self.source_threshold, self.number_threshold,
                self.color_threshold, self.clarity_threshold,
                self.obstruction_threshold, self.angle_threshold
            ]
            pred_res = (np.array(res) > np.array(threshold_list)
                        ).astype(np.int8).tolist()
            batch_res.append({"attributes": label_res, "output": pred_res})
        return batch_res
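# Hedged usage sketch for the TableAttribute post-processor above: one batch
# of six dummy sigmoid scores in, decoded labels plus thresholded output out.
import numpy as np

table_postprocess = TableAttribute()
dummy_preds = np.array([[0.9, 0.2, 0.7, 0.8, 0.1, 0.6]])
print(table_postprocess(dummy_preds))
# roughly: [{'attributes': ['Scanned', 'Numerous', 'Black-and-White', 'Clear',
#            'With-Obstacles', 'Horizontal'], 'output': [1, 0, 1, 1, 0, 1]}]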
...@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

import cv2
...@@ -136,6 +135,7 @@ def main(config):
        for number, result_dict in enumerate(batch_results):
            if "PersonAttribute" in config[
                    "PostProcess"] or "VehicleAttribute" in config[
                        "PostProcess"] or "TableAttribute" in config[
                            "PostProcess"]:
                filename = batch_names[number]
                print("{}:\t {}".format(filename, result_dict))
......
../../docs/zh_CN/inference_deployment/shitu_gallery_manager.md
\ No newline at end of file
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
from PyQt5 import QtCore, QtGui, QtWidgets
import mod.mainwindow
from paddleclas.deploy.utils import config, logger
from paddleclas.deploy.python.predict_rec import RecPredictor
from fastapi import FastAPI
import uvicorn
import numpy as np
import faiss
from typing import List
import pickle
import cv2
import socket
import json
import operator
from multiprocessing import Process
"""
完整的index库如下:
root_path/ # 库存储目录
|-- image_list.txt # 图像列表,每行:image_path label。由前端生成及修改。后端只读
|-- features.pkl # 建库之后,保存的embedding向量,后端生成,前端无需操作
|-- images # 图像存储目录,由前端生成及增删查等操作。后端只读
| |-- md5.jpg
| |-- md5.jpg
| |-- ……
|-- index # 真正的生成的index库存储目录,后端生成及操作,前端无需操作。
| |-- vector.index # faiss生成的索引库
| |-- id_map.pkl # 索引文件
"""
class ShiTuIndexManager(object):
def __init__(self, config):
self.root_path = None
self.image_list_path = "image_list.txt"
self.image_dir = "images"
self.index_path = "index/vector.index"
self.id_map_path = "index/id_map.pkl"
self.features_path = "features.pkl"
self.index = None
self.id_map = None
self.features = None
self.config = config
self.predictor = RecPredictor(config)
def _load_pickle(self, path):
if os.path.exists(path):
return pickle.load(open(path, 'rb'))
else:
return None
def _save_pickle(self, path, data):
if not os.path.exists(os.path.dirname(path)):
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, 'wb') as fd:
pickle.dump(data, fd)
def _load_index(self):
self.index = faiss.read_index(
os.path.join(self.root_path, self.index_path))
self.id_map = self._load_pickle(
os.path.join(self.root_path, self.id_map_path))
self.features = self._load_pickle(
os.path.join(self.root_path, self.features_path))
def _save_index(self, index, id_map, features):
faiss.write_index(index, os.path.join(self.root_path, self.index_path))
self._save_pickle(os.path.join(self.root_path, self.id_map_path),
id_map)
self._save_pickle(os.path.join(self.root_path, self.features_path),
features)
def _update_path(self, root_path, image_list_path=None):
if root_path == self.root_path:
pass
else:
self.root_path = root_path
if not os.path.exists(os.path.join(root_path, "index")):
os.mkdir(os.path.join(root_path, "index"))
if image_list_path is not None:
self.image_list_path = image_list_path
def _cal_featrue(self, image_list):
batch_images = []
featrures = None
cnt = 0
for idx, image_path in enumerate(image_list):
image = cv2.imread(image_path)
if image is None:
return "{} is broken or not exist. Stop"
else:
image = image[:, :, ::-1]
batch_images.append(image)
cnt += 1
if cnt % self.config["Global"]["batch_size"] == 0 or (
idx + 1) == len(image_list):
if len(batch_images) == 0:
continue
batch_results = self.predictor.predict(batch_images)
featrures = batch_results if featrures is None else np.concatenate(
(featrures, batch_results), axis=0)
batch_images = []
return featrures
def _split_datafile(self, data_file, image_root):
'''
data_file: image list file; each line holds an image path and a label,
    separated by whitespace
image_root: image path root
'''
gallery_images = []
gallery_docs = []
gallery_ids = []
with open(data_file, 'r', encoding='utf-8') as f:
lines = f.readlines()
for _, ori_line in enumerate(lines):
line = ori_line.strip().split()
text_num = len(line)
assert text_num >= 2, f"line({ori_line}) must be splitted into at least 2 parts, but got {text_num}"
image_file = os.path.join(image_root, line[0])
gallery_images.append(image_file)
gallery_docs.append(ori_line.strip())
gallery_ids.append(os.path.basename(line[0]).split(".")[0])
return gallery_images, gallery_docs, gallery_ids
def create_index(self,
image_list: str,
index_method: str = "HNSW32",
image_root: str = None):
if not os.path.exists(image_list):
return "{} is not exist".format(image_list)
if index_method.lower() not in ['hnsw32', 'ivf', 'flat']:
return "The index method Only support: HNSW32, IVF, Flat"
self._update_path(os.path.dirname(image_list), image_list)
# get image_paths
image_root = image_root if image_root is not None else self.root_path
gallery_images, gallery_docs, image_ids = self._split_datafile(
image_list, image_root)
# generate index
if index_method == "IVF":
index_method = index_method + str(
min(max(int(len(gallery_images) // 32), 2), 65536)) + ",Flat"
index = faiss.index_factory(
self.config["IndexProcess"]["embedding_size"], index_method,
faiss.METRIC_INNER_PRODUCT)
self.index = faiss.IndexIDMap2(index)
features = self._cal_featrue(gallery_images)
self.index.train(features)
index_ids = np.arange(0, len(gallery_images)).astype(np.int64)
self.index.add_with_ids(features, index_ids)
self.id_map = dict()
for i, d in zip(list(index_ids), gallery_docs):
self.id_map[i] = d
self.features = {
"features": features,
"index_method": index_method,
"image_ids": image_ids,
"index_ids": index_ids.tolist()
}
self._save_index(self.index, self.id_map, self.features)
def open_index(self, root_path: str, image_list_path: str) -> str:
self._update_path(root_path)
_, _, image_ids = self._split_datafile(image_list_path, root_path)
if os.path.exists(os.path.join(self.root_path, self.index_path)) and \
os.path.exists(os.path.join(self.root_path, self.id_map_path)) and \
os.path.exists(os.path.join(self.root_path, self.features_path)):
self._update_path(root_path)
self._load_index()
if operator.eq(set(image_ids), set(self.features['image_ids'])):
return ""
else:
return "The image list is different from index, Please update index"
else:
return "File not exist: features.pkl, vector.index, id_map.pkl"
def update_index(self, image_list: str, image_root: str = None) -> str:
if self.index and self.id_map and self.features:
image_paths, image_docs, image_ids = self._split_datafile(
image_list,
image_root if image_root is not None else self.root_path)
# for add image
add_ids = list(
set(image_ids).difference(set(self.features["image_ids"])))
add_indexes = [i for i, x in enumerate(image_ids) if x in add_ids]
add_image_paths = [image_paths[i] for i in add_indexes]
add_image_docs = [image_docs[i] for i in add_indexes]
add_image_ids = [image_ids[i] for i in add_indexes]
self._add_index(add_image_paths, add_image_docs, add_image_ids)
# delete images
delete_ids = list(
set(self.features["image_ids"]).difference(set(image_ids)))
self._delete_index(delete_ids)
self._save_index(self.index, self.id_map, self.features)
return ""
else:
return "Failed. Please create or open index first"
def _add_index(self, image_list: List, image_docs: List, image_ids: List):
if len(image_ids) == 0:
return
featrures = self._cal_featrue(image_list)
index_ids = (np.arange(0, len(image_list)) + max(self.id_map.keys()) +
1).astype(np.int64)
self.index.add_with_ids(featrures, index_ids)
for i, d in zip(index_ids, image_docs):
self.id_map[i] = d
self.features['features'] = np.concatenate(
[self.features['features'], featrures], axis=0)
self.features['image_ids'].extend(image_ids)
self.features['index_ids'].extend(index_ids.tolist())
def _delete_index(self, image_ids: List):
if len(image_ids) == 0:
return
indexes = [
i for i, x in enumerate(self.features['image_ids'])
if x in image_ids
]
self.features["features"] = np.delete(self.features["features"],
indexes,
axis=0)
self.features["image_ids"] = np.delete(np.asarray(
self.features["image_ids"]),
indexes,
axis=0).tolist()
index_ids = np.delete(np.asarray(self.features["index_ids"]),
indexes,
axis=0).tolist()
id_map_values = [self.id_map[i] for i in index_ids]
self.index.reset()
ids = np.arange(0, len(id_map_values)).astype(np.int64)
self.index.add_with_ids(self.features['features'], ids)
self.id_map.clear()
for i, d in zip(ids, id_map_values):
self.id_map[i] = d
self.features["index_ids"] = ids
app = FastAPI()
@app.get("/new_index")
def new_index(image_list_path: str,
index_method: str = "HNSW32",
index_root_path: str = None,
force: bool = False):
result = ""
try:
if index_root_path is not None:
image_list_path = os.path.join(index_root_path, image_list_path)
index_path = os.path.join(index_root_path, "index", "vector.index")
id_map_path = os.path.join(index_root_path, "index", "id_map.pkl")
if not (os.path.exists(index_path)
and os.path.exists(id_map_path)) or force:
manager.create_index(image_list_path, index_method, index_root_path)
else:
result = "There alrealy has index in {}".format(index_root_path)
except Exception as e:
result = e.__str__()
data = {"error_message": result}
return json.dumps(data).encode()
@app.get("/open_index")
def open_index(index_root_path: str, image_list_path: str):
result = ""
try:
image_list_path = os.path.join(index_root_path, image_list_path)
result = manager.open_index(index_root_path, image_list_path)
except Exception as e:
result = e.__str__()
data = {"error_message": result}
return json.dumps(data).encode()
@app.get("/update_index")
def update_index(image_list_path: str, index_root_path: str = None):
result = ""
try:
if index_root_path is not None:
image_list_path = os.path.join(index_root_path, image_list_path)
result = manager.update_index(image_list=image_list_path,
image_root=index_root_path)
except Exception as e:
result = e.__str__()
data = {"error_message": result}
return json.dumps(data).encode()
def FrontInterface(server_process=None):
front = QtWidgets.QApplication([])
main_window = mod.mainwindow.MainWindow(process=server_process)
main_window.showMaximized()
sys.exit(front.exec_())
def Server(args):
[app, host, port] = args
uvicorn.run(app, host=host, port=port)
if __name__ == '__main__':
args = config.parse_args()
model_config = config.get_config(args.config,
overrides=args.override,
show=True)
manager = ShiTuIndexManager(model_config)
try:
ip = socket.gethostbyname(socket.gethostname())
except:
ip = '127.0.0.1'
port = 8000
p_server = Process(target=Server, args=([app, ip, port],))
p_server.start()
# p_client = Process(target=FrontInterface, args=())
# p_client.start()
# p_client.join()
FrontInterface(p_server)
p_server.terminate()
sys.exit(0)
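# Hedged sketch of driving the index service above over plain HTTP once it is
# running (the GUI normally goes through IndexHttpClient); the gallery path is
# a placeholder and the default port follows the code above.
import requests

base_url = "http://127.0.0.1:8000"
resp = requests.get(
    base_url + "/new_index",
    params={
        "image_list_path": "image_list.txt",
        "index_root_path": "/path/to/gallery",
        "index_method": "HNSW32",
        "force": False,
    })
print(resp.json())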
import os
from PyQt5 import QtCore, QtWidgets
from mod import image_list_manager as imglistmgr
from mod import utils
from mod import ui_addclassifydialog
from mod import ui_renameclassifydialog
class ClassifyUiContext(QtCore.QObject):
# 分类界面相关业务
selected = QtCore.pyqtSignal(str) # 选择分类信号
def __init__(self, ui: QtWidgets.QListView, parent: QtWidgets.QMainWindow,
image_list_mgr: imglistmgr.ImageListManager):
super(ClassifyUiContext, self).__init__()
self.__ui = ui
self.__parent = parent
self.__imageListMgr = image_list_mgr
self.__menu = QtWidgets.QMenu()
self.__initMenu()
self.__initUi()
self.__connectSignal()
@property
def ui(self):
return self.__ui
@property
def parent(self):
return self.__parent
@property
def imageListManager(self):
return self.__imageListMgr
@property
def menu(self):
return self.__menu
def __initUi(self):
"""初始化分类界面"""
self.__ui.setEditTriggers(QtWidgets.QAbstractItemView.NoEditTriggers)
def __connectSignal(self):
"""连接信号"""
self.__ui.clicked.connect(self.uiClicked)
self.__ui.doubleClicked.connect(self.uiDoubleClicked)
def __initMenu(self):
"""初始化分类界面菜单"""
utils.setMenu(self.__menu, "添加分类", self.addClassify)
utils.setMenu(self.__menu, "移除分类", self.removeClassify)
utils.setMenu(self.__menu, "重命名分类", self.renemeClassify)
self.__ui.setContextMenuPolicy(QtCore.Qt.CustomContextMenu)
self.__ui.customContextMenuRequested.connect(self.__showMenu)
def __showMenu(self, pos):
"""显示分类界面菜单"""
if len(self.__imageListMgr.filePath) > 0:
self.__menu.exec_(self.__ui.mapToGlobal(pos))
def setClassifyList(self, classify_list):
"""设置分类列表"""
list_model = QtCore.QStringListModel(classify_list)
self.__ui.setModel(list_model)
def uiClicked(self, index):
"""分类列表点击"""
if not self.__ui.currentIndex().isValid():
return
txt = index.data()
self.selected.emit(txt)
def uiDoubleClicked(self, index):
"""分类列表双击"""
if not self.__ui.currentIndex().isValid():
return
ole_name = index.data()
dlg = QtWidgets.QDialog(parent=self.parent)
ui = ui_renameclassifydialog.Ui_RenameClassifyDialog()
ui.setupUi(dlg)
ui.oldNameLineEdit.setText(ole_name)
result = dlg.exec_()
new_name = ui.newNameLineEdit.text()
if result == QtWidgets.QDialog.Accepted:
mgr_result = self.__imageListMgr.renameClassify(ole_name, new_name)
if not mgr_result:
QtWidgets.QMessageBox.warning(self.parent, "重命名分类", "重命名分类错误")
else:
self.setClassifyList(self.__imageListMgr.classifyList)
self.__imageListMgr.writeFile()
def addClassify(self):
"""添加分类"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self.__parent, "提示",
"请先打开正确的图像库")
return
dlg = QtWidgets.QDialog(parent=self.parent)
ui = ui_addclassifydialog.Ui_AddClassifyDialog()
ui.setupUi(dlg)
result = dlg.exec_()
txt = ui.lineEdit.text()
if result == QtWidgets.QDialog.Accepted:
mgr_result = self.__imageListMgr.addClassify(txt)
if not mgr_result:
QtWidgets.QMessageBox.warning(self.parent, "添加分类", "添加分类错误")
else:
self.setClassifyList(self.__imageListMgr.classifyList)
def removeClassify(self):
"""移除分类"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self.__parent, "提示",
"请先打开正确的图像库")
return
if not self.__ui.currentIndex().isValid():
return
classify = self.__ui.currentIndex().data()
result = QtWidgets.QMessageBox.information(
self.parent,
"移除分类",
"确定移除分类: {}".format(classify),
buttons=QtWidgets.QMessageBox.Ok | QtWidgets.QMessageBox.Cancel,
defaultButton=QtWidgets.QMessageBox.Cancel)
if result == QtWidgets.QMessageBox.Ok:
if len(self.__imageListMgr.imageList(classify)) > 0:
QtWidgets.QMessageBox.warning(self.parent, "移除分类",
"分类下存在图片,请先移除图片")
else:
self.__imageListMgr.removeClassify(classify)
self.setClassifyList(self.__imageListMgr.classifyList)
def renemeClassify(self):
"""重命名分类"""
idx = self.__ui.currentIndex()
if idx.isValid():
self.uiDoubleClicked(idx)
def searchClassify(self, classify):
"""查找分类"""
self.setClassifyList(self.__imageListMgr.findLikeClassify(classify))
import os
class ImageListManager:
"""
图像列表文件管理器
"""
def __init__(self, file_path="", encoding="utf-8"):
self.__filePath = ""
self.__dirName = ""
self.__dataList = {}
self.__findLikeClassifyResult = []
if file_path != "":
self.readFile(file_path, encoding)
@property
def filePath(self):
return self.__filePath
@property
def dirName(self):
return self.__dirName
@dirName.setter
def dirName(self, value):
self.__dirName = value
@property
def dataList(self):
return self.__dataList
@property
def classifyList(self):
return self.__dataList.keys()
@property
def findLikeClassifyResult(self):
return self.__findLikeClassifyResult
def imageList(self, classify: str):
"""
获取分类下的图片列表
Args:
classify (str): 分类名称
Returns:
list: 图片列表
"""
return self.__dataList[classify]
def readFile(self, file_path: str, encoding="utf-8"):
"""
读取文件内容
Args:
file_path (str): 文件路径
encoding (str, optional): 文件编码. 默认 "utf-8".
Raises:
Exception: 文件不存在
"""
if not os.path.exists(file_path):
raise Exception("文件不存在:{}".format(file_path))
self.__filePath = file_path
self.__dirName = os.path.dirname(self.__filePath)
self.__readData(file_path, encoding)
def __readData(self, file_path: str, encoding="utf-8"):
"""
读取文件内容
Args:
file_path (str): 文件路径
encoding (str, optional): 文件编码. 默认 "utf-8".
"""
with open(file_path, "r", encoding=encoding) as f:
self.__dataList.clear()
for line in f:
line = line.rstrip("\n")
data = line.split("\t")
self.__appendData(data)
def __appendData(self, data: list):
"""
添加数据
Args:
data (list): 数据
"""
if data[1] not in self.__dataList:
self.__dataList[data[1]] = []
self.__dataList[data[1]].append(data[0])
def writeFile(self, file_path="", encoding="utf-8"):
"""
写入文件
Args:
file_path (str, optional): 文件路径. 默认 "".
encoding (str, optional): 文件编码. 默认 "utf-8".
"""
if file_path == "":
file_path = self.__filePath
if not os.path.exists(file_path):
return False
self.__dirName = os.path.dirname(self.__filePath)
lines = []
for classify in self.__dataList.keys():
for path in self.__dataList[classify]:
lines.append("{}\t{}\n".format(path, classify))
with open(file_path, "w", encoding=encoding) as f:
f.writelines(lines)
return True
def realPath(self, image_path: str):
"""
获取真实路径
Args:
image_path (str): 图片路径
"""
return os.path.join(self.__dirName, image_path)
def realPathList(self, classify: str):
"""
获取分类下的真实路径列表
Args:
classify (str): 分类名称
Returns:
list: 真实路径列表
"""
if classify not in self.classifyList:
return []
paths = self.__dataList[classify]
if len(paths) == 0:
return []
for i in range(len(paths)):
paths[i] = os.path.join(self.__dirName, paths[i])
return paths
def findLikeClassify(self, name: str):
"""
查找类似的分类名称
Args:
name (str): 分类名称
Returns:
list: 类似的分类名称列表
"""
self.__findLikeClassifyResult.clear()
for classify in self.__dataList.keys():
word = str(name)
if (word in classify):
self.__findLikeClassifyResult.append(classify)
return self.__findLikeClassifyResult
def addClassify(self, classify: str):
"""
添加分类
Args:
classify (str): 分类名称
Returns:
bool: 如果分类名称已经存在,返回False,否则添加分类并返回True
"""
if classify in self.__dataList:
return False
self.__dataList[classify] = []
return True
def removeClassify(self, classify: str):
"""
移除分类
Args:
classify (str): 分类名称
Returns:
bool: 如果分类名称不存在,返回False,否则移除分类并返回True
"""
if classify not in self.__dataList:
return False
self.__dataList.pop(classify)
return True
def renameClassify(self, old_classify: str, new_classify: str):
"""
重命名分类名称
Args:
old_classify (str): 原分类名称
new_classify (str): 新分类名称
Returns:
bool: 如果原分类名称不存在,或者新分类名称已经存在,返回False,否则重命名分类名称并返回True
"""
if old_classify not in self.__dataList:
return False
if new_classify in self.__dataList:
return False
self.__dataList[new_classify] = self.__dataList[old_classify]
self.__dataList.pop(old_classify)
return True
def allClassfiyNotEmpty(self):
"""
检查所有分类是否都有图片
Returns:
bool: 如果有一个分类没有图片,返回False,否则返回True
"""
for classify in self.__dataList.keys():
if len(self.__dataList[classify]) == 0:
return False
return True
def resetImageList(self, classify: str, image_list: list):
"""
重置图片列表
Args:
classify (str): 分类名称
image_list (list): 图片相对路径列表
Returns:
bool: 如果分类名称不存在,返回False,否则重置图片列表并返回True
"""
if classify not in self.__dataList:
return False
self.__dataList[classify] = image_list
return True
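# Hedged usage sketch for ImageListManager (paths are hypothetical); it reads
# and rewrites the tab-separated image_list.txt described earlier.
mgr = ImageListManager("/path/to/gallery/image_list.txt")
print(list(mgr.classifyList))
mgr.addClassify("new_drink")
mgr.resetImageList("new_drink", ["images/0a1b2c3d4e5f.jpg"])
mgr.writeFile()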
import os
from stat import filemode
from PyQt5 import QtCore, QtGui, QtWidgets
from mod import image_list_manager as imglistmgr
from mod import utils
from mod import ui_renameclassifydialog
from mod import imageeditclassifydialog
# 图像缩放基数
BASE_IMAGE_SIZE = 64
class ImageListUiContext(QtCore.QObject):
# 图片列表界面相关业务,style sheet 在 MainWindow.ui 相应的 ImageListWidget 中设置
listCount = QtCore.pyqtSignal(int) # 图像列表图像的数量
selectedCount = QtCore.pyqtSignal(int) # 图像列表选择图像的数量
def __init__(self, ui: QtWidgets.QListWidget,
parent: QtWidgets.QMainWindow,
image_list_mgr: imglistmgr.ImageListManager):
super(ImageListUiContext, self).__init__()
self.__ui = ui
self.__parent = parent
self.__imageListMgr = image_list_mgr
self.__initUi()
self.__menu = QtWidgets.QMenu()
self.__initMenu()
self.__connectSignal()
self.__selectedClassify = ""
self.__imageScale = 1
@property
def ui(self):
return self.__ui
@property
def parent(self):
return self.__parent
@property
def imageListManager(self):
return self.__imageListMgr
@property
def menu(self):
return self.__menu
def __initUi(self):
"""初始化图片列表样式"""
self.__ui.setViewMode(QtWidgets.QListView.IconMode)
self.__ui.setSpacing(15)
self.__ui.setMovement(QtWidgets.QListView.Static)
self.__ui.setSelectionMode(
QtWidgets.QAbstractItemView.ExtendedSelection)
def __initMenu(self):
"""初始化图片列表界面菜单"""
utils.setMenu(self.__menu, "添加图片", self.addImage)
utils.setMenu(self.__menu, "移除图片", self.removeImage)
utils.setMenu(self.__menu, "编辑图片分类", self.editImageClassify)
self.__menu.addSeparator()
utils.setMenu(self.__menu, "选择全部图片", self.selectAllImage)
utils.setMenu(self.__menu, "反向选择图片", self.reverseSelectImage)
utils.setMenu(self.__menu, "取消选择图片", self.cancelSelectImage)
self.__ui.setContextMenuPolicy(QtCore.Qt.CustomContextMenu)
self.__ui.customContextMenuRequested.connect(self.__showMenu)
def __showMenu(self, pos):
"""显示图片列表界面菜单"""
if len(self.__imageListMgr.filePath) > 0:
self.__menu.exec_(self.__ui.mapToGlobal(pos))
def __connectSignal(self):
"""连接信号与槽"""
self.__ui.itemSelectionChanged.connect(self.onSelectionChanged)
def setImageScale(self, scale: int):
"""设置图片大小"""
self.__imageScale = scale
size = QtCore.QSize(scale * BASE_IMAGE_SIZE, scale * BASE_IMAGE_SIZE)
self.__ui.setIconSize(size)
for i in range(self.__ui.count()):
item = self.__ui.item(i)
item.setSizeHint(size)
def setImageList(self, classify: str):
"""设置图片列表"""
size = QtCore.QSize(self.__imageScale * BASE_IMAGE_SIZE,
self.__imageScale * BASE_IMAGE_SIZE)
self.__selectedClassify = classify
image_list = self.__imageListMgr.imageList(classify)
self.__ui.clear()
count = 0
for i in image_list:
item = QtWidgets.QListWidgetItem(self.__ui)
item.setIcon(QtGui.QIcon(self.__imageListMgr.realPath(i)))
item.setData(QtCore.Qt.UserRole, i)
item.setSizeHint(size)
self.__ui.addItem(item)
count += 1
self.listCount.emit(count)
def clear(self):
"""清除图片列表"""
self.__ui.clear()
def addImage(self):
"""添加图片"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self.__parent, "提示",
"请先打开正确的图像库")
return
filter = "图片 (*.png *.jpg *.jpeg *.PNG *.JPG *.JPEG);;所有文件(*.*)"
dlg = QtWidgets.QFileDialog(self.__parent)
dlg.setFileMode(QtWidgets.QFileDialog.ExistingFiles) # 多选文件
dlg.setViewMode(QtWidgets.QFileDialog.Detail) # 详细模式
file_paths = dlg.getOpenFileNames(filter=filter)[0]
if len(file_paths) == 0:
return
image_list_dir = self.__imageListMgr.dirName
file_list = []
for path in file_paths:
if not os.path.exists(path):
continue
new_file = self.__copyToImagesDir(path)
if new_file != "" and image_list_dir in new_file:
# 去掉 image_list_dir 的路径和斜杠
begin = len(image_list_dir) + 1
file_list.append(new_file[begin:])
if len(file_list) > 0:
if self.__selectedClassify == "":
QtWidgets.QMessageBox.warning(self.__parent, "提示", "请先选择分类")
return
new_list = self.__imageListMgr.imageList(
self.__selectedClassify) + file_list
self.__imageListMgr.resetImageList(self.__selectedClassify,
new_list)
self.setImageList(self.__selectedClassify)
self.__imageListMgr.writeFile()
def __copyToImagesDir(self, image_path: str):
md5 = utils.fileMD5(image_path)
file_ext = utils.fileExtension(image_path)
to_dir = os.path.join(self.__imageListMgr.dirName, "images")
new_path = os.path.join(to_dir, md5 + file_ext)
if os.path.exists(to_dir):
utils.copyFile(image_path, new_path)
return new_path
else:
return ""
def removeImage(self):
"""移除图片"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self.__parent, "提示",
"请先打开正确的图像库")
return
path_list = []
image_list = self.__ui.selectedItems()
if len(image_list) == 0:
return
question = QtWidgets.QMessageBox.question(self.__parent, "移除图片",
"确定移除所选图片吗?")
if question == QtWidgets.QMessageBox.No:
return
for i in range(self.__ui.count()):
item = self.__ui.item(i)
img_path = item.data(QtCore.Qt.UserRole)
if not item.isSelected():
path_list.append(img_path)
else:
# 从磁盘上删除图片
utils.removeFile(
os.path.join(self.__imageListMgr.dirName, img_path))
self.__imageListMgr.resetImageList(self.__selectedClassify, path_list)
self.setImageList(self.__selectedClassify)
self.__imageListMgr.writeFile()
def editImageClassify(self):
"""编辑图片分类"""
old_classify = self.__selectedClassify
dlg = imageeditclassifydialog.ImageEditClassifyDialog(
parent=self.__parent,
old_classify=old_classify,
classify_list=self.__imageListMgr.classifyList)
result = dlg.exec_()
new_classify = dlg.newClassify
if result == QtWidgets.QDialog.Accepted \
and new_classify != old_classify \
and new_classify != "":
self.__moveImage(old_classify, new_classify)
self.__imageListMgr.writeFile()
def __moveImage(self, old_classify, new_classify):
"""移动图片"""
keep_list = []
is_selected = False
move_list = self.__imageListMgr.imageList(new_classify)
for i in range(self.__ui.count()):
item = self.__ui.item(i)
txt = item.data(QtCore.Qt.UserRole)
if item.isSelected():
move_list.append(txt)
is_selected = True
else:
keep_list.append(txt)
if is_selected:
self.__imageListMgr.resetImageList(new_classify, move_list)
self.__imageListMgr.resetImageList(old_classify, keep_list)
self.setImageList(old_classify)
def selectAllImage(self):
"""选择所有图片"""
self.__ui.selectAll()
def reverseSelectImage(self):
"""反向选择图片"""
for i in range(self.__ui.count()):
item = self.__ui.item(i)
item.setSelected(not item.isSelected())
def cancelSelectImage(self):
"""取消选择图片"""
self.__ui.clearSelection()
def onSelectionChanged(self):
"""选择图像该变,发送选择的数量信号"""
count = len(self.__ui.selectedItems())
self.selectedCount.emit(count)
import os
from PyQt5 import QtCore, QtGui, QtWidgets
from mod import image_list_manager
from mod import ui_imageeditclassifydialog
from mod import utils
class ImageEditClassifyDialog(QtWidgets.QDialog):
"""图像编辑分类对话框"""
def __init__(self, parent, old_classify, classify_list):
super(ImageEditClassifyDialog, self).__init__(parent)
self.ui = ui_imageeditclassifydialog.Ui_Dialog()
self.ui.setupUi(self) # 初始化主窗口界面
self.__oldClassify = old_classify
self.__classifyList = classify_list
self.__newClassify = ""
self.__searchResult = []
self.__initUi()
self.__connectSignal()
@property
def newClassify(self):
return self.__newClassify
def __initUi(self):
self.ui.oldLineEdit.setText(self.__oldClassify)
self.__setClassifyList(self.__classifyList)
self.ui.classifyListView.setEditTriggers(
QtWidgets.QAbstractItemView.NoEditTriggers)
def __connectSignal(self):
self.ui.classifyListView.clicked.connect(self.selectedListView)
self.ui.searchButton.clicked.connect(self.searchClassify)
def __setClassifyList(self, classify_list):
list_model = QtCore.QStringListModel(classify_list)
self.ui.classifyListView.setModel(list_model)
def selectedListView(self, index):
if not self.ui.classifyListView.currentIndex().isValid():
return
txt = index.data()
self.ui.newLineEdit.setText(txt)
self.__newClassify = txt
def searchClassify(self):
txt = self.ui.searchWordLineEdit.text()
self.__searchResult.clear()
for classify in self.__classifyList:
if txt in classify:
self.__searchResult.append(classify)
self.__setClassifyList(self.__searchResult)
import json
import os
import urllib3
import urllib.parse
class IndexHttpClient():
"""索引库客户端,使用 urllib3 连接,使用 urllib.parse 进行 url 编码"""
def __init__(self, host: str, port: int):
self.__host = host
self.__port = port
self.__http = urllib3.PoolManager()
self.__headers = {"Content-type": "application/json"}
def url(self):
return "http://{}:{}".format(self.__host, self.__port)
def new_index(self,
image_list_path: str,
index_root_path: str,
index_method="HNSW32",
force=False):
"""新建 重建 库"""
if index_method not in ["HNSW32", "FLAT", "IVF"]:
raise Exception(
"index_method 必须是 HNSW32, FLAT, IVF,实际值为:{}".format(
index_method))
params = {"image_list_path":image_list_path, \
"index_root_path":index_root_path, \
"index_method":index_method, \
"force":force}
return self.__post(self.url() + "/new_index?", params)
def open_index(self, index_root_path: str, image_list_path: str):
"""打开库"""
params = {
"index_root_path": index_root_path,
"image_list_path": image_list_path
}
return self.__post(self.url() + "/open_index?", params)
def update_index(self, image_list_path: str, index_root_path: str):
"""更新索引库"""
params = {"image_list_path":image_list_path, \
"index_root_path":index_root_path}
return self.__post(self.url() + "/update_index?", params)
def __post(self, url: str, params: dict):
"""发送 url 并接收数据"""
http = self.__http
encode_params = urllib.parse.urlencode(params)
get_url = url + encode_params
req = http.request("GET", get_url, headers=self.__headers)
result = json.loads(req.data)
if isinstance(result, str):
result = eval(result)
msg = result["error_message"]
if msg != None and len(msg) == 0:
msg = None
return msg
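# Hedged usage sketch for IndexHttpClient, matching the FastAPI endpoints of
# the index service; host/port defaults and the gallery path are placeholders.
client = IndexHttpClient("127.0.0.1", 8000)
err = client.new_index(image_list_path="image_list.txt",
                       index_root_path="/path/to/gallery",
                       index_method="HNSW32",
                       force=False)
print(err if err else "index created")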
from multiprocessing.dummy import active_children
from multiprocessing import Process
import os
import sys
import socket
from PyQt5 import QtCore, QtGui, QtWidgets
from mod import ui_mainwindow
from mod import image_list_manager
from mod import classify_ui_context
from mod import image_list_ui_context
from mod import ui_newlibrarydialog
from mod import index_http_client
from mod import utils
from mod import ui_waitdialog
import threading
TOOL_BTN_ICON_SIZE = 64
TOOL_BTN_ICON_SMALL = 48
try:
DEFAULT_HOST = socket.gethostbyname(socket.gethostname())
except:
DEFAULT_HOST = '127.0.0.1'
# DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8000
PADDLECLAS_DOC_URL = "https://gitee.com/paddlepaddle/PaddleClas/docs/zh_CN/inference_deployment/shitu_gallery_manager.md"
class MainWindow(QtWidgets.QMainWindow):
"""主窗口"""
newIndexMsg = QtCore.pyqtSignal(str) # 新建索引库线程信号
openIndexMsg = QtCore.pyqtSignal(str) # 打开索引库线程信号
updateIndexMsg = QtCore.pyqtSignal(str) # 更新索引库线程信号
importImageCount = QtCore.pyqtSignal(int) # 导入图像数量信号
def __init__(self, process=None):
super(MainWindow, self).__init__()
self.server_process = process
self.ui = ui_mainwindow.Ui_MainWindow()
self.ui.setupUi(self) # 初始化主窗口界面
self.__imageListMgr = image_list_manager.ImageListManager()
self.__appMenu = QtWidgets.QMenu() # 应用菜单
self.__libraryAppendMenu = QtWidgets.QMenu() # 图像库附加功能菜单
self.__initAppMenu() # 初始化应用菜单
self.__pathBar = QtWidgets.QLabel(self) # 路径
self.__classifyCountBar = QtWidgets.QLabel(self) # 分类数量
self.__imageCountBar = QtWidgets.QLabel(self) # 图像列表数量
self.__imageSelectedBar = QtWidgets.QLabel(self) # 图像列表选择数量
self.__spaceBar1 = QtWidgets.QLabel(self) # 空格间隔栏
self.__spaceBar2 = QtWidgets.QLabel(self) # 空格间隔栏
self.__spaceBar3 = QtWidgets.QLabel(self) # 空格间隔栏
# 分类界面相关业务
self.__classifyUiContext = classify_ui_context.ClassifyUiContext(
ui=self.ui.classifyListView,
parent=self,
image_list_mgr=self.__imageListMgr)
# 图片列表界面相关业务
self.__imageListUiContext = image_list_ui_context.ImageListUiContext(
ui=self.ui.imageListWidget,
parent=self,
image_list_mgr=self.__imageListMgr)
# 搜索的历史记录回车快捷键
self.__historyCmbShortcut = QtWidgets.QShortcut(
QtGui.QKeySequence(QtCore.Qt.Key_Return),
self.ui.searchClassifyHistoryCmb)
self.__waitDialog = QtWidgets.QDialog() # 等待对话框
self.__waitDialogUi = ui_waitdialog.Ui_WaitDialog() # 等待对话框界面
self.__initToolBtn()
self.__connectSignal()
self.__initUI()
self.__initWaitDialog()
def __initUI(self):
"""初始化界面"""
# 窗口图标
self.setWindowIcon(QtGui.QIcon("./resource/app_icon.png"))
# 初始化分割窗口
self.ui.splitter.setStretchFactor(0, 20)
self.ui.splitter.setStretchFactor(1, 80)
# 初始化图像缩放
self.ui.imageScaleSlider.setValue(4)
# 状态栏界面设置
space_bar = " " # 间隔16空格
self.__spaceBar1.setText(space_bar)
self.__spaceBar2.setText(space_bar)
self.__spaceBar3.setText(space_bar)
self.ui.statusbar.addWidget(self.__pathBar)
self.ui.statusbar.addWidget(self.__spaceBar1)
self.ui.statusbar.addWidget(self.__classifyCountBar)
self.ui.statusbar.addWidget(self.__spaceBar2)
self.ui.statusbar.addWidget(self.__imageCountBar)
self.ui.statusbar.addWidget(self.__spaceBar3)
self.ui.statusbar.addWidget(self.__imageSelectedBar)
def __initToolBtn(self):
"""初始化工具按钮"""
self.__setToolButton(self.ui.appMenuBtn, "应用菜单",
"./resource/app_menu.png", TOOL_BTN_ICON_SIZE)
self.__setToolButton(self.ui.saveImageLibraryBtn, "保存图像库",
"./resource/save_image_Library.png",
TOOL_BTN_ICON_SIZE)
self.ui.saveImageLibraryBtn.clicked.connect(self.saveImageLibrary)
self.__setToolButton(self.ui.addClassifyBtn, "添加分类",
"./resource/add_classify.png",
TOOL_BTN_ICON_SIZE)
self.ui.addClassifyBtn.clicked.connect(
self.__classifyUiContext.addClassify)
self.__setToolButton(self.ui.removeClassifyBtn, "移除分类",
"./resource/remove_classify.png",
TOOL_BTN_ICON_SIZE)
self.ui.removeClassifyBtn.clicked.connect(
self.__classifyUiContext.removeClassify)
self.__setToolButton(self.ui.searchClassifyBtn, "查找分类",
"./resource/search_classify.png",
TOOL_BTN_ICON_SMALL)
self.ui.searchClassifyBtn.clicked.connect(
self.__classifyUiContext.searchClassify)
self.__setToolButton(self.ui.addImageBtn, "添加图片",
"./resource/add_image.png", TOOL_BTN_ICON_SMALL)
self.ui.addImageBtn.clicked.connect(self.__imageListUiContext.addImage)
self.__setToolButton(self.ui.removeImageBtn, "移除图片",
"./resource/remove_image.png",
TOOL_BTN_ICON_SMALL)
self.ui.removeImageBtn.clicked.connect(
self.__imageListUiContext.removeImage)
self.ui.searchClassifyHistoryCmb.setToolTip("查找分类历史")
self.ui.imageScaleSlider.setToolTip("图片缩放")
def __setToolButton(self, button, tool_tip: str, icon_path: str,
icon_size: int):
"""设置工具按钮"""
button.setToolTip(tool_tip)
button.setIcon(QtGui.QIcon(icon_path))
button.setIconSize(QtCore.QSize(icon_size, icon_size))
def __initAppMenu(self):
"""初始化应用菜单"""
utils.setMenu(self.__appMenu, "新建图像库", self.newImageLibrary)
utils.setMenu(self.__appMenu, "打开图像库", self.openImageLibrary)
utils.setMenu(self.__appMenu, "保存图像库", self.saveImageLibrary)
self.__libraryAppendMenu.setTitle("导入图像")
utils.setMenu(self.__libraryAppendMenu, "导入 image_list 图像",
self.importImageListImage)
utils.setMenu(self.__libraryAppendMenu, "导入多文件夹图像",
self.importDirsImage)
self.__appMenu.addMenu(self.__libraryAppendMenu)
self.__appMenu.addSeparator()
utils.setMenu(self.__appMenu, "新建/重建 索引库", self.newIndexLibrary)
utils.setMenu(self.__appMenu, "更新索引库", self.updateIndexLibrary)
self.__appMenu.addSeparator()
utils.setMenu(self.__appMenu, "帮助", self.showHelp)
utils.setMenu(self.__appMenu, "关于", self.showAbout)
utils.setMenu(self.__appMenu, "退出", self.exitApp)
self.ui.appMenuBtn.setMenu(self.__appMenu)
self.ui.appMenuBtn.setPopupMode(QtWidgets.QToolButton.InstantPopup)
def __initWaitDialog(self):
"""初始化等待对话框"""
self.__waitDialogUi.setupUi(self.__waitDialog)
self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog
| QtCore.Qt.FramelessWindowHint)
def __startWait(self, msg: str):
"""开始显示等待对话框"""
self.setEnabled(False)
self.__waitDialogUi.msgLabel.setText(msg)
self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog
| QtCore.Qt.FramelessWindowHint
| QtCore.Qt.WindowStaysOnTopHint)
self.__waitDialog.show()
self.__waitDialog.repaint()
def __stopWait(self):
"""停止显示等待对话框"""
self.setEnabled(True)
self.__waitDialogUi.msgLabel.setText("执行完毕!")
self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog
| QtCore.Qt.FramelessWindowHint
| QtCore.Qt.CustomizeWindowHint)
self.__waitDialog.close()
def __connectSignal(self):
"""连接信号与槽"""
self.__classifyUiContext.selected.connect(
self.__imageListUiContext.setImageList)
self.ui.searchClassifyBtn.clicked.connect(self.searchClassify)
self.ui.imageScaleSlider.valueChanged.connect(
self.__imageListUiContext.setImageScale)
self.__imageListUiContext.listCount.connect(self.__setImageCountBar)
self.__imageListUiContext.selectedCount.connect(
self.__setImageSelectedCountBar)
self.__historyCmbShortcut.activated.connect(self.searchClassify)
self.newIndexMsg.connect(self.__onNewIndexMsg)
self.openIndexMsg.connect(self.__onOpenIndexMsg)
self.updateIndexMsg.connect(self.__onUpdateIndexMsg)
self.importImageCount.connect(self.__onImportImageCount)
def newImageLibrary(self):
"""新建图像库"""
dir_path = self.__openDirDialog("新建图像库")
if dir_path == None:
return
if not utils.isEmptyDir(dir_path):
QtWidgets.QMessageBox.warning(self, "错误", "该目录不为空,请选择空目录")
return
if not utils.initLibrary(dir_path):
QtWidgets.QMessageBox.warning(self, "错误", "新建图像库失败")
return
QtWidgets.QMessageBox.information(self, "提示", "新建图像库成功")
self.__reload(os.path.join(dir_path, "image_list.txt"), dir_path)
def __openDirDialog(self, title: str):
"""打开目录对话框"""
dlg = QtWidgets.QFileDialog(self)
dlg.setWindowTitle(title)
dlg.setOption(QtWidgets.QFileDialog.ShowDirsOnly, True)
dlg.setFileMode(QtWidgets.QFileDialog.Directory)
dlg.setAcceptMode(QtWidgets.QFileDialog.AcceptOpen)
if dlg.exec_() == QtWidgets.QDialog.Accepted:
dir_path = dlg.selectedFiles()[0]
return dir_path
return None
def openImageLibrary(self):
"""打开图像库"""
dir_path = self.__openDirDialog("打开图像库")
if dir_path != None:
image_list_path = os.path.join(dir_path, "image_list.txt")
if os.path.exists(image_list_path) \
and os.path.exists(os.path.join(dir_path, "images")):
self.__reload(image_list_path, dir_path)
self.openIndexLibrary()
def __reload(self, image_list_path: str, msg: str):
"""重新加载图像库"""
self.__imageListMgr.readFile(image_list_path)
self.__imageListUiContext.clear()
self.__classifyUiContext.setClassifyList(
self.__imageListMgr.classifyList)
self.__setPathBar(msg)
self.__setClassifyCountBar(len(self.__imageListMgr.classifyList))
self.__setImageCountBar(0)
self.__setImageSelectedCountBar(0)
def saveImageLibrary(self):
"""保存图像库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.warning(self, "错误", "请先打开正确的图像库")
return
self.__imageListMgr.writeFile()
self.__reload(self.__imageListMgr.filePath,
self.__imageListMgr.dirName)
hint_str = "为保证图片准确识别,请在修改图片库后更新索引库。\n\
如果是新建图像库或者没有索引库,请新建索引库。"
QtWidgets.QMessageBox.information(self, "提示", hint_str)
def __onImportImageCount(self, count: int):
"""导入图像槽"""
self.__stopWait()
if count == -1:
QtWidgets.QMessageBox.warning(self, "错误", "导入到当前图像库错误")
return
QtWidgets.QMessageBox.information(self, "提示",
"导入图像库成功,导入图像:{}".format(count))
self.__reload(self.__imageListMgr.filePath,
self.__imageListMgr.dirName)
def __importImageListImageThread(self, from_path: str, to_path: str):
"""导入 image_list 图像 线程"""
count = utils.oneKeyImportFromFile(from_path=from_path,
to_path=to_path)
if count == None:
count = -1
self.importImageCount.emit(count)
def importImageListImage(self):
"""导入 image_list 图像 到当前图像库,建议当前库是新建的空库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库")
return
from_path = QtWidgets.QFileDialog.getOpenFileName(
caption="导入 image_list 图像", filter="txt (*.txt)")[0]
if not os.path.exists(from_path):
QtWidgets.QMessageBox.information(self, "提示", "打开的文件不存在")
return
from_mgr = image_list_manager.ImageListManager(from_path)
self.__startWait("正在导入图像,请等待。。。")
thread = threading.Thread(target=self.__importImageListImageThread,
args=(from_mgr.filePath,
self.__imageListMgr.filePath))
thread.start()
def __importDirsImageThread(self, from_dir: str, to_image_list_path: str):
"""导入多文件夹图像 线程"""
count = utils.oneKeyImportFromDirs(
from_dir=from_dir, to_image_list_path=to_image_list_path)
if count == None:
count = -1
self.importImageCount.emit(count)
def importDirsImage(self):
"""导入 多文件夹图像 到当前图像库,建议当前库是新建的空库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库")
return
dir_path = self.__openDirDialog("导入多文件夹图像")
if dir_path == None:
return
if not os.path.exists(dir_path):
QtWidgets.QMessageBox.information(self, "提示", "打开的目录不存在")
return
self.__startWait("正在导入图像,请等待。。。")
thread = threading.Thread(target=self.__importDirsImageThread,
args=(dir_path,
self.__imageListMgr.filePath))
thread.start()
def __newIndexThread(self, index_root_path: str, image_list_path: str,
index_method: str, force: bool):
"""新建重建索引库线程"""
try:
client = index_http_client.IndexHttpClient(
DEFAULT_HOST, DEFAULT_PORT)
err_msg = client.new_index(image_list_path=image_list_path,
index_root_path=index_root_path,
index_method=index_method,
force=force)
            if err_msg is None:
err_msg = ""
self.newIndexMsg.emit(err_msg)
except Exception as e:
self.newIndexMsg.emit(str(e))
def __onNewIndexMsg(self, err_msg):
"""新建重建索引库槽"""
self.__stopWait()
if err_msg == "":
QtWidgets.QMessageBox.information(self, "提示", "新建/重建 索引库成功")
else:
QtWidgets.QMessageBox.warning(self, "错误", err_msg)
def newIndexLibrary(self):
"""新建重建索引库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库")
return
dlg = QtWidgets.QDialog(self)
ui = ui_newlibrarydialog.Ui_NewlibraryDialog()
ui.setupUi(dlg)
result = dlg.exec_()
index_method = ui.indexMethodCmb.currentText()
force = ui.resetCheckBox.isChecked()
if result == QtWidgets.QDialog.Accepted:
self.__startWait("正在 新建/重建 索引库,请等待。。。")
thread = threading.Thread(target=self.__newIndexThread,
args=(self.__imageListMgr.dirName,
"image_list.txt", index_method,
force))
thread.start()
def __openIndexThread(self, index_root_path: str, image_list_path: str):
"""打开索引库线程"""
try:
client = index_http_client.IndexHttpClient(
DEFAULT_HOST, DEFAULT_PORT)
err_msg = client.open_index(index_root_path=index_root_path,
image_list_path=image_list_path)
            if err_msg is None:
err_msg = ""
self.openIndexMsg.emit(err_msg)
except Exception as e:
self.openIndexMsg.emit(str(e))
def __onOpenIndexMsg(self, err_msg):
"""打开索引库槽"""
self.__stopWait()
if err_msg == "":
QtWidgets.QMessageBox.information(self, "提示", "打开索引库成功")
else:
QtWidgets.QMessageBox.warning(self, "错误", err_msg)
def openIndexLibrary(self):
"""打开索引库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库")
return
self.__startWait("正在打开索引库,请等待。。。")
thread = threading.Thread(target=self.__openIndexThread,
args=(self.__imageListMgr.dirName,
"image_list.txt"))
thread.start()
def __updateIndexThread(self, index_root_path: str, image_list_path: str):
"""更新索引库线程"""
try:
client = index_http_client.IndexHttpClient(
DEFAULT_HOST, DEFAULT_PORT)
err_msg = client.update_index(image_list_path=image_list_path,
index_root_path=index_root_path)
            if err_msg is None:
err_msg = ""
self.updateIndexMsg.emit(err_msg)
except Exception as e:
self.updateIndexMsg.emit(str(e))
def __onUpdateIndexMsg(self, err_msg):
"""更新索引库槽"""
self.__stopWait()
if err_msg == "":
QtWidgets.QMessageBox.information(self, "提示", "更新索引库成功")
else:
QtWidgets.QMessageBox.warning(self, "错误", err_msg)
def updateIndexLibrary(self):
"""更新索引库"""
if not os.path.exists(self.__imageListMgr.filePath):
QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库")
return
self.__startWait("正在更新索引库,请等待。。。")
thread = threading.Thread(target=self.__updateIndexThread,
args=(self.__imageListMgr.dirName,
"image_list.txt"))
thread.start()
def searchClassify(self):
"""查找分类"""
if len(self.__imageListMgr.classifyList) == 0:
return
cmb = self.ui.searchClassifyHistoryCmb
txt = cmb.currentText()
is_has = False
if txt != "":
for i in range(cmb.count()):
if cmb.itemText(i) == txt:
is_has = True
break
if not is_has:
cmb.addItem(txt)
self.__classifyUiContext.searchClassify(txt)
def showHelp(self):
"""显示帮助"""
QtGui.QDesktopServices.openUrl(QtCore.QUrl(PADDLECLAS_DOC_URL))
def showAbout(self):
"""显示关于对话框"""
QtWidgets.QMessageBox.information(self, "关于", "识图图像库管理 V1.0.0")
def exitApp(self):
"""退出应用"""
if isinstance(self.server_process, Process):
self.server_process.terminate()
# os.kill(self.server_pid)
sys.exit(0)
def __setPathBar(self, msg: str):
"""设置路径状态栏信息"""
self.__pathBar.setText("图像库路径:{}".format(msg))
    def __setClassifyCountBar(self, msg: str):
        """设置分类总数量状态栏信息"""
        self.__classifyCountBar.setText("分类总数量:{}".format(msg))
def __setImageCountBar(self, count: int):
"""设置图像数量状态栏信息"""
self.__imageCountBar.setText("当前图像数量:{}".format(count))
def __setImageSelectedCountBar(self, count: int):
"""设置选择图像数量状态栏信息"""
self.__imageSelectedBar.setText("选择图像数量:{}".format(count))
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/AddClassifyDialog.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_AddClassifyDialog(object):
def setupUi(self, AddClassifyDialog):
AddClassifyDialog.setObjectName("AddClassifyDialog")
AddClassifyDialog.resize(286, 127)
AddClassifyDialog.setModal(True)
self.verticalLayout = QtWidgets.QVBoxLayout(AddClassifyDialog)
self.verticalLayout.setObjectName("verticalLayout")
self.label = QtWidgets.QLabel(AddClassifyDialog)
self.label.setObjectName("label")
self.verticalLayout.addWidget(self.label)
self.lineEdit = QtWidgets.QLineEdit(AddClassifyDialog)
self.lineEdit.setObjectName("lineEdit")
self.verticalLayout.addWidget(self.lineEdit)
spacerItem = QtWidgets.QSpacerItem(20, 11,
QtWidgets.QSizePolicy.Minimum,
QtWidgets.QSizePolicy.Expanding)
self.verticalLayout.addItem(spacerItem)
self.buttonBox = QtWidgets.QDialogButtonBox(AddClassifyDialog)
self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel
| QtWidgets.QDialogButtonBox.Ok)
self.buttonBox.setObjectName("buttonBox")
self.verticalLayout.addWidget(self.buttonBox)
self.retranslateUi(AddClassifyDialog)
self.buttonBox.accepted.connect(AddClassifyDialog.accept)
self.buttonBox.rejected.connect(AddClassifyDialog.reject)
QtCore.QMetaObject.connectSlotsByName(AddClassifyDialog)
def retranslateUi(self, AddClassifyDialog):
_translate = QtCore.QCoreApplication.translate
AddClassifyDialog.setWindowTitle(
_translate("AddClassifyDialog", "添加分类"))
self.label.setText(_translate("AddClassifyDialog", "分类名称"))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
AddClassifyDialog = QtWidgets.QDialog()
ui = Ui_AddClassifyDialog()
ui.setupUi(AddClassifyDialog)
AddClassifyDialog.show()
sys.exit(app.exec_())
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/ImageEditClassifyDialog.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_Dialog(object):
def setupUi(self, Dialog):
Dialog.setObjectName("Dialog")
Dialog.resize(414, 415)
Dialog.setMinimumSize(QtCore.QSize(0, 0))
self.verticalLayout = QtWidgets.QVBoxLayout(Dialog)
self.verticalLayout.setObjectName("verticalLayout")
self.label = QtWidgets.QLabel(Dialog)
self.label.setObjectName("label")
self.verticalLayout.addWidget(self.label)
self.oldLineEdit = QtWidgets.QLineEdit(Dialog)
self.oldLineEdit.setEnabled(False)
self.oldLineEdit.setObjectName("oldLineEdit")
self.verticalLayout.addWidget(self.oldLineEdit)
self.label_2 = QtWidgets.QLabel(Dialog)
self.label_2.setObjectName("label_2")
self.verticalLayout.addWidget(self.label_2)
self.newLineEdit = QtWidgets.QLineEdit(Dialog)
self.newLineEdit.setEnabled(False)
self.newLineEdit.setObjectName("newLineEdit")
self.verticalLayout.addWidget(self.newLineEdit)
self.horizontalLayout = QtWidgets.QHBoxLayout()
self.horizontalLayout.setObjectName("horizontalLayout")
self.searchWordLineEdit = QtWidgets.QLineEdit(Dialog)
self.searchWordLineEdit.setObjectName("searchWordLineEdit")
self.horizontalLayout.addWidget(self.searchWordLineEdit)
self.searchButton = QtWidgets.QPushButton(Dialog)
self.searchButton.setObjectName("searchButton")
self.horizontalLayout.addWidget(self.searchButton)
self.verticalLayout.addLayout(self.horizontalLayout)
self.classifyListView = QtWidgets.QListView(Dialog)
self.classifyListView.setEnabled(True)
self.classifyListView.setMinimumSize(QtCore.QSize(400, 200))
self.classifyListView.setObjectName("classifyListView")
self.verticalLayout.addWidget(self.classifyListView)
self.buttonBox = QtWidgets.QDialogButtonBox(Dialog)
self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel
| QtWidgets.QDialogButtonBox.Ok)
self.buttonBox.setObjectName("buttonBox")
self.verticalLayout.addWidget(self.buttonBox)
self.retranslateUi(Dialog)
self.buttonBox.accepted.connect(Dialog.accept)
self.buttonBox.rejected.connect(Dialog.reject)
QtCore.QMetaObject.connectSlotsByName(Dialog)
def retranslateUi(self, Dialog):
_translate = QtCore.QCoreApplication.translate
Dialog.setWindowTitle(_translate("Dialog", "编辑图像分类"))
self.label.setText(_translate("Dialog", "原分类"))
self.label_2.setText(_translate("Dialog", "新分类"))
self.searchButton.setText(_translate("Dialog", "查找"))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
Dialog = QtWidgets.QDialog()
ui = Ui_Dialog()
ui.setupUi(Dialog)
Dialog.show()
sys.exit(app.exec_())
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/MainWindow.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_MainWindow(object):
def setupUi(self, MainWindow):
MainWindow.setObjectName("MainWindow")
MainWindow.resize(833, 538)
MainWindow.setMinimumSize(QtCore.QSize(0, 0))
self.centralwidget = QtWidgets.QWidget(MainWindow)
self.centralwidget.setObjectName("centralwidget")
self.verticalLayout_3 = QtWidgets.QVBoxLayout(self.centralwidget)
self.verticalLayout_3.setObjectName("verticalLayout_3")
self.horizontalLayout_3 = QtWidgets.QHBoxLayout()
self.horizontalLayout_3.setObjectName("horizontalLayout_3")
self.appMenuBtn = QtWidgets.QToolButton(self.centralwidget)
self.appMenuBtn.setObjectName("appMenuBtn")
self.horizontalLayout_3.addWidget(self.appMenuBtn)
self.saveImageLibraryBtn = QtWidgets.QToolButton(self.centralwidget)
self.saveImageLibraryBtn.setObjectName("saveImageLibraryBtn")
self.horizontalLayout_3.addWidget(self.saveImageLibraryBtn)
self.addClassifyBtn = QtWidgets.QToolButton(self.centralwidget)
self.addClassifyBtn.setObjectName("addClassifyBtn")
self.horizontalLayout_3.addWidget(self.addClassifyBtn)
self.removeClassifyBtn = QtWidgets.QToolButton(self.centralwidget)
self.removeClassifyBtn.setObjectName("removeClassifyBtn")
self.horizontalLayout_3.addWidget(self.removeClassifyBtn)
spacerItem = QtWidgets.QSpacerItem(40, 20,
QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Minimum)
self.horizontalLayout_3.addItem(spacerItem)
self.imageScaleSlider = QtWidgets.QSlider(self.centralwidget)
self.imageScaleSlider.setMaximumSize(QtCore.QSize(400, 16777215))
self.imageScaleSlider.setMinimum(1)
self.imageScaleSlider.setMaximum(8)
self.imageScaleSlider.setPageStep(2)
self.imageScaleSlider.setOrientation(QtCore.Qt.Horizontal)
self.imageScaleSlider.setObjectName("imageScaleSlider")
self.horizontalLayout_3.addWidget(self.imageScaleSlider)
self.verticalLayout_3.addLayout(self.horizontalLayout_3)
self.splitter = QtWidgets.QSplitter(self.centralwidget)
sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Expanding)
sizePolicy.setHorizontalStretch(0)
sizePolicy.setVerticalStretch(0)
sizePolicy.setHeightForWidth(
self.splitter.sizePolicy().hasHeightForWidth())
self.splitter.setSizePolicy(sizePolicy)
self.splitter.setOrientation(QtCore.Qt.Horizontal)
self.splitter.setObjectName("splitter")
self.widget = QtWidgets.QWidget(self.splitter)
self.widget.setObjectName("widget")
self.verticalLayout_2 = QtWidgets.QVBoxLayout(self.widget)
self.verticalLayout_2.setContentsMargins(0, 0, 0, 0)
self.verticalLayout_2.setObjectName("verticalLayout_2")
self.horizontalLayout = QtWidgets.QHBoxLayout()
self.horizontalLayout.setObjectName("horizontalLayout")
self.searchClassifyHistoryCmb = QtWidgets.QComboBox(self.widget)
sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Fixed)
sizePolicy.setHorizontalStretch(0)
sizePolicy.setVerticalStretch(0)
sizePolicy.setHeightForWidth(
self.searchClassifyHistoryCmb.sizePolicy().hasHeightForWidth())
self.searchClassifyHistoryCmb.setSizePolicy(sizePolicy)
self.searchClassifyHistoryCmb.setEditable(True)
self.searchClassifyHistoryCmb.setObjectName("searchClassifyHistoryCmb")
self.horizontalLayout.addWidget(self.searchClassifyHistoryCmb)
self.searchClassifyBtn = QtWidgets.QToolButton(self.widget)
self.searchClassifyBtn.setObjectName("searchClassifyBtn")
self.horizontalLayout.addWidget(self.searchClassifyBtn)
self.verticalLayout_2.addLayout(self.horizontalLayout)
self.classifyListView = QtWidgets.QListView(self.widget)
sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Expanding)
sizePolicy.setHorizontalStretch(0)
sizePolicy.setVerticalStretch(0)
sizePolicy.setHeightForWidth(
self.classifyListView.sizePolicy().hasHeightForWidth())
self.classifyListView.setSizePolicy(sizePolicy)
self.classifyListView.setMinimumSize(QtCore.QSize(200, 0))
self.classifyListView.setEditTriggers(
QtWidgets.QAbstractItemView.NoEditTriggers)
self.classifyListView.setObjectName("classifyListView")
self.verticalLayout_2.addWidget(self.classifyListView)
self.widget1 = QtWidgets.QWidget(self.splitter)
self.widget1.setObjectName("widget1")
self.verticalLayout = QtWidgets.QVBoxLayout(self.widget1)
self.verticalLayout.setContentsMargins(0, 0, 0, 0)
self.verticalLayout.setObjectName("verticalLayout")
self.horizontalLayout_2 = QtWidgets.QHBoxLayout()
self.horizontalLayout_2.setObjectName("horizontalLayout_2")
self.addImageBtn = QtWidgets.QToolButton(self.widget1)
self.addImageBtn.setObjectName("addImageBtn")
self.horizontalLayout_2.addWidget(self.addImageBtn)
self.removeImageBtn = QtWidgets.QToolButton(self.widget1)
self.removeImageBtn.setObjectName("removeImageBtn")
self.horizontalLayout_2.addWidget(self.removeImageBtn)
spacerItem1 = QtWidgets.QSpacerItem(40, 20,
QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Minimum)
self.horizontalLayout_2.addItem(spacerItem1)
self.verticalLayout.addLayout(self.horizontalLayout_2)
self.imageListWidget = QtWidgets.QListWidget(self.widget1)
sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding,
QtWidgets.QSizePolicy.Expanding)
sizePolicy.setHorizontalStretch(0)
sizePolicy.setVerticalStretch(0)
sizePolicy.setHeightForWidth(
self.imageListWidget.sizePolicy().hasHeightForWidth())
self.imageListWidget.setSizePolicy(sizePolicy)
self.imageListWidget.setMinimumSize(QtCore.QSize(200, 0))
self.imageListWidget.setStyleSheet(
"QListWidget::Item:hover{background:skyblue;padding-top:0px; padding-bottom:0px;}\n"
"QListWidget::item:selected{background:rgb(245, 121, 0); color:red;}"
)
self.imageListWidget.setObjectName("imageListWidget")
self.verticalLayout.addWidget(self.imageListWidget)
self.verticalLayout_3.addWidget(self.splitter)
MainWindow.setCentralWidget(self.centralwidget)
self.statusbar = QtWidgets.QStatusBar(MainWindow)
self.statusbar.setObjectName("statusbar")
MainWindow.setStatusBar(self.statusbar)
self.retranslateUi(MainWindow)
QtCore.QMetaObject.connectSlotsByName(MainWindow)
def retranslateUi(self, MainWindow):
_translate = QtCore.QCoreApplication.translate
MainWindow.setWindowTitle(_translate("MainWindow", "识图图像库管理"))
self.appMenuBtn.setText(_translate("MainWindow", "..."))
self.saveImageLibraryBtn.setText(_translate("MainWindow", "..."))
self.addClassifyBtn.setText(_translate("MainWindow", "..."))
self.removeClassifyBtn.setText(_translate("MainWindow", "..."))
self.searchClassifyBtn.setText(_translate("MainWindow", "..."))
self.addImageBtn.setText(_translate("MainWindow", "..."))
self.removeImageBtn.setText(_translate("MainWindow", "..."))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
MainWindow = QtWidgets.QMainWindow()
ui = Ui_MainWindow()
ui.setupUi(MainWindow)
MainWindow.show()
sys.exit(app.exec_())
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/NewlibraryDialog.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_NewlibraryDialog(object):
def setupUi(self, NewlibraryDialog):
NewlibraryDialog.setObjectName("NewlibraryDialog")
NewlibraryDialog.resize(414, 230)
self.verticalLayout = QtWidgets.QVBoxLayout(NewlibraryDialog)
self.verticalLayout.setObjectName("verticalLayout")
self.label = QtWidgets.QLabel(NewlibraryDialog)
self.label.setObjectName("label")
self.verticalLayout.addWidget(self.label)
self.indexMethodCmb = QtWidgets.QComboBox(NewlibraryDialog)
self.indexMethodCmb.setEnabled(True)
self.indexMethodCmb.setObjectName("indexMethodCmb")
self.indexMethodCmb.addItem("")
self.indexMethodCmb.addItem("")
self.indexMethodCmb.addItem("")
self.verticalLayout.addWidget(self.indexMethodCmb)
self.resetCheckBox = QtWidgets.QCheckBox(NewlibraryDialog)
self.resetCheckBox.setObjectName("resetCheckBox")
self.verticalLayout.addWidget(self.resetCheckBox)
spacerItem = QtWidgets.QSpacerItem(20, 80,
QtWidgets.QSizePolicy.Minimum,
QtWidgets.QSizePolicy.Expanding)
self.verticalLayout.addItem(spacerItem)
self.buttonBox = QtWidgets.QDialogButtonBox(NewlibraryDialog)
self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel
| QtWidgets.QDialogButtonBox.Ok)
self.buttonBox.setObjectName("buttonBox")
self.verticalLayout.addWidget(self.buttonBox)
self.retranslateUi(NewlibraryDialog)
self.indexMethodCmb.setCurrentIndex(0)
self.buttonBox.accepted.connect(NewlibraryDialog.accept)
self.buttonBox.rejected.connect(NewlibraryDialog.reject)
QtCore.QMetaObject.connectSlotsByName(NewlibraryDialog)
def retranslateUi(self, NewlibraryDialog):
_translate = QtCore.QCoreApplication.translate
NewlibraryDialog.setWindowTitle(
_translate("NewlibraryDialog", "新建/重建 索引"))
self.label.setText(_translate("NewlibraryDialog", "索引方式"))
self.indexMethodCmb.setItemText(
0, _translate("NewlibraryDialog", "HNSW32"))
self.indexMethodCmb.setItemText(1,
_translate("NewlibraryDialog", "FLAT"))
self.indexMethodCmb.setItemText(2, _translate("NewlibraryDialog",
"IVF"))
self.resetCheckBox.setText(
_translate("NewlibraryDialog", "重建索引,警告:会覆盖原索引"))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
NewlibraryDialog = QtWidgets.QDialog()
ui = Ui_NewlibraryDialog()
ui.setupUi(NewlibraryDialog)
NewlibraryDialog.show()
sys.exit(app.exec_())
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/RenameClassifyDialog.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_RenameClassifyDialog(object):
def setupUi(self, RenameClassifyDialog):
RenameClassifyDialog.setObjectName("RenameClassifyDialog")
RenameClassifyDialog.resize(342, 194)
self.verticalLayout = QtWidgets.QVBoxLayout(RenameClassifyDialog)
self.verticalLayout.setObjectName("verticalLayout")
self.oldlabel = QtWidgets.QLabel(RenameClassifyDialog)
self.oldlabel.setObjectName("oldlabel")
self.verticalLayout.addWidget(self.oldlabel)
self.oldNameLineEdit = QtWidgets.QLineEdit(RenameClassifyDialog)
self.oldNameLineEdit.setEnabled(False)
self.oldNameLineEdit.setObjectName("oldNameLineEdit")
self.verticalLayout.addWidget(self.oldNameLineEdit)
self.newlabel = QtWidgets.QLabel(RenameClassifyDialog)
self.newlabel.setObjectName("newlabel")
self.verticalLayout.addWidget(self.newlabel)
self.newNameLineEdit = QtWidgets.QLineEdit(RenameClassifyDialog)
self.newNameLineEdit.setObjectName("newNameLineEdit")
self.verticalLayout.addWidget(self.newNameLineEdit)
spacerItem = QtWidgets.QSpacerItem(20, 14,
QtWidgets.QSizePolicy.Minimum,
QtWidgets.QSizePolicy.Expanding)
self.verticalLayout.addItem(spacerItem)
self.buttonBox = QtWidgets.QDialogButtonBox(RenameClassifyDialog)
self.buttonBox.setOrientation(QtCore.Qt.Horizontal)
self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel
| QtWidgets.QDialogButtonBox.Ok)
self.buttonBox.setObjectName("buttonBox")
self.verticalLayout.addWidget(self.buttonBox)
self.retranslateUi(RenameClassifyDialog)
self.buttonBox.accepted.connect(RenameClassifyDialog.accept)
self.buttonBox.rejected.connect(RenameClassifyDialog.reject)
QtCore.QMetaObject.connectSlotsByName(RenameClassifyDialog)
def retranslateUi(self, RenameClassifyDialog):
_translate = QtCore.QCoreApplication.translate
RenameClassifyDialog.setWindowTitle(
_translate("RenameClassifyDialog", "重命名分类"))
self.oldlabel.setText(_translate("RenameClassifyDialog", "原名称"))
self.newlabel.setText(_translate("RenameClassifyDialog", "新名称"))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
RenameClassifyDialog = QtWidgets.QDialog()
ui = Ui_RenameClassifyDialog()
ui.setupUi(RenameClassifyDialog)
RenameClassifyDialog.show()
sys.exit(app.exec_())
# -*- coding: utf-8 -*-
# Form implementation generated from reading ui file 'ui/WaitDialog.ui'
#
# Created by: PyQt5 UI code generator 5.15.5
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.
from PyQt5 import QtCore, QtGui, QtWidgets
class Ui_WaitDialog(object):
def setupUi(self, WaitDialog):
WaitDialog.setObjectName("WaitDialog")
WaitDialog.setWindowModality(QtCore.Qt.NonModal)
WaitDialog.resize(324, 78)
self.verticalLayout = QtWidgets.QVBoxLayout(WaitDialog)
self.verticalLayout.setObjectName("verticalLayout")
self.msgLabel = QtWidgets.QLabel(WaitDialog)
self.msgLabel.setObjectName("msgLabel")
self.verticalLayout.addWidget(self.msgLabel)
self.progressBar = QtWidgets.QProgressBar(WaitDialog)
self.progressBar.setMaximum(0)
self.progressBar.setProperty("value", -1)
self.progressBar.setObjectName("progressBar")
self.verticalLayout.addWidget(self.progressBar)
spacerItem = QtWidgets.QSpacerItem(20, 1,
QtWidgets.QSizePolicy.Minimum,
QtWidgets.QSizePolicy.Expanding)
self.verticalLayout.addItem(spacerItem)
self.retranslateUi(WaitDialog)
QtCore.QMetaObject.connectSlotsByName(WaitDialog)
def retranslateUi(self, WaitDialog):
_translate = QtCore.QCoreApplication.translate
WaitDialog.setWindowTitle(_translate("WaitDialog", "请等待"))
self.msgLabel.setText(_translate("WaitDialog", "正在更新索引库,请等待。。。"))
if __name__ == "__main__":
import sys
app = QtWidgets.QApplication(sys.argv)
WaitDialog = QtWidgets.QDialog()
ui = Ui_WaitDialog()
ui.setupUi(WaitDialog)
WaitDialog.show()
sys.exit(app.exec_())
import os
import sys
from PyQt5 import QtCore, QtGui, QtWidgets
import hashlib
import shutil
from mod import image_list_manager
def setMenu(menu: QtWidgets.QMenu, text: str, triggered):
"""设置菜单"""
action = menu.addAction(text)
action.triggered.connect(triggered)
def fileMD5(file_path: str):
"""计算文件的MD5值"""
md5 = hashlib.md5()
with open(file_path, 'rb') as f:
md5.update(f.read())
return md5.hexdigest().lower()
def copyFile(from_path: str, to_path: str):
"""复制文件"""
shutil.copyfile(from_path, to_path)
return os.path.exists(to_path)
def removeFile(file_path: str):
"""删除文件"""
if os.path.exists(file_path):
os.remove(file_path)
return not os.path.exists(file_path)
def fileExtension(file_path: str):
"""获取文件的扩展名"""
return os.path.splitext(file_path)[1]
def copyImageToDir(from_image_path: str, to_dir_path: str):
    """复制图像文件到目标目录"""
    if not os.path.exists(from_image_path) or not os.path.exists(to_dir_path):
return None
md5 = fileMD5(from_image_path)
file_ext = fileExtension(from_image_path)
new_path = os.path.join(to_dir_path, md5 + file_ext)
copyFile(from_image_path, new_path)
return new_path
def oneKeyImportFromFile(from_path: str, to_path: str):
"""从其它图像库 from_path {image_list.txt} 导入到图像库 to_path {image_list.txt}"""
if not os.path.exists(from_path) or not os.path.exists(to_path):
return None
if from_path == to_path:
return None
from_mgr = image_list_manager.ImageListManager(file_path=from_path)
to_mgr = image_list_manager.ImageListManager(file_path=to_path)
return oneKeyImport(from_mgr=from_mgr, to_mgr=to_mgr)
def oneKeyImportFromDirs(from_dir: str, to_image_list_path: str):
"""从其它图像库 from_dir 搜索子目录 导入到图像库 to_image_list_path"""
if not os.path.exists(from_dir) or not os.path.exists(to_image_list_path):
return None
if from_dir == os.path.dirname(to_image_list_path):
return None
from_mgr = image_list_manager.ImageListManager()
to_mgr = image_list_manager.ImageListManager(
file_path=to_image_list_path)
from_mgr.dirName = from_dir
sub_dir_list = os.listdir(from_dir)
for sub_dir in sub_dir_list:
real_sub_dir = os.path.join(from_dir, sub_dir)
if not os.path.isdir(real_sub_dir):
continue
img_list = os.listdir(real_sub_dir)
img_path = []
for img in img_list:
real_img = os.path.join(real_sub_dir, img)
if not os.path.isfile(real_img):
continue
img_path.append("{}/{}".format(sub_dir, img))
if len(img_path) == 0:
continue
from_mgr.addClassify(sub_dir)
from_mgr.resetImageList(sub_dir, img_path)
return oneKeyImport(from_mgr=from_mgr, to_mgr=to_mgr)
def oneKeyImport(from_mgr: image_list_manager.ImageListManager,
to_mgr: image_list_manager.ImageListManager):
"""一键导入"""
count = 0
for classify in from_mgr.classifyList:
img_list = from_mgr.realPathList(classify)
to_mgr.addClassify(classify)
to_img_list = to_mgr.imageList(classify)
new_img_list = []
for img in img_list:
from_image_path = img
to_dir_path = os.path.join(to_mgr.dirName, "images")
md5 = fileMD5(from_image_path)
file_ext = fileExtension(from_image_path)
new_path = os.path.join(to_dir_path, md5 + file_ext)
if os.path.exists(new_path):
# 如果新文件 MD5 重复跳过后面的复制文件操作
continue
copyFile(from_image_path, new_path)
new_img_list.append("images/" + md5 + file_ext)
count += 1
to_img_list += new_img_list
to_mgr.resetImageList(classify, to_img_list)
to_mgr.writeFile()
return count
def newFile(file_path: str):
"""创建文件"""
if os.path.exists(file_path):
return False
else:
with open(file_path, 'w') as f:
pass
return True
def isEmptyDir(dir_path: str):
"""判断目录是否为空"""
return not os.listdir(dir_path)
def initLibrary(dir_path: str):
"""初始化库"""
images_dir = os.path.join(dir_path, "images")
if not os.path.exists(images_dir):
os.makedirs(images_dir)
image_list_path = os.path.join(dir_path, "image_list.txt")
newFile(image_list_path)
return os.path.exists(dir_path)
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>AddClassifyDialog</class>
<widget class="QDialog" name="AddClassifyDialog">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>286</width>
<height>127</height>
</rect>
</property>
<property name="windowTitle">
<string>添加分类</string>
</property>
<property name="modal">
<bool>true</bool>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="label">
<property name="text">
<string>分类名称</string>
</property>
</widget>
</item>
<item>
<widget class="QLineEdit" name="lineEdit"/>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>11</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QDialogButtonBox" name="buttonBox">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections>
<connection>
<sender>buttonBox</sender>
<signal>accepted()</signal>
<receiver>AddClassifyDialog</receiver>
<slot>accept()</slot>
<hints>
<hint type="sourcelabel">
<x>248</x>
<y>254</y>
</hint>
<hint type="destinationlabel">
<x>157</x>
<y>274</y>
</hint>
</hints>
</connection>
<connection>
<sender>buttonBox</sender>
<signal>rejected()</signal>
<receiver>AddClassifyDialog</receiver>
<slot>reject()</slot>
<hints>
<hint type="sourcelabel">
<x>316</x>
<y>260</y>
</hint>
<hint type="destinationlabel">
<x>286</x>
<y>274</y>
</hint>
</hints>
</connection>
</connections>
</ui>
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>Dialog</class>
<widget class="QDialog" name="Dialog">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>414</width>
<height>415</height>
</rect>
</property>
<property name="minimumSize">
<size>
<width>0</width>
<height>0</height>
</size>
</property>
<property name="windowTitle">
<string>编辑图像分类</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="label">
<property name="text">
<string>原分类</string>
</property>
</widget>
</item>
<item>
<widget class="QLineEdit" name="oldLineEdit">
<property name="enabled">
<bool>false</bool>
</property>
</widget>
</item>
<item>
<widget class="QLabel" name="label_2">
<property name="text">
<string>新分类</string>
</property>
</widget>
</item>
<item>
<widget class="QLineEdit" name="newLineEdit">
<property name="enabled">
<bool>false</bool>
</property>
</widget>
</item>
<item>
<layout class="QHBoxLayout" name="horizontalLayout">
<item>
<widget class="QLineEdit" name="searchWordLineEdit"/>
</item>
<item>
<widget class="QPushButton" name="searchButton">
<property name="text">
<string>查找</string>
</property>
</widget>
</item>
</layout>
</item>
<item>
<widget class="QListView" name="classifyListView">
<property name="enabled">
<bool>true</bool>
</property>
<property name="minimumSize">
<size>
<width>400</width>
<height>200</height>
</size>
</property>
</widget>
</item>
<item>
<widget class="QDialogButtonBox" name="buttonBox">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections>
<connection>
<sender>buttonBox</sender>
<signal>accepted()</signal>
<receiver>Dialog</receiver>
<slot>accept()</slot>
<hints>
<hint type="sourcelabel">
<x>248</x>
<y>254</y>
</hint>
<hint type="destinationlabel">
<x>157</x>
<y>274</y>
</hint>
</hints>
</connection>
<connection>
<sender>buttonBox</sender>
<signal>rejected()</signal>
<receiver>Dialog</receiver>
<slot>reject()</slot>
<hints>
<hint type="sourcelabel">
<x>316</x>
<y>260</y>
</hint>
<hint type="destinationlabel">
<x>286</x>
<y>274</y>
</hint>
</hints>
</connection>
</connections>
</ui>
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>MainWindow</class>
<widget class="QMainWindow" name="MainWindow">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>833</width>
<height>538</height>
</rect>
</property>
<property name="minimumSize">
<size>
<width>0</width>
<height>0</height>
</size>
</property>
<property name="windowTitle">
<string>识图图像库管理</string>
</property>
<widget class="QWidget" name="centralwidget">
<layout class="QVBoxLayout" name="verticalLayout_3">
<item>
<layout class="QHBoxLayout" name="horizontalLayout_3">
<item>
<widget class="QToolButton" name="appMenuBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<widget class="QToolButton" name="saveImageLibraryBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<widget class="QToolButton" name="addClassifyBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<widget class="QToolButton" name="removeClassifyBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<spacer name="horizontalSpacer_3">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>40</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QSlider" name="imageScaleSlider">
<property name="maximumSize">
<size>
<width>400</width>
<height>16777215</height>
</size>
</property>
<property name="minimum">
<number>1</number>
</property>
<property name="maximum">
<number>8</number>
</property>
<property name="pageStep">
<number>2</number>
</property>
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
</widget>
</item>
</layout>
</item>
<item>
<widget class="QSplitter" name="splitter">
<property name="sizePolicy">
<sizepolicy hsizetype="Expanding" vsizetype="Expanding">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<widget class="QWidget" name="">
<layout class="QVBoxLayout" name="verticalLayout_2">
<item>
<layout class="QHBoxLayout" name="horizontalLayout">
<item>
<widget class="QComboBox" name="searchClassifyHistoryCmb">
<property name="sizePolicy">
<sizepolicy hsizetype="Expanding" vsizetype="Fixed">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="editable">
<bool>true</bool>
</property>
</widget>
</item>
<item>
<widget class="QToolButton" name="searchClassifyBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
</layout>
</item>
<item>
<widget class="QListView" name="classifyListView">
<property name="sizePolicy">
<sizepolicy hsizetype="Expanding" vsizetype="Expanding">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="minimumSize">
<size>
<width>200</width>
<height>0</height>
</size>
</property>
<property name="editTriggers">
<set>QAbstractItemView::NoEditTriggers</set>
</property>
</widget>
</item>
</layout>
</widget>
<widget class="QWidget" name="">
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<layout class="QHBoxLayout" name="horizontalLayout_2">
<item>
<widget class="QToolButton" name="addImageBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<widget class="QToolButton" name="removeImageBtn">
<property name="text">
<string>...</string>
</property>
</widget>
</item>
<item>
<spacer name="horizontalSpacer_2">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>40</width>
<height>20</height>
</size>
</property>
</spacer>
</item>
</layout>
</item>
<item>
<widget class="QListWidget" name="imageListWidget">
<property name="sizePolicy">
<sizepolicy hsizetype="Expanding" vsizetype="Expanding">
<horstretch>0</horstretch>
<verstretch>0</verstretch>
</sizepolicy>
</property>
<property name="minimumSize">
<size>
<width>200</width>
<height>0</height>
</size>
</property>
<property name="styleSheet">
<string notr="true">QListWidget::Item:hover{background:skyblue;padding-top:0px; padding-bottom:0px;}
QListWidget::item:selected{background:rgb(245, 121, 0); color:red;}</string>
</property>
</widget>
</item>
</layout>
</widget>
</widget>
</item>
</layout>
</widget>
<widget class="QStatusBar" name="statusbar"/>
</widget>
<resources/>
<connections/>
</ui>
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>NewlibraryDialog</class>
<widget class="QDialog" name="NewlibraryDialog">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>414</width>
<height>230</height>
</rect>
</property>
<property name="windowTitle">
<string>新建/重建 索引</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="label">
<property name="text">
<string>索引方式</string>
</property>
</widget>
</item>
<item>
<widget class="QComboBox" name="indexMethodCmb">
<property name="enabled">
<bool>true</bool>
</property>
<property name="currentIndex">
<number>0</number>
</property>
<item>
<property name="text">
<string>HNSW32</string>
</property>
</item>
<item>
<property name="text">
<string>FLAT</string>
</property>
</item>
<item>
<property name="text">
<string>IVF</string>
</property>
</item>
</widget>
</item>
<item>
<widget class="QCheckBox" name="resetCheckBox">
<property name="text">
<string>重建索引,警告:会覆盖原索引</string>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>80</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QDialogButtonBox" name="buttonBox">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections>
<connection>
<sender>buttonBox</sender>
<signal>accepted()</signal>
<receiver>NewlibraryDialog</receiver>
<slot>accept()</slot>
<hints>
<hint type="sourcelabel">
<x>248</x>
<y>254</y>
</hint>
<hint type="destinationlabel">
<x>157</x>
<y>274</y>
</hint>
</hints>
</connection>
<connection>
<sender>buttonBox</sender>
<signal>rejected()</signal>
<receiver>NewlibraryDialog</receiver>
<slot>reject()</slot>
<hints>
<hint type="sourcelabel">
<x>316</x>
<y>260</y>
</hint>
<hint type="destinationlabel">
<x>286</x>
<y>274</y>
</hint>
</hints>
</connection>
</connections>
</ui>
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>RenameClassifyDialog</class>
<widget class="QDialog" name="RenameClassifyDialog">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>342</width>
<height>194</height>
</rect>
</property>
<property name="windowTitle">
<string>重命名分类</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="oldlabel">
<property name="text">
<string>原名称</string>
</property>
</widget>
</item>
<item>
<widget class="QLineEdit" name="oldNameLineEdit">
<property name="enabled">
<bool>false</bool>
</property>
</widget>
</item>
<item>
<widget class="QLabel" name="newlabel">
<property name="text">
<string>新名称</string>
</property>
</widget>
</item>
<item>
<widget class="QLineEdit" name="newNameLineEdit"/>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>14</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QDialogButtonBox" name="buttonBox">
<property name="orientation">
<enum>Qt::Horizontal</enum>
</property>
<property name="standardButtons">
<set>QDialogButtonBox::Cancel|QDialogButtonBox::Ok</set>
</property>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections>
<connection>
<sender>buttonBox</sender>
<signal>accepted()</signal>
<receiver>RenameClassifyDialog</receiver>
<slot>accept()</slot>
<hints>
<hint type="sourcelabel">
<x>248</x>
<y>254</y>
</hint>
<hint type="destinationlabel">
<x>157</x>
<y>274</y>
</hint>
</hints>
</connection>
<connection>
<sender>buttonBox</sender>
<signal>rejected()</signal>
<receiver>RenameClassifyDialog</receiver>
<slot>reject()</slot>
<hints>
<hint type="sourcelabel">
<x>316</x>
<y>260</y>
</hint>
<hint type="destinationlabel">
<x>286</x>
<y>274</y>
</hint>
</hints>
</connection>
</connections>
</ui>
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>WaitDialog</class>
<widget class="QDialog" name="WaitDialog">
<property name="windowModality">
<enum>Qt::NonModal</enum>
</property>
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>324</width>
<height>78</height>
</rect>
</property>
<property name="windowTitle">
<string>请等待</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QLabel" name="msgLabel">
<property name="text">
<string>正在更新索引库,请等待。。。</string>
</property>
</widget>
</item>
<item>
<widget class="QProgressBar" name="progressBar">
<property name="maximum">
<number>0</number>
</property>
<property name="value">
<number>-1</number>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>1</height>
</size>
</property>
</spacer>
</item>
</layout>
</widget>
<resources/>
<connections/>
</ui>
@@ -60,8 +60,12 @@ class Predictor(object):
         config = Config(model_file, params_file)
-        if args.use_gpu:
+        if args.get("use_gpu", False):
             config.enable_use_gpu(args.gpu_mem, 0)
+        elif args.get("use_npu", False):
+            config.enable_npu()
+        elif args.get("use_xpu", False):
+            config.enable_xpu()
         else:
             config.disable_gpu()
         if args.enable_mkldnn:
## PP-ShiTuV2 Image Recognition System
## Table of contents
- [1. Introduction of PP-ShiTuV2 model and application scenarios](#1-introduction-of-pp-shituv2-model-and-application-scenarios)
- [2. Quick experience](#2-quick-experience)
- [2.1 Quick experience of PP-ShiTu android demo](#21-quick-experience-of-pp-shitu-android-demo)
- [2.2 Quick experience of command line code](#22-quick-experience-of-command-line-code)
- [3. Module introduction and training](#3-module-introduction-and-training)
- [3.1 Mainbody detection](#31-mainbody-detection)
- [3.2 Feature Extraction](#32-feature-extraction)
- [3.3 Vector Search](#33-vector-search)
- [4. Inference Deployment](#4-inference-deployment)
- [4.1 Inference model preparation](#41-inference-model-preparation)
- [4.1.1 Export the inference model from pretrained model](#411-export-the-inference-model-from-pretrained-model)
- [4.1.2 Download the inference model directly](#412-download-the-inference-model-directly)
- [4.2 Test data preparation](#42-test-data-preparation)
- [4.3 Inference based on Python inference engine](#43-inference-based-on-python-inference-engine)
- [4.3.1 single image prediction](#431-single-image-prediction)
- [4.3.2 multi images prediction](#432-multi-images-prediction)
- [4.3 Inference based on C++ inference engine](#43-inference-based-on-c-inference-engine)
- [4.4 Serving deployment](#44-serving-deployment)
- [4.5 Lite deployment](#45-lite-deployment)
- [4.6 Paddle2ONNX](#46-paddle2onnx)
- [references](#references)
## 1. Introduction of PP-ShiTuV2 model and application scenarios
PP-ShiTuV2 is a practical lightweight general image recognition system improved upon PP-ShiTuV1. It is composed of three modules: mainbody detection, feature extraction and vector search. Compared with PP-ShiTuV1, PP-ShiTuV2 has higher recognition accuracy, stronger generalization and a similar inference speed<sup>*</sup>. The improvements mainly cover the training dataset, the feature extraction backbone, the loss function and the training strategy, which significantly improve the retrieval performance of PP-ShiTuV2 in multiple practical application scenarios.
<div align="center">
<img src="../../images/structure.jpg" />
</div>
The following table compares the relevant metrics of PP-ShiTuV2 and PP-ShiTuV1.
| model      | storage (mainbody detection + feature extraction) | product recall@1 |
| :--------- | :------------------------------------------------- | :--------------- |
| PP-ShiTuV1 | 64(30+34)MB                                         | 66.8%            |
| PP-ShiTuV2 | 49(30+19)MB                                         | 73.8%            |
**Note:**
- For an introduction to the recall and mAP metrics, please refer to [Retrieval Metric](../algorithm_introduction/reid.md).
- The latency is measured on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz with the MKLDNN acceleration strategy enabled and 10 threads.
## 2. Quick experience
### 2.1 Quick experience of PP-ShiTu android demo
You can download and install the APP by scanning the QR code or [click the link](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk)
<div align=center><img src="../../images/quick_start/android_demo/PPShiTu_qrcode.png" height="45%" width="45%"/></div>
Then save the following demo pictures to your phone:
<div align=center><img src="../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg" width=30% height=30% /></div>
Open the installed APP, click the "**file recognition**" button below, select the above saved image, and you can get the following recognition results:
<div align=center><img src="../../images/quick_start/android_demo/android_nongfu_spring.JPG" width=30% height=30%/></div>
### 2.2 Quick experience of command line code
- First follow the commands below to install paddlepaddle and faiss
```shell
# If your machine is installed with CUDA9 or CUDA10, please run the following command to install
python3.7 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# If your machine is CPU, please run the following command to install
python3.7 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
# install the faiss library
python3.7 -m pip install faiss-cpu==1.7.1post2
```
- Then follow the command below to install the paddleclas whl package
```shell
# Go to the root directory of PaddleClas
cd PaddleClas
# install paddleclas
python3.7 setup.py install
```
- Then execute the following commands to download and decompress the demo data, and finally run the recognition command for a quick start of image recognition
```shell
# Download and unzip the demo data
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
# Execute the identification command
paddleclas \
--model_name=PP-ShiTuV2 \
--infer_imgs=./drink_dataset_v2.0/test_images/100.jpeg \
--index_dir=./drink_dataset_v2.0/index/ \
--data_file=./drink_dataset_v2.0/gallery/drink_label.txt
```
## 3. Module introduction and training
### 3.1 Mainbody detection
Mainbody detection is a widely used detection technology. It refers to detecting the coordinate position of one or more objects in the image and then cropping the corresponding area for identification. Mainbody detection is the pre-processing step of the recognition task: recognizing the cropped region instead of the whole image removes complex backgrounds and effectively improves recognition accuracy.
Taking into account detection speed, model size, detection accuracy and other factors, the lightweight model `PicoDet-LCNet_x2_5` developed by PaddleDetection was finally selected as the mainbody detection model of PP-ShiTuV2.
For details on the dataset, training, evaluation, inference, etc. of the mainbody detection model, please refer to the document: [picodet_lcnet_x2_5_640_mainbody](../../en/image_recognition_pipeline/mainbody_detection_en.md).
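To make the hand-off between detection and recognition concrete, the following is a minimal sketch (not a PaddleClas API) of cropping each detected mainbody region before it is fed to feature extraction; the helper name and the `[x1, y1, x2, y2]` box format are assumptions for illustration, matching the `bbox` fields shown in the sample output later in this document.
```python
# A hedged sketch, not part of PaddleClas: crop detected mainbody regions so each
# crop can be passed to the feature-extraction model. OpenCV is assumed available.
import cv2


def crop_mainbody_regions(image_path, bboxes):
    """Return one cropped image per [x1, y1, x2, y2] bounding box."""
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in bboxes:
        # Clamp coordinates to the image size to stay robust to loose boxes.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        crops.append(image[y1:y2, x1:x2])
    return crops


# Example: one bbox taken from the sample prediction log further below.
# crops = crop_mainbody_regions("100.jpeg", [[437, 71, 660, 728]])
```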
### 3.2 Feature Extraction
Feature extraction is a key part of image recognition. It is designed to convert the input image into a fixed-dimensional feature vector for subsequent [vector search](../../en/image_recognition_pipeline/vector_search_en.md) . Taking into account the speed of the feature extraction model, model size, feature extraction performance and other factors, the [`PPLCNetV2_base`](../../en/models/PP-LCNet_en.md) developed by PaddleClas was finally selected as the feature extraction network. Compared with `PPLCNet_x2_5` used by PP-ShiTuV1, `PPLCNetV2_base` basically maintains high classification accuracy and reduces inference time by 40%<sup>*</sup>.
**Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
During the experiment, we found that appropriate improvements can be made to `PPLCNetV2_base` to achieve higher performance in recognition tasks while keeping the speed basically unchanged, including removing the `ReLU` and `FC` layers at the end of `PPLCNetV2_base` and changing the stride of the last stage (RepDepthwiseSeparable) to 1.
For details about the dataset, training, evaluation, inference, etc. of the feature extraction model, please refer to the document: [PPLCNetV2_base_ShiTu](../../en/image_recognition_pipeline/feature_extraction_en.md).
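As a toy illustration of what "closer together in feature space" means, the hedged sketch below L2-normalizes two feature vectors and compares them with cosine similarity; the 512-dimensional random vectors stand in for real embeddings, and nothing here calls PaddleClas code.
```python
# A hedged sketch: compare two stand-in feature vectors with cosine similarity.
# After L2 normalization the inner product equals cosine similarity, so values
# near 1.0 indicate visually similar items and values near 0.0 dissimilar ones.
import numpy as np


def l2_normalize(vec, eps=1e-12):
    return vec / (np.linalg.norm(vec) + eps)


feat_query = l2_normalize(np.random.rand(512).astype("float32"))
feat_gallery = l2_normalize(np.random.rand(512).astype("float32"))
print("cosine similarity:", float(np.dot(feat_query, feat_gallery)))
```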
### 3.3 Vector Search
Vector search technology is widely used in image recognition. Its main goal is, for a given query vector, to calculate the similarity or distance to the feature vectors stored in an established vector database and to return the candidate vectors ranked by similarity.
In the PP-ShiTuV2 recognition system, we use the open-source vector search library [Faiss](https://github.com/facebookresearch/faiss), which offers good adaptability, easy installation, a rich set of algorithms, and support for both CPU and GPU.
For the installation and use of the Faiss vector search tool in the PP-ShiTuV2 system, please refer to the document: [vector search](../../en/image_recognition_pipeline/vector_search_en.md).
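For readers new to Faiss, the hedged sketch below builds a small exact inner-product ("FLAT") index over L2-normalized gallery features and queries it for the top-5 neighbors; the feature dimension and data are made up, and the real PP-ShiTuV2 index is built from the configuration files and gallery images rather than hand-written code like this.
```python
# A hedged, self-contained Faiss usage sketch (not the PP-ShiTuV2 pipeline).
import faiss
import numpy as np

dim = 512  # illustrative feature dimension
gallery = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(gallery)            # in-place L2 normalization

index = faiss.IndexFlatIP(dim)         # exact inner-product search ("FLAT")
index.add(gallery)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)   # top-5 gallery ids and similarity scores
print(ids[0], scores[0])
```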
## 4. Inference Deployment
### 4.1 Inference model preparation
Paddle Inference is the native inference library of Paddle, enabled on servers and in the cloud to provide high-performance inference capabilities. Compared with making predictions directly from a pretrained model, Paddle Inference can use MKLDNN, CUDNN and TensorRT for prediction acceleration to achieve better inference performance. For more introduction to the Paddle Inference engine, please refer to the [Paddle Inference official website tutorial](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html).
When using Paddle Inference for model inference, the loaded model must be an inference model. This document provides two methods to obtain the inference model. If you want to get the same results as in this document, please [download the inference model directly](#412-download-the-inference-model-directly).
#### 4.1.1 Export the inference model from pretrained model
- For the mainbody detection inference model, please refer to the document [Mainbody Detection Inference Model Preparation](../../en/image_recognition_pipeline/mainbody_detection_en.md), or refer to [4.1.2](#412-download-the-inference-model-directly).
- To export the weights of the feature extraction model, you can refer to the following commands:
```shell
python3.7 tools/export_model.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams" \
-o Global.save_inference_dir=deploy/models/GeneralRecognitionV2_PPLCNetV2_base
```
After executing the script, the `GeneralRecognitionV2_PPLCNetV2_base` folder will be generated under `deploy/models/` with the following file structure:
```log
deploy/models/
├── GeneralRecognitionV2_PPLCNetV2_base
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
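Once the `inference.pdmodel`/`inference.pdiparams` pair exists, it can be loaded with the Paddle Inference Python API. The hedged sketch below shows only the bare loading and device-selection step; preprocessing, postprocessing and the actual feature extraction are handled by the scripts under `deploy/python/`, so this is an illustration rather than a replacement for them.
```python
# A hedged sketch: load the exported inference model with Paddle Inference.
from paddle.inference import Config, create_predictor

model_dir = "deploy/models/GeneralRecognitionV2_PPLCNetV2_base"
config = Config(f"{model_dir}/inference.pdmodel",
                f"{model_dir}/inference.pdiparams")

use_gpu = False  # switch device selection here
if use_gpu:
    config.enable_use_gpu(1024, 0)  # 1024 MB initial GPU memory pool, device id 0
else:
    config.disable_gpu()

predictor = create_predictor(config)
print("model inputs:", predictor.get_input_names())
```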
#### 4.1.2 Download the inference model directly
[Section 4.1.1](#411-export-the-inference-model-from-pretrained-model) provides a method for exporting the inference model. Here we also provide the exported inference models directly; you can download them to the specified location and decompress them with the following commands for a quick experience.
```shell
cd deploy/models
# Download the mainbody detection inference model and unzip it
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# Download the feature extraction inference model and unzip it
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
```
### 4.2 Test data preparation
After preparing the mainbody detection and feature extraction models, you also need to prepare the test data as input. You can run the following commands to download and decompress the test data.
```shell
# return to ./deploy
cd ../
# Download the test data drink_dataset_v2.0 and unzip it
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
```
### 4.3 Inference based on Python inference engine
#### 4.3.1 single image prediction
Then execute the following command to identify the single image `./drink_dataset_v2.0/test_images/100.jpeg`.
```shell
# Execute the following command to predict with GPU
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg"
# Execute the following command to predict with CPU
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg" -o Global.use_gpu=False
```
The final output is as follows.
```log
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
```
#### 4.3.2 multi images prediction
If you want to predict the images in a folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or modify the corresponding configuration through the `-o` parameter as shown below.
```shell
# Use the command below to predict with GPU
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images"
# Use the following command to predict with CPU
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images" -o Global.use_gpu=False
```
The terminal will output the recognition results of all images in the folder, as shown below.
```log
...
[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}]
Inference: 120.39852142333984 ms per batch image
[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}]
Inference: 32.045602798461914 ms per batch image
[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}]
Inference: 113.41428756713867 ms per batch image
[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}]
Inference: 122.04337120056152 ms per batch image
[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}]
Inference: 37.95266151428223 ms per batch image
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
...
```
Where `bbox` represents the bounding box of the detected mainbody, `rec_docs` represents the most similar category to the detection object in the index database, and `rec_scores` represents the corresponding similarity.
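If the recognition results need to be consumed programmatically, a simple post-processing step such as the hedged sketch below can drop low-confidence matches; the threshold value is arbitrary and the result structure simply mirrors the log output above.
```python
# A hedged sketch: filter recognition results (structure as in the log above) by score.
results = [
    {"bbox": [437, 71, 660, 728], "rec_docs": "元气森林", "rec_scores": 0.7740249},
    {"bbox": [794, 104, 979, 652], "rec_docs": "元气森林", "rec_scores": 0.6305153},
]

SCORE_THRESHOLD = 0.65  # arbitrary cut-off for illustration
for r in (r for r in results if r["rec_scores"] >= SCORE_THRESHOLD):
    print(r["rec_docs"], r["bbox"], round(r["rec_scores"], 3))
```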
### 4.3 Inference based on C++ inference engine
PaddleClas provides an example of inference based on the C++ prediction engine; you can refer to [Server-side C++ prediction](../../../deploy/cpp_shitu/readme_en.md) to complete the corresponding inference deployment. If you are using the Windows platform, you can refer to the [Visual Studio 2019 Community CMake Compilation Guide](../inference_deployment/python_deploy_en.md) to complete the corresponding prediction library compilation and model prediction work.
### 4.4 Serving deployment
Paddle Serving provides high-performance, flexible and easy-to-use industrial-grade online inference services. Paddle Serving supports RESTful, gRPC, bRPC and other protocols, and provides inference solutions in a variety of heterogeneous hardware and operating system environments. For more introduction to Paddle Serving, please refer to [Paddle Serving Code Repository](https://github.com/PaddlePaddle/Serving).
PaddleClas provides an example of model serving deployment based on Paddle Serving. You can refer to [Model serving deployment](../inference_deployment/recognition_serving_deploy_en.md) to complete the corresponding deployment.
### 4.5 Lite deployment
Paddle Lite is a high-performance, lightweight, flexible and easily extensible deep learning inference framework, positioned to support multiple hardware platforms including mobile, embedded and server. For more introduction to Paddle Lite, please refer to [Paddle Lite Code Repository](https://github.com/PaddlePaddle/Paddle-Lite).
### 4.6 Paddle2ONNX
Paddle2ONNX supports converting PaddlePaddle model format to ONNX model format. The deployment of Paddle models to various inference engines can be completed through ONNX, including TensorRT/OpenVINO/MNN/TNN/NCNN, and other inference engines or hardware that support the ONNX open source format. For more introduction to Paddle2ONNX, please refer to [Paddle2ONNX Code Repository](https://github.com/PaddlePaddle/Paddle2ONNX).
PaddleClas provides an example of converting an inference model to an ONNX model and making inference prediction based on Paddle2ONNX. You can refer to [Paddle2ONNX Model Conversion and Prediction](../../../deploy/paddle2onnx/readme_en.md) to complete the corresponding deployment work.
## references
1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." International Conference on Multimedia Modeling. Springer, Cham, 2022.
2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609.
...@@ -12,12 +12,15 @@
- [4.4 Model Inference](#4.4)
<a name="1"></a>
## 1. Abstract

Feature extraction plays a key role in image recognition, which serves to transform the input image into a fixed dimensional feature vector for subsequent [vector search](./vector_search_en.md). Good features boast great similarity preservation, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have less feature similarity (further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) is applied to explore how to obtain features with high representational power through deep learning.
<a name="2"></a> <a name="2"></a>
## 2.Network Structure
## 2. Introduction
In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure: In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:
...@@ -31,152 +34,239 @@ Functions of the above modules :
- **Loss**: Specifies the Loss function to be used. It is designed as a combined form to facilitate the combination of Classification Loss and Pair_wise Loss.
<a name="3"></a>
## 3. Methods
#### 3.1 Backbone
The Backbone part adopts [PP-LCNetV2_base](../models/PP-LCNetV2.md), which is built on `PPLCNet_V1` with several optimizations, including the Rep strategy, PW convolution, Shortcut, an improved activation function and an improved SE module. The final classification accuracy is similar to that of `PPLCNet_x2_5`, while the inference latency is reduced by 40%<sup>*</sup>. During the experiments, we made appropriate improvements to `PPLCNetV2_base` so that it achieves higher performance on recognition tasks while keeping the speed basically unchanged, including removing the `ReLU` and `FC` layers at the end of `PPLCNetV2_base` and changing the stride of the last stage (RepDepthwiseSeparable) to 1.
**Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
#### 3.2 Neck
We use [BN Neck](../../../ppcls/arch/gears/bnneck.py) to standardize each dimension of the features extracted by the Backbone, reducing the difficulty of optimizing the metric learning loss and the classification loss simultaneously.
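As a rough illustration of the idea (not the exact implementation in `bnneck.py`), the following sketch applies a per-dimension BatchNorm with a fixed bias to a batch of backbone features; the layer sizes are assumptions.

```python
import paddle
import paddle.nn as nn

class BNNeckSketch(nn.Layer):
    """Simplified BN Neck: standardize each feature dimension before the heads."""
    def __init__(self, num_features: int):
        super().__init__()
        # Keep the BN bias fixed (not learned), a common choice for BN Neck.
        self.bn = nn.BatchNorm1D(
            num_features,
            bias_attr=paddle.ParamAttr(learning_rate=0.0, trainable=False))

    def forward(self, x):
        return self.bn(x)

feats = paddle.randn([8, 512])   # a batch of backbone features (assumed size)
neck = BNNeckSketch(512)
standardized = neck(feats)       # standardized features passed on to the heads
```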
#### 3.3 Head
We use [FC Layer](../../../ppcls/arch/gears/fc.py) as the classification head to convert features into logits for classification loss.
#### 3.4 Loss
We use [Cross entropy loss](../../../ppcls/loss/celoss.py) and [TripletAngularMarginLoss](../../../ppcls/loss/tripletangularmarginloss.py). We improved the original TripletLoss (TriHard Loss) by replacing the optimization objective from L2 Euclidean space with cosine space and adding a hard distance constraint between the anchor and the positive/negative samples, which improves the generalization ability of the model. For the detailed configuration, see [GeneralRecognitionV2_PPLCNetV2_base.yaml](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-77).
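To make the idea of moving the triplet objective into cosine space concrete, here is a simplified sketch; it is not the actual `TripletAngularMarginLoss` implementation (which additionally applies hard constraints on the anchor-positive and anchor-negative similarities), and the margin value is an assumption.

```python
import paddle
import paddle.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.5):
    """Simplified triplet loss optimized in cosine space instead of L2 space."""
    anchor = F.normalize(anchor, axis=1)
    positive = F.normalize(positive, axis=1)
    negative = F.normalize(negative, axis=1)
    sim_ap = (anchor * positive).sum(axis=1)   # anchor-positive cosine similarity
    sim_an = (anchor * negative).sum(axis=1)   # anchor-negative cosine similarity
    # Push sim_ap above sim_an by at least `margin`.
    return F.relu(sim_an - sim_ap + margin).mean()

a, p, n = (paddle.randn([8, 512]) for _ in range(3))
loss = cosine_triplet_loss(a, p, n)
```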
#### 3.5 Data Augmentation
We consider that the object may rotate to a certain extent and cannot always maintain an upright state in real scenes, so we add an appropriate [random rotation](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) in the data augmentation to improve the retrieval performance in real scenes.
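A minimal sketch of applying such a rotation with `paddle.vision.transforms` is shown below; the image path and the 90-degree range are assumptions, and the exact settings used in training live in the yaml configuration referenced above.

```python
from PIL import Image
from paddle.vision.transforms import RandomRotation

# Sketch: randomly rotate a training image within +/- 90 degrees (assumed range).
transform = RandomRotation(degrees=90)
img = Image.open("demo.jpg")   # hypothetical image path
rotated = transform(img)
```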
<a name="4"></a> <a name="4"></a>
## 4. Experimental

We reasonably expanded and optimized the original training data, and finally used a collection of the following 17 public datasets:

| Dataset | Data Amount | Number of Categories | Scenario | Dataset Address |
| :--------------------- | :---------: | :------------------: | :---------: | :-------------------------------------------------------------------------------------: |
| Aliproduct | 2498771 | 50030 | Commodities | [Address](https://retailvisionworkshop.github.io/recognition_challenge_2020/) |
| GLDv2 | 1580470 | 81313 | Landmark | [address](https://github.com/cvdfoundation/google-landmark) |
| VeRI-Wild | 277797 | 30671 | Vehicles | [Address](https://github.com/PKU-IMRE/VERI-Wild) |
| LogoDet-3K | 155427 | 3000 | Logo | [Address](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| SOP | 59551 | 11318 | Commodities | [Address](https://cvgl.stanford.edu/projects/lifted_struct/) |
| Inshop | 25882 | 3997 | Commodities | [Address](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) |
| bird400 | 58388 | 400 | birds | [address](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) |
| 104flows | 12753 | 104 | Flowers | [Address](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) |
| Cars | 58315 | 112 | Vehicles | [Address](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) |
| Fashion Product Images | 44441 | 47 | Products | [Address](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) |
| flowerrecognition | 24123 | 59 | flower | [address](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) |
| food-101 | 101000 | 101 | food | [address](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) |
| fruits-262 | 225639 | 262 | fruits | [address](https://www.kaggle.com/datasets/aelchimminut/fruits262) |
| inaturalist | 265213 | 1010 | natural | [address](https://github.com/visipedia/inat_comp/tree/master/2017) |
| indoor-scenes | 15588 | 67 | indoor | [address](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) |
| Products-10k | 141931 | 9691 | Products | [Address](https://products-10k.github.io/) |
| CompCars | 16016 | 431 | Vehicles | [Address](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) |
| **Total** | **6M** | **192K** | - | - |
The final model accuracy metrics are shown in the following table:

| Model | Latency (ms) | Storage (MB) | product<sup>*</sup> | | Aliproduct | | VeRI-Wild | | LogoDet-3k | | iCartoonFace | | SOP | | Inshop | | gldv2 | | imdb_face | | iNat | | instre | | sketch | | sop | |
| :--------------------- | :----------- | :----------- | :------------------ | :--- | ---------- | ---- | --------- | ---- | ---------- | ---- | ------------ | ---- | -------- | --------- | ------ | -------- | ----- | -------- | --------- | -------- | ---- | -------- | ------ | -------- | ------ | -------- | --- | --- |
| | | | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP |
| PP-ShiTuV1_general_rec | 5.0 | 34 | 65.9 | 54.3 | 83.9 | 83.2 | 88.7 | 60.1 | 86.1 | 73.6 | | 50.4 | 27.9 | 9.5 | 97.6 | 90.3 |
| PP-ShiTuV2_general_rec | 6.1 | 19 | 73.7 | 61.0 | 84.2 | 83.3 | 87.8 | 68.8 | 88.0 | 63.2 | 53.6 | 27.5 | | 71.4 | 39.3 | 15.6 | 98.3 | 90.9 |

*The product dataset is a dataset made to verify the generalization performance of PP-ShiTu, and none of its data are present in the training or testing sets. It contains 7 major categories (cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 subcategories. When testing, the labels of the 250 subcategories are used; the sop dataset comes from [GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122) and can be regarded as a mini version of the "SOP" dataset.
* Pre-trained model address: [general_PPLCNetV2_base_pretrained_v1.0.pdparams](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams)
* The evaluation metrics used are: `Recall@1` and `mAP`
* The CPU of the speed test machine is: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`
* The evaluation conditions for the speed metric are: MKLDNN enabled, number of threads set to 10
<a name="5"></a>

## 5. Custom Feature Extraction

Custom feature extraction refers to retraining the feature extraction model according to your own task.

Based on the `GeneralRecognitionV2_PPLCNetV2_base.yaml` configuration file, the following describes the four main steps: 1) data preparation; 2) model training; 3) model evaluation; 4) model inference.
<a name="5.1"></a>

### 5.1 Data Preparation
First you need to customize your own dataset based on the task. Please refer to [Dataset Format Description](../data_preparation/recognition_dataset.md) for the dataset format and file structure.

After the preparation is complete, modify the data-related configuration in the configuration file, mainly the dataset paths and the number of categories, as shown below:

- Modify the number of classes:

```yaml
Head:
  name: FC
  embedding_size: *feat_dim
  class_num: 192612 # This is the number of classes
  weight_attr:
    initializer:
      name: Normal
      std: 0.001
  bias_attr: False
```
- Modify the training dataset configuration:

```yaml
Train:
  dataset:
    name: ImageNetDataset
    image_root: ./dataset/ # Here is the directory where the train dataset is located
    cls_label_path: ./dataset/train_reg_all_data_v2.txt # Here is the path of the label file corresponding to the train dataset
    relabel: True
```
- Modify the query data configuration in the evaluation dataset:

```yaml
Query:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # Here is the directory where the query dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the query dataset
```
- Modify the gallery data configuration in the evaluation dataset:

```yaml
Gallery:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # This is the directory where the gallery dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the gallery dataset
```
<a name="5.2"></a>
### 5.2 Model training
Model training mainly includes starting training from scratch and resuming training from a checkpoint.
- Single machine and single card training
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
```
- Single machine multi-card training
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
```
**Notice:**
The online evaluation method is used by default in the configuration file. If you want to speed up the training, you can turn off the online evaluation function, just add `-o Global.eval_during_train=False` after the above scripts.
After training, the final model files `latest.pdparams`, `best_model.pdparams` and the training log file `train.log` will be generated in the output directory. Among them, `best_model` saves the best model under the current evaluation metric, and `latest` saves the latest generated model, which makes it convenient to resume training from the checkpoint when the training task is interrupted. Training can be resumed from a checkpoint by adding `-o Global.checkpoint="path_to_resume_checkpoint"` to the end of the above training scripts, as shown below.
- Single machine and single card checkpoint recovery training
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoint="output/RecModel/latest"
```
- Single-machine multi-card checkpoint recovery training
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoint="output/RecModel/latest"
```
<a name="4.3"></a> <a name="5.3"></a>
### 4.3 Model Evaluation
- Single Card Evaluation ### 5.3 Model Evaluation
``` In addition to the online evaluation of the model during training, the evaluation program can also be started manually to obtain the specified model's accuracy metrics.
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
- Multi Card Evaluation - Single Card Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0
python3.7 tools/eval.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
- Multi Card Evaluation
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
**Note:** Multi Card Evaluation is recommended. It can quickly obtain the metrics across all the data by using multi-card parallel computing, which speeds up the evaluation.
<a name="5.4"></a>

### 5.4 Model Inference

The inference process consists of two steps: 1) export the inference model; 2) run inference to obtain feature vectors.

#### 5.4.1 Export inference model

First, you need to convert the `*.pdparams` model file into inference format. The conversion script is as follows.
```shell
python3.7 tools/export_model.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
The generated inference model is located in the `PaddleClas/inference` directory by default, which contains three files, `inference.pdmodel`, `inference.pdiparams`, `inference.pdiparams.info`.
Where `inference.pdmodel` is used to store the structure of the inference model, `inference.pdiparams` and `inference.pdiparams.info` are used to store parameter information related to the inference model.
#### 5.4.2 Get feature vector

Use the inference model converted in the previous step to convert the input image into the corresponding feature vector. The inference script is as follows.
```shell
cd deploy
python3.7 python/predict_rec.py \
-c configs/inference_rec.yaml \
-o Global.rec_inference_model_dir="../inference"
```
The resulting feature output format is as follows:
```log
wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02
4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02
5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02
-2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03
...
-3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02]
```
In most cases, just getting the features may not meet the users' requirements. If you want to go further on the image recognition task, you can refer to the document [Vector Search](./vector_search.md).
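As a small illustration of how two such feature vectors can be compared before moving on to a full vector-search setup, the sketch below computes their cosine similarity; the random vectors are placeholders for real extracted features.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: higher means the two images are more alike in feature space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

feat_query = np.random.rand(512).astype("float32")    # placeholder query feature
feat_gallery = np.random.rand(512).astype("float32")  # placeholder gallery feature
print(cosine_similarity(feat_query, feat_gallery))
```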
<a name="6"></a>
## 6. Summary
As a key part of image recognition, the feature extraction module has many points for improvement in the network structure and the loss function. Different datasets have their own characteristics, such as person re-identification, commodity recognition, and face recognition. According to these characteristics, the academic community has proposed various methods, such as PCB, MGN, ArcFace, CircleLoss, TripletLoss, etc., which focus on the ultimate goal of increasing the gap between classes and reducing the gap within classes, so as to make a retrieval model robust enough in most scenes.
<a name="7"></a>
## 7. References

1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf)
2. [Bag of Tricks and A Strong Baseline for Deep Person Re-identification](https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf)
...@@ -25,17 +25,16 @@ PaddleClas supports Python wheel package for prediction. At present, PaddleClas
## 1. Installation
* **[Recommended]** Installing from PyPI:
```bash
pip3 install paddleclas
```

* Please build and install locally if you need to use the develop branch of PaddleClas to experience the latest functions, or need to redevelop based on PaddleClas. The command is as follows:
```bash
python3 setup.py install
```
<a name="2"></a> <a name="2"></a>
......
...@@ -25,14 +25,14 @@ git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
## 2. Install PaddleClas and requirements
* **[Recommended]** Installing from PyPI:
```shell
pip install paddleclas
```

* Please build and install locally if you need to use the develop branch of PaddleClas to experience the latest functions, or need to redevelop based on PaddleClas. The command is as follows:
```shell
python setup.py install
```
# Quick Start of Recognition

This document contains 2 parts: PP-ShiTu android demo quick start and PP-ShiTu PC demo quick start.

If the image category already exists in the image index library, you can directly refer to the [Image Recognition Experience](#image recognition experience) chapter to complete the image recognition process; if you want to recognize images of unknown classes, that is, the image category did not exist in the index library before, then you can refer to the [Unknown Category Image Recognition Experience](#Unknown Category Image Recognition Experience) chapter to complete the process of indexing and recognition.

## Catalogue
- [1. PP-ShiTu android demo for quick start](#1-pp-shitu-android-demo-for-quick-start)
- [1.1 Install PP-ShiTu android demo](#11-install-pp-shitu-android-demo)
- [1.2 Feature Experience](#12-feature-experience)
- [1.2.1 Image Retrieval](#121-image-retrieval)
- [1.2.2 Update Index](#122-update-index)
- [1.2.3 Save Index](#123-save-index)
- [1.2.4 Initialize Index](#124-initialize-index)
- [1.2.5 Preview Index](#125-preview-index)
- [1.3 Feature Details](#13-feature-details)
- [1.3.1 Image Retrieval](#131-image-retrieval)
- [1.3.2 Update Index](#132-update-index)
- [1.3.3 Save Index](#133-save-index)
- [1.3.4 Initialize Index](#134-initialize-index)
- [1.3.5 Preview Index](#135-preview-index)
- [2. PP-ShiTu PC demo for quick start](#2-pp-shitu-pc-demo-for-quick-start)
- [2.1 Environment configuration](#21-environment-configuration)
- [2.2 Image recognition experience](#22-image-recognition-experience)
- [2.2.1 Download and unzip the inference model and demo data](#221-download-and-unzip-the-inference-model-and-demo-data)
- [2.2.2 Drink recognition and retrieval](#222-drink-recognition-and-retrieval)
- [2.2.2.1 single image recognition](#2221-single-image-recognition)
- [2.2.2.2 Folder-based batch recognition](#2222-folder-based-batch-recognition)
- [2.3 Image of Unknown categories recognition experience](#23-image-of-unknown-categories-recognition-experience)
- [2.3.1 Prepare new data and labels](#231-prepare-new-data-and-labels)
- [2.3.2 Create a new index database](#232-create-a-new-index-database)
- [2.3.3 Image recognition based on the new index database](#233-image-recognition-based-on-the-new-index-database)
- [2.4 List of server recognition models](#24-list-of-server-recognition-models)
<a name="PP-ShiTu android quick start"></a>
<a name="1"></a> ## 1. PP-ShiTu android demo for quick start
## 1. Enviroment Preparation
* Installation:Please take a reference to [Quick Installation ](../installation/)to configure the PaddleClas environment. <a name="install"></a>
* Using the following command to enter Folder `deploy`. All content and commands in this section need to be run in folder `deploy`. ### 1.1 Install PP-ShiTu android demo
``` You can download and install the APP by scanning the QR code or [click the link](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk)
cd deploy
```
<a name="2"></a> <div align=center><img src="../../images/quick_start/android_demo/PPShiTu_qrcode.png" height="400" width="400"/></div>
<a name="Feature Experience"></a>

### 1.2 Feature Experience

At present, the PP-ShiTu android demo has basic features such as image retrieval, adding images to the index database, saving the index database, initializing the index database, and viewing the index database. Next, we will introduce how to experience these features.
#### 1.2.1 Image Retrieval

Click the "photo recognition" button below <img src="../../images/quick_start/android_demo/paizhaoshibie_100.png" width="25" height="25"/> or the "file recognition" button <img src="../../images/quick_start/android_demo/bendishibie_100.png" width="25" height="25"/> to take a photo or select an image; after a few seconds, the main object in the image will be marked, and the predicted class and inference time will be shown below the image.
Take the following image as an example:
<img src="../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg" width="400" height="600"/>
The retrieval results obtained are visualized as follows:
<img src="../../images/quick_start/android_demo/android_nongfu_spring.JPG" width="400" height="800"/>
#### 1.2.2 Update Index

Click the "photo upload" button above <img src="../../images/quick_start/android_demo/paizhaoshangchuan_100.png" width="25" height="25"/> or the "file upload" button <img src="../../images/quick_start/android_demo/bendishangchuan_100.png" width="25" height="25"/> to take a photo or select an image, enter the class name of the uploaded image (such as `keyboard`), and click the "OK" button; the feature vector and class name of the image will then be added to the index database.
#### 1.2.3 Save Index
Click the "save index" button above <img src="../../images/quick_start/android_demo/baocunxiugai_100.png" width="25" height="25"/>, you can save the current index database as `latest`.
#### 1.2.4 Initialize Index
Click the "initialize index" button above <img src="../../images/quick_start/android_demo/reset_100.png" width="25" height="25"/> to initialize the current library to `original`.
#### 1.2.5 Preview Index
Click the "class preview" button <img src="../../images/quick_start/android_demo/leibichaxun_100.png" width="25" height="25"/> to view it in the pop-up window.
<a name="Feature introduction"></a>

### 1.3 Feature Details

#### 1.3.1 Image Retrieval
After the image to be retrieved is selected, mainbody detection is first performed with the detection model to obtain the bounding box of the object in the image. The image is then cropped and fed into the feature extraction model to obtain the corresponding feature vector, which is retrieved in the index database; finally, the search result is returned and displayed.
#### 1.3.2 Update Index
After the image to be stored is selected, mainbody detection is first performed with the detection model to obtain the bounding box of the object in the image. The image is then cropped and fed into the feature extraction model to obtain the corresponding feature vector, which is then added into the index database.
#### 1.3.3 Save Index
Save the current index database under the name `latest` and automatically switch to it. The saving logic is similar to "Save As" in general software: if the current index database is already `latest`, it will be overwritten directly; otherwise, the demo switches to `latest`.
#### 1.3.4 Initialize Index
When initializing the index database, it will automatically switch the search index database to `original.index` and `original.txt`, and automatically delete `latest.index` and `latest.txt` (if exists).
#### 1.3.5 Preview Index
One can preview it according to the instructions in [Function Experience - Preview Index](#125-preview-index).
## 2. PP-ShiTu PC demo for quick start
<a name="Environment Configuration"></a>
### 2.1 Environment configuration
* Installation: Please refer to the document [Environment Preparation](../installation/install_paddleclas.md) to configure the PaddleClas operating environment.
* Go to the `deploy` run directory. All the content and scripts in this section need to be run in the `deploy` directory, you can enter the `deploy` directory with the following scripts.
```shell
cd deploy
```
<a name="Image Recognition Experience"></a>
<a name="2.1"></a> ### 2.2 Image recognition experience
### 2.1 Download and Unzip the Inference Model and Demo Data
Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands. The lightweight general object detection model, lightweight general recognition model and configuration file are available in following table.
<a name="Lightweight General object Detection Model and Lightweight General Recognition Model"></a>
| Model Introduction | Recommended Scenarios | Inference Model | Prediction Profile |
| ------------------------------------------ | --------------------- | ------------------ | ------------------------------------------------------------------------ |
| Lightweight General MainBody Detection Model | General Scene | [tar format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainMainBody_lite_v1.0_infer.tar ) \| [zip format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainMainBody_lite_v1.0_infer.zip) | - |
| Lightweight General Recognition Model | General Scene | [tar format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar) \| [zip format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.zip) | [inference_general.yaml](../../../deploy/configs/inference_general.yaml) |
Note: Since some decompression software has problems decompressing the above `tar` format files, it is recommended that users who do not use the command line download the `zip` format files and decompress them; `tar` format files can be decompressed with the command `tar -xf xxx.tar`.
The demo data of this chapter can be downloaded here: [drink_dataset_v2.0.tar (drink data)](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar).
The following takes **drink_dataset_v2.0.tar** as an example to introduce the PP-ShiTu quick start process on the PC. Users can also download and decompress the data of other scenarios to experience: [22 scenarios data download](../../zh_CN/introduction/ppshitu_application_scenarios.md#22-下载解压场景库数据).
If you want to experience the server-side object detection and recognition models for each scene, you can refer to [2.4 List of server recognition models](#24-list-of-server-recognition-models).
**Notice**
- If wget is not installed in the windows environment, you can install the `wget` and tar scripts according to the following steps, or you can copy the link to the browser to download the model, decompress it and place it in the corresponding directory.
- If the `wget` script is not installed in the macOS environment, you can run the following script to install it.
```shell
# install homebrew
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
# install wget
brew install wget
```
- If you want to install `wget` in the windows environment, you can refer to: [link](https://www.cnblogs.com/jeshy/p/10518062.html); if you want to install the `tar` script in the windows environment, you can refer to: [Link](https://www.cnblogs.com/chooperman/p/14190107.html).
<a name="2.2.1"></a>
#### 2.2.1 Download and unzip the inference model and demo data
Download the demo dataset and the lightweight subject detection and recognition model. The scripts are as follows.
```shell
mkdir models
cd models
# Download the mainbody detection inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# Download the feature extraction inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
cd ..

# Download demo data and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
```

After decompression, the `drink_dataset_v2.0/` folder is structured as follows:

```log
├── drink_dataset_v2.0/
│   ├── gallery/
│   ├── index/
│   ├── index_all/
│   └── test_images/
├── ...
```
The `gallery` folder stores the original images used to build the index database, `index` represents the index database constructed based on the original images, and the `test_images` folder stores the list of images for query.

The `models` folder should be structured as follows:

```log
├── general_PPLCNetV2_base_pretrained_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
```
**Notice**

If the general feature extraction model is changed, the index for the demo data must be rebuilt, as follows:

```shell
python3.7 python/build_gallery.py \
-c configs/inference_general.yaml \
-o Global.rec_inference_model_dir=./models/general_PPLCNetV2_base_pretrained_v1.0_infer
```
<a name="2.2"></a> <a name="Drink Recognition and Retrieval"></a>
### 2.2 Product Recognition and Retrieval
Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction).
**Note:** `faiss` is used as search library. The installation method is as follows: #### 2.2.2 Drink recognition and retrieval
``` Take the drink recognition demo as an example to show the recognition and retrieval process.
pip install faiss-cpu==1.7.1post2
```
If error happens when using `import faiss`, please uninstall `faiss` and reinstall it, especially on `Windows`. Note that this section will uses `faiss` as the retrieval tool, and the installation script is as follows:
<a name="2.2.1"></a> ```python
python3.7 -m pip install faiss-cpu==1.7.1post2
```
#### 2.2.1 Single Image Recognition If `faiss` cannot be importted, try reinstall it, especially for windows users.
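For intuition only, the sketch below shows what the index database does conceptually with `faiss`: store gallery feature vectors and return the most similar ones for a query. The dimension and the random features are assumptions; the actual demo builds and loads its index through `python/build_gallery.py` and the yaml configuration.

```python
import faiss
import numpy as np

dim = 512                                         # assumed feature dimension
gallery_feats = np.random.rand(100, dim).astype("float32")
faiss.normalize_L2(gallery_feats)                 # normalize so inner product equals cosine

index = faiss.IndexFlatIP(dim)                    # exact inner-product index
index.add(gallery_feats)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)              # top-5 most similar gallery entries
print(ids[0], scores[0])
```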
<a name="single image recognition"></a>

##### 2.2.2.1 single image recognition

Run the following script to recognize the image `./drink_dataset_v2.0/test_images/100.jpeg`

The image to be retrieved is shown below.

![](../../images/recognition/drink_data_demo/test_images/100.jpeg)
```shell
# Use the script below to make predictions using the GPU
python3.7 python/predict_system.py -c configs/inference_general.yaml
# Use the following script to make predictions using the CPU
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False
```
The final output is as follows.
```log
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs' : '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
```
Where `bbox` represents the location of the detected object, `rec_docs` represents the most similar category to the detection box in the index database, and `rec_scores` represents the corresponding similarity.
The visualization results of the recognition are saved in the `output` folder by default. For this image, the visualization of the recognition results is shown below.

![](../../images/recognition/drink_data_demo/output/100.jpeg)
<a name="Folder-based batch recognition"></a>
<a name="2.2.2"></a> ##### 2.2.2.2 Folder-based batch recognition
#### 2.2.2 Folder-based Batch Recognition
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter. If you want to use multi images in the folder for prediction, you can modify the `Global.infer_imgs` field in the configuration file, or you can modify the corresponding configuration through the `-o` parameter below.
```shell ```shell
# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU. # Use the following script to use GPU for prediction, if you want to use CPU prediction, you can add -o Global.use_gpu=False after the script
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/" python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/"
``` ```
The recognition results of all images in the folder will be output in the terminal, as shown below.
```log
...
[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}]
Inference: 120.39852142333984 ms per batch image
[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}]
Inference: 32.045602798461914 ms per batch image
[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}]
Inference: 113.41428756713867 ms per batch image
[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}]
Inference: 122.04337120056152 ms per batch image
[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}]
Inference: 37.95266151428223 ms per batch image
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
...
```

Visualizations of recognition results for all images are also saved in the `output` folder.
Furthermore, you can change the path of the recognition inference model by modifying the `Global.rec_inference_model_dir` field, and change the path of the index database by modifying the `IndexProcess.index_dir` field.
<a name="Image of Unknown categories recognition experience"></a>
### 2.3 Image of Unknown categories recognition experience
<a name="3"></a> Now we try to recognize the unseen image `./drink_dataset_v2.0/test_images/mosilian.jpeg`
## 3. Recognize Images of Unknown Category
To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows: The images to be retrieved are as follows
![](../../images/recognition/drink_data_demo/test_images/mosilian.jpeg)
Execute the following identification script
```shell
# Use the following script to use GPU for prediction, if you want to use CPU prediction, you can add -o Global.use_gpu=False after the script
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg"
```

It can be found that the output result is empty.
Since the default index database does not contain the unknown category's information, the recognition result here is wrong. At this time, we can achieve the image recognition of unknown classes by building a new index database.
When the images in the index database cannot cover the scene we actually recognize, i.e. when recognizing an image of an unknown category, we need to add at least one similar image of that unknown category to the index database. This process does not require re-training the model. Take `mosilian.jpeg` as an example and just follow the steps below to rebuild a new index database.
<a name="Prepare new data and labels"></a>
#### 2.3.1 Prepare new data and labels

First, copy the image(s) belonging to the unknown category (except the query image) to the original image folder of the index database. Here we have already placed all the image data in the folder `drink_dataset_v2.0/gallery/`.

Then we need to edit the text file that records the image paths and label information. Here the updated label file is already provided at `drink_dataset_v2.0/gallery/drink_label_all.txt`. Comparing it with the original `drink_dataset_v2.0/gallery/drink_label.txt` label file, it can be found that index images of the Bright and Ternary series of milk have been added.

In each line of the text file, the first field represents the relative path of the image and the second field represents the label information corresponding to the image, separated by the `\t` character (Note: some editors will automatically convert `tab` to `space`, which will cause a file parsing error).
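A short sketch of reading such a label file in Python is shown below, assuming the `drink_label_all.txt` path from the demo data.

```python
# Sketch: parse a gallery label file where each line is "<relative image path>\t<label>".
label_file = "drink_dataset_v2.0/gallery/drink_label_all.txt"

with open(label_file, encoding="utf-8") as f:
    for line in f:
        rel_path, label = line.rstrip("\n").split("\t", 1)
        print(rel_path, "->", label)
```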
<a name="3.1"></a> <a name="Create a new index database"></a>
### 3.1 Prepare for the new images and labels
First, you need to copy the images which are similar with the image to retrieval to the original images for the index database. The command is as follows. #### 2.3.2 Create a new index database
Build a new index database `index_all` with the following scripts.
```shell ```shell
python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v2.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
```

The final constructed new index database is saved in the folder `./drink_dataset_v2.0/index_all`. For specific instructions on the `yaml` configuration, please refer to the [Vector Search Documentation](../image_recognition_pipeline/vector_search.md).
<a name="Image recognition based on the new index database"></a>
#### 2.3.3 Image recognition based on the new index database
To re-recognize the `mosilian.jpeg` image using the new index database, run the following scripts.
```shell ```shell
# copy the file # run the following script to predict with GPU; if you want to use CPU, you can add -o Global.use_gpu=False after the script
cp recognition_demo_data_v1.1/gallery_product/data_file.txt recognition_demo_data_v1.1/gallery_product/data_file_update.txt python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
``` ```
Then add some new lines into the new label file, which is shown as follows. The output is as follows.
``` ```log
gallery/anmuxi/001.jpg Anmuxi Ambrosial Yogurt [{'bbox': [290, 297, 564, 919], 'rec_docs': 'Bright_Mosleyan', 'rec_scores': 0.59137374}]
gallery/anmuxi/002.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/003.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/004.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/005.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/006.jpg Anmuxi Ambrosial Yogurt
``` ```
Each line can be split into two fields. The first field denotes the relative image path, and the second field denotes its label. The `delimiter` is `tab` here. The final recognition result is `光明_莫斯利安` (Bright Mosleyan). We can see that the recognition result is now correct, and the visualization of the recognition result is shown below.
![](../../images/recognition/drink_data_demo/output/mosilian.jpeg)
<a name="3.2"></a>
### 3.2 Build a new Index Base Library
Use the following command to build the index to accelerate the retrieval process after recognition. <a name="5"></a>
```shell ### 2.4 List of server recognition models
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./recognition_demo_data_v1.1/gallery_product/data_file_update.txt" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update"
``` At present, we recommend using the models in [Lightweight General Object Detection Model and Lightweight General Recognition Model](#22-image-recognition-experience) to get better test results. However, if you want to experience the general recognition model, the general object detection model and other server-side recognition models, the test data download paths and the corresponding configuration file paths are as follows.
Finally, the new index information is stored in the folder`./recognition_demo_data_v1.1/gallery_product/index_update`. Use the new index database for the above index. | Model Introduction | Recommended Scenarios | Inference Model | Prediction Profile |
| --------------------------------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| General Mainbody Detection Model | General Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - |
| Logo Recognition Model | Logo Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) |
| Anime Character Recognition Model | Anime Character Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) |
| Vehicle Fine-Grained Classification Model | Vehicle Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) |
| Product Recognition Model | Product Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) |
| Vehicle ReID Model | Vehicle ReID Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) |
The above models can be downloaded to the `deploy/models` folder by the following script for use in recognition tasks
```shell
cd ./deploy
mkdir -p models
<a name="3.3"></a> cd ./models
### 3.3 Recognize the Unknown Category Images # Download the generic object detection model for server and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# Download the generic recognition model and unzip it
wget {recognize model download link path} && tar -xf {name of compressed package}
```
To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows. Then use the following scripts to download the test data for other recognition scenarios:
```shell ```shell
# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU. # Go back to the deploy directory
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/anmuxi.jpg" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update" cd ..
# Download test data and unzip
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar && tar -xf recognition_demo_data_en_v1.1.tar
``` ```
The output is as follows: After decompression, the `recognition_demo_data_v1.1` folder should have the following file structure:
``` ```log
[{'bbox': [243, 80, 523, 522], 'rec_docs': 'Anmuxi Ambrosial Yogurt', 'rec_scores': 0.5570770502090454}] ├── recognition_demo_data_v1.1
│ ├── gallery_cartoon
│ ├── gallery_logo
│ ├── gallery_product
│ ├── gallery_vehicle
│ ├── test_cartoon
│ ├── test_logo
│ ├── test_product
│ └── test_vehicle
├── ...
``` ```
The final recognition result is `Anmuxi Ambrosial Yogurt`, which is correct; the visualization result is as follows. After downloading the model and test data according to the above steps, you can re-build the index database and test the relevant recognition model.
![](../../images/recognition/product_demo/result/anmuxi_en.jpg) * For more introduction to object detection, please refer to: [Object Detection Tutorial Document](../image_recognition_pipeline/mainbody_detection.md); for the introduction of feature extraction, please refer to: [Feature Extraction Tutorial Document](../image_recognition_pipeline/feature_extraction.md); for the introduction to vector search, please refer to: [vector search tutorial document](../image_recognition_pipeline/vector_search.md).
## PP-ShiTu V2图像识别系统
## 目录
- [1. PP-ShiTu V2模型和应用场景介绍](#1-pp-shituv2模型和应用场景介绍)
- [2. 模型快速体验](#2-模型快速体验)
- [2.1 PP-ShiTu android demo 快速体验](#21-pp-shitu-android-demo-快速体验)
- [2.2 命令行代码快速体验](#22-命令行代码快速体验)
- [3 模块介绍与训练](#3-模块介绍与训练)
- [3.1 主体检测](#31-主体检测)
- [3.2 特征提取](#32-特征提取)
- [3.3 向量检索](#33-向量检索)
- [4. 推理部署](#4-推理部署)
- [4.1 推理模型准备](#41-推理模型准备)
- [4.1.1 基于训练得到的权重导出 inference 模型](#411-基于训练得到的权重导出-inference-模型)
- [4.1.2 直接下载 inference 模型](#412-直接下载-inference-模型)
- [4.2 测试数据准备](#42-测试数据准备)
- [4.3 基于 Python 预测引擎推理](#43-基于-python-预测引擎推理)
- [4.3.1 预测单张图像](#431-预测单张图像)
- [4.3.2 基于文件夹的批量预测](#432-基于文件夹的批量预测)
- [4.4 基于 C++ 预测引擎推理](#44-基于-c-预测引擎推理)
- [4.5 服务化部署](#45-服务化部署)
- [4.6 端侧部署](#46-端侧部署)
- [4.7 Paddle2ONNX 模型转换与预测](#47-paddle2onnx-模型转换与预测)
- [参考文献](#参考文献)
## 1. PP-ShiTuV2模型和应用场景介绍
PP-ShiTuV2 是基于 PP-ShiTuV1 改进的一个实用轻量级通用图像识别系统,由主体检测、特征提取、向量检索三个模块构成,相比 PP-ShiTuV1 具有更高的识别精度、更强的泛化能力以及相近的推理速度<sup>*</sup>。主要针对训练数据集、特征提取两个部分进行优化,使用了更优的骨干网络、损失函数与训练策略,使得 PP-ShiTuV2 在多个实际应用场景上的检索性能有显著提升。
**本文档介绍了如何使用 PaddleClas 的 PP-ShiTuV2 图像识别方案,快速构建轻量级、高精度、可落地的图像识别 pipeline。该 pipeline 可以广泛应用于商场商品识别场景、安防人脸或行人识别场景、海量图像检索过滤等场景中。**
<div align="center">
<img src="../../images/structure.jpg" />
</div>
下表列出了 PP-ShiTuV2 用不同的模型结构与训练策略所得到的相关指标:
| 模型 | 存储(主体检测+特征提取) | product recall@1 |
| :--------- | :---------------------- | :--------------- |
| PP-ShiTuV1 | 64(30+34)MB | 66.8% |
| PP-ShiTuV2 | 49(30+19)MB | 73.8% |
**注:**
- recall及mAP指标的介绍可以参考 [常用指标](../algorithm_introduction/reid.md#22-常用指标)
- 延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。
## 2. 模型快速体验
### 2.1 PP-ShiTu android demo 快速体验
可以通过扫描二维码或者 [点击链接](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk) 下载并安装APP
<div align=center><img src="../../images/quick_start/android_demo/PPShiTu_qrcode.png" width="300"/></div>
然后将以下体验图片保存到手机上:
<div align=center><img src="../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg" width=30% height=30% /></div>
打开安装好的APP,点击下方“**本地识别**”按钮,选择上面这张保存的图片,再点击确定,就能得到如下识别结果:
<div align=center><img src="../../images/quick_start/android_demo/android_nongfu_spring.JPG" width=30% height=30%/></div>
更详细的说明参考[PP-ShiTu android demo功能说明](https://github.com/weisy11/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_recognition.md)
### 2.2 命令行代码快速体验
- 首先按照以下命令,安装paddlepaddle和faiss
```shell
# 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装
python3.7 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# 如果您的机器是CPU,请运行以下命令安装
python3.7 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
# 安装 faiss 库
python3.7 -m pip install faiss-cpu==1.7.1post2
```
- 然后按照以下命令,安装paddleclas whl包
```shell
# 进入到PaddleClas根目录下
cd PaddleClas
# 安装paddleclas
python3.7 setup.py install
```
- 然后执行以下命令下载并解压好demo数据,最后执行一行命令体验图像识别
```shell
# 下载并解压demo数据
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
# 执行识别命令
paddleclas \
--model_name=PP-ShiTuV2 \
--infer_imgs=./drink_dataset_v2.0/test_images/100.jpeg \
--index_dir=./drink_dataset_v2.0/index/ \
--data_file=./drink_dataset_v2.0/gallery/drink_label.txt
```
## 3 模块介绍与训练
### 3.1 主体检测
主体检测是目前应用非常广泛的一种检测技术,它指的是检测出图片中一个或者多个主体的坐标位置,然后将图像中的对应区域裁剪下来进行识别。主体检测是识别任务的前序步骤,输入图像经过主体检测后再进行识别,可以过滤复杂背景,有效提升识别精度。
考虑到检测速度、模型大小、检测精度等因素,最终选择 PaddleDetection 自研的轻量级模型 `PicoDet-LCNet_x2_5` 作为 PP-ShiTuV2 的主体检测模型
主体检测模型的数据集、训练、评估、推理等详细信息可以参考文档:[picodet_lcnet_x2_5_640_mainbody](../image_recognition_pipeline/mainbody_detection.md)
### 3.2 特征提取
特征提取是图像识别中的关键一环,它的作用是将输入的图片转化为固定维度的特征向量,用于后续的 [向量检索](./vector_search.md) 。考虑到特征提取模型的速度、模型大小、特征提取性能等因素,最终选择 PaddleClas 自研的 [`PPLCNetV2_base`](../models/PP-LCNetV2.md) 作为特征提取网络。相比 PP-ShiTuV1 所使用的 `PPLCNet_x2_5``PPLCNetV2_base` 基本保持了较高的分类精度,并减少了40%的推理时间<sup>*</sup>
**注:** <sup>*</sup>推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。
在实验过程中我们也发现可以对 `PPLCNetV2_base` 进行适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 `PPLCNetV2_base` 末尾的 `ReLU``FC`、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1。
特征提取模型的数据集、训练、评估、推理等详细信息可以参考文档:[PPLCNetV2_base_ShiTu](../image_recognition_pipeline/feature_extraction.md)
### 3.3 向量检索
向量检索技术在图像识别、图像检索中应用比较广泛。其主要目标是对于给定的查询向量,在已经建立好的向量库中进行特征向量的相似度或距离计算,返回候选向量的相似度排序结果。
在 PP-ShiTuV2 识别系统中,我们使用了 [Faiss](https://github.com/facebookresearch/faiss) 向量检索开源库对此部分进行支持,其具有适配性好、安装方便、算法丰富、同时支持CPU与GPU的优点。
PP-ShiTuV2 系统中关于 Faiss 向量检索库的安装及使用可以参考文档:[vector search](../image_recognition_pipeline/vector_search.md)
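下面给出一个基于 Faiss 的最小检索示意,仅用于说明向量检索"建库—查询"的基本流程;其中特征维度与特征数据均为假设值,并非 PaddleClas 内部的具体实现:

```python
import faiss
import numpy as np

# 假设已经得到 512 维的图像特征(实际维度以特征提取模型的输出为准)
dim = 512
gallery_feats = np.random.rand(1000, dim).astype("float32")  # 底库特征(示意数据)
query_feats = np.random.rand(5, dim).astype("float32")       # 查询特征(示意数据)

# L2 归一化后使用内积检索,等价于余弦相似度
faiss.normalize_L2(gallery_feats)
faiss.normalize_L2(query_feats)

index = faiss.IndexFlatIP(dim)               # 构建索引(建库)
index.add(gallery_feats)
scores, ids = index.search(query_feats, 5)   # 对每个查询向量检索 top-5
print(ids[0], scores[0])
```

实际使用时,建库与检索分别由 `python/build_gallery.py` 与 `python/predict_system.py` 根据配置文件自动完成,无需手动编写上述代码。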
## 4. 推理部署
### 4.1 推理模型准备
Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)
当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择 [直接下载 inference 模型](#412-直接下载-inference-模型) 的方式。
#### 4.1.1 基于训练得到的权重导出 inference 模型
- 主体检测模型权重导出请参考文档 [主体检测推理模型准备](../image_recognition_pipeline/mainbody_detection.md#41-推理模型准备),或者参照 [4.1.2](#412-直接下载-inference-模型) 直接下载解压即可。
- 特征提取模型权重导出可以参考以下命令:
```shell
python3.7 tools/export_model.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams" \
-o Global.save_inference_dir=deploy/models/GeneralRecognitionV2_PPLCNetV2_base
```
执行完该脚本后会在 `deploy/models/` 下生成 `GeneralRecognitionV2_PPLCNetV2_base` 文件夹,具有如下文件结构:
```log
deploy/models/
├── GeneralRecognitionV2_PPLCNetV2_base
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
#### 4.1.2 直接下载 inference 模型
[4.1.1 小节](#411-基于训练得到的权重导出-inference-模型) 提供了导出 inference 模型的方法,此处提供我们导出好的 inference 模型,可以按以下命令,下载模型到指定位置解压进行体验。
```shell
cd deploy/models
# 下载主体检测inference模型并解压
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# 下载特征提取inference模型并解压
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
```
### 4.2 测试数据准备
准备好主体检测、特征提取模型之后,还需要准备作为输入的测试数据,可以执行以下命令下载并解压测试数据。
```shell
# 返回deploy
cd ../
# 下载测试数据drink_dataset_v2.0,并解压
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
```
### 4.3 基于 Python 预测引擎推理
#### 4.3.1 预测单张图像
然后执行以下命令对单张图像 `./drink_dataset_v2.0/test_images/100.jpeg` 进行识别。
```shell
# 执行下面的命令使用 GPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg"
# 执行下面的命令使用 CPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg" -o Global.use_gpu=False
```
最终输出结果如下。
```log
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
```
#### 4.3.2 基于文件夹的批量预测
如果希望预测文件夹内的图像,可以直接修改配置文件中的 Global.infer_imgs 字段,也可以通过下面的 -o 参数修改对应的配置。
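若选择直接修改配置文件,可以参考如下片段(仅示意 `Global.infer_imgs` 与 `Global.use_gpu` 两个字段的写法,其余字段请以 `configs/inference_general.yaml` 的实际内容为准);若使用 `-o` 参数覆盖配置,则参考下面的命令。

```yaml
Global:
  infer_imgs: "./drink_dataset_v2.0/test_images"  # 待预测的图像文件夹或单张图像路径
  use_gpu: True                                   # 设为 False 则使用 CPU 预测
```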
```shell
# 使用下面的命令使用 GPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images"
# 使用下面的命令使用 CPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images" -o Global.use_gpu=False
```
终端中会输出该文件夹内所有图像的识别结果,如下所示。
```log
...
[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}]
Inference: 120.39852142333984 ms per batch image
[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}]
Inference: 32.045602798461914 ms per batch image
[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}]
Inference: 113.41428756713867 ms per batch image
[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}]
Inference: 122.04337120056152 ms per batch image
[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}]
Inference: 37.95266151428223 ms per batch image
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
...
```
其中 `bbox` 表示检测出的主体所在位置,`rec_docs` 表示索引库中与检测框最为相似的类别,`rec_scores` 表示对应的相似度。
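如果需要在业务代码中对上述输出做进一步处理(例如按相似度阈值过滤、取最高分结果),可以参考如下示意代码;其中结果列表与阈值均为示例值,并非 PaddleClas 提供的接口:

```python
# 示意:按相似度阈值过滤识别结果,并取相似度最高的一条作为最终预测
results = [
    {"bbox": [437, 71, 660, 728], "rec_docs": "元气森林", "rec_scores": 0.7740249},
    {"bbox": [221, 72, 449, 701], "rec_docs": "元气森林", "rec_scores": 0.6950992},
]

def filter_and_rank(results, score_thresh=0.5):
    # 过滤掉低于阈值的候选,并按相似度从高到低排序
    kept = [r for r in results if r["rec_scores"] >= score_thresh]
    return sorted(kept, key=lambda r: r["rec_scores"], reverse=True)

ranked = filter_and_rank(results)
if ranked:
    best = ranked[0]
    print("label:", best["rec_docs"], "score:", best["rec_scores"], "bbox:", best["bbox"])
else:
    print("no result above threshold")
```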
### 4.4 基于 C++ 预测引擎推理
PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考 [服务器端 C++ 预测](../../../deploy/cpp_shitu/readme.md) 来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考 [基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md) 完成相应的预测库编译和模型预测工作。
### 4.5 服务化部署
Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考 [Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)
PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考 [模型服务化部署](../inference_deployment/recognition_serving_deploy.md) 来完成相应的部署工作。
### 4.6 端侧部署
Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考 [Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)
### 4.7 Paddle2ONNX 模型转换与预测
Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考 [Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)
PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考 [Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md) 来完成相应的部署工作。
## 参考文献
1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." International Conference on Multimedia Modeling. Springer, Cham, 2022.
2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609.
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
- [1.2.5 DKD](#1.2.5) - [1.2.5 DKD](#1.2.5)
- [1.2.6 DIST](#1.2.6) - [1.2.6 DIST](#1.2.6)
- [1.2.7 MGD](#1.2.7) - [1.2.7 MGD](#1.2.7)
- [1.2.8 WSL](#1.2.8)
- [2. 使用方法](#2) - [2. 使用方法](#2)
- [2.1 环境配置](#2.1) - [2.1 环境配置](#2.1)
- [2.2 数据准备](#2.2) - [2.2 数据准备](#2.2)
...@@ -399,7 +400,7 @@ DKD将蒸馏中常用的 KD Loss 进行了解耦成为Target Class Knowledge Dis ...@@ -399,7 +400,7 @@ DKD将蒸馏中常用的 KD Loss 进行了解耦成为Target Class Knowledge Dis
| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | | 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | | baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
| AFD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - | | DKD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - |
##### 1.2.5.2 DKD 配置 ##### 1.2.5.2 DKD 配置
...@@ -533,7 +534,7 @@ Loss: ...@@ -533,7 +534,7 @@ Loss:
| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | | 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | | baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
| MGD | ResNet18 | [resnet34_distill_resnet18_dist.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - | | MGD | ResNet18 | [resnet34_distill_resnet18_mgd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - |
##### 1.2.7.2 MGD 配置 ##### 1.2.7.2 MGD 配置
...@@ -583,6 +584,73 @@ Loss: ...@@ -583,6 +584,73 @@ Loss:
weight: 1.0 weight: 1.0
``` ```
<a name='1.2.8'></a>
#### 1.2.8 WSL
##### 1.2.8.1 WSL 算法介绍
论文信息:
> [Rethinking Soft Labels For Knowledge Distillation: A Bias-variance Tradeoff Perspective](https://arxiv.org/abs/2102.00650)
>
> Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang
>
> ICLR, 2021
WSL (Weighted Soft Labels) 损失函数根据教师模型与学生模型关于真值标签的 CE Loss 比值,对每个样本的 KD Loss 分别赋予权重。若学生模型相对教师模型在某个样本上预测结果更好,则对该样本赋予较小的权重。该方法简单、有效,使各个样本的权重可自适应调节,提升了蒸馏精度。
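下面用一段简化的 Paddle 代码示意"按学生/教师关于真值标签的 CE Loss 比值对逐样本 KD Loss 加权"的思路;其中权重函数仅为示意,并非论文或 PaddleClas 中 `DistillationWSLLoss` 的精确实现:

```python
import paddle
import paddle.nn.functional as F

def weighted_soft_label_kd(student_logits, teacher_logits, labels, temperature=2.0):
    # 逐样本计算学生、教师相对真值标签的 CE Loss
    ce_s = F.cross_entropy(student_logits, labels, reduction="none")
    ce_t = F.cross_entropy(teacher_logits, labels, reduction="none")
    # 学生相对教师预测越好(ce_s 越小),该样本的权重越小(示意性权重函数)
    weight = 1.0 - paddle.exp(-ce_s / (ce_t + 1e-6))
    # 逐样本的软标签蒸馏损失(KD Loss)
    p_t = F.softmax(teacher_logits / temperature, axis=1)
    log_p_s = F.log_softmax(student_logits / temperature, axis=1)
    kd = -(p_t * log_p_s).sum(axis=1) * (temperature ** 2)
    return (weight.detach() * kd).mean()
```

实际训练时,直接使用下文的 `DistillationWSLLoss` 配置即可,无需手动实现。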
在ImageNet1k公开数据集上,效果如下所示。
| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 |
| --- | --- | --- | --- | --- |
| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - |
| WSL | ResNet18 | [resnet34_distill_resnet18_wsl.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml) | 72.23%(**+1.43%**) | - |
##### 1.2.8.2 WSL 配置
WSL 配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,教师模型固定参数,且需要加载预训练模型。在损失函数Loss字段中,需要定义`DistillationGTCELoss`(学生与真值标签之间的CE loss)以及`DistillationWSLLoss`(学生与教师之间的WSL loss),作为训练的损失函数。
```yaml
# model architecture
Arch:
name: "DistillationModel"
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models
freeze_params_list:
- True
- False
models:
- Teacher:
name: ResNet34
pretrained: True
- Student:
name: ResNet18
pretrained: False
infer_model_name: "Student"
# loss function config for training/eval process
Loss:
Train:
- DistillationGTCELoss:
weight: 1.0
model_names: ["Student"]
- DistillationWSLLoss:
weight: 2.5
model_name_pairs: [["Student", "Teacher"]]
temperature: 2
Eval:
- CELoss:
weight: 1.0
```
<a name="2"></a> <a name="2"></a>
## 2. 模型训练、评估和预测 ## 2. 模型训练、评估和预测
......
...@@ -69,14 +69,14 @@ MobileNetV1 ...@@ -69,14 +69,14 @@ MobileNetV1
│   . │   .
│   . │   .
│   └── blocks12 (DepthwiseSeparable).............("blocks[12]") │   └── blocks12 (DepthwiseSeparable).............("blocks[12]")
│      ├── depthwise_conv (ConvBNLayer)..........("blocks[0].depthwise_conv") │      ├── depthwise_conv (ConvBNLayer)..........("blocks[12].depthwise_conv")
│      │   ├── conv (nn.Conv2D)..................("blocks[0].depthwise_conv.conv") │      │   ├── conv (nn.Conv2D)..................("blocks[12].depthwise_conv.conv")
│      │   ├── bn (nn.BatchNorm).................("blocks[0].depthwise_conv.bn") │      │   ├── bn (nn.BatchNorm).................("blocks[12].depthwise_conv.bn")
│      │   └── relu (nn.ReLU)....................("blocks[0].depthwise_conv.relu") │      │   └── relu (nn.ReLU)....................("blocks[12].depthwise_conv.relu")
│      └── pointwise_conv (ConvBNLayer)..........("blocks[0].pointwise_conv") │      └── pointwise_conv (ConvBNLayer)..........("blocks[12].pointwise_conv")
│      ├── conv (nn.Conv2D)..................("blocks[0].pointwise_conv.conv") │      ├── conv (nn.Conv2D)..................("blocks[12].pointwise_conv.conv")
│      ├── bn (nn.BatchNorm).................("blocks[0].pointwise_conv.bn") │      ├── bn (nn.BatchNorm).................("blocks[12].pointwise_conv.bn")
│      └── relu (nn.ReLU)....................("blocks[0].pointwise_conv.relu") │      └── relu (nn.ReLU)....................("blocks[12].pointwise_conv.relu")
├── avg_pool (nn.AdaptiveAvgPool2D)...............("avg_pool") ├── avg_pool (nn.AdaptiveAvgPool2D)...............("avg_pool")
...@@ -94,7 +94,7 @@ MobileNetV1 ...@@ -94,7 +94,7 @@ MobileNetV1
## 3. 方法说明 ## 3. 方法说明
PaddleClas 提供的 backbone 网络均基于图像分类数据集训练得到,因此网络的尾部带有用于分类的全连接层,而在特定任务场景下,需要去掉分类的全连接层。在部分下游任务中,例如目标检测场景,需要获取到网络中间层的输出结果,也可能需要对网络的中间层进行修改,因此 `TheseusLayer` 提供了 3 个接口函数用于实现不同的修改功能。 PaddleClas 提供的 backbone 网络均基于图像分类数据集训练得到,因此网络的尾部带有用于分类的全连接层,而在特定任务场景下,需要去掉分类的全连接层。在部分下游任务中,例如目标检测场景,需要获取到网络中间层的输出结果,也可能需要对网络的中间层进行修改,因此 `TheseusLayer` 提供了 3 个接口函数用于实现不同的修改功能。下面基于 PaddleClas whl 进行说明,首先需要安装 PaddleClas:`pip install paddleclas`
<a name="3.1"></a> <a name="3.1"></a>
...@@ -122,7 +122,6 @@ def stop_after(self, stop_layer_name: str) -> bool: ...@@ -122,7 +122,6 @@ def stop_after(self, stop_layer_name: str) -> bool:
`MobileNetV1` 网络为例,参数 `stop_layer_name``"blocks[0].depthwise_conv.conv"`,具体效果可以参考下方代码案例进行尝试。 `MobileNetV1` 网络为例,参数 `stop_layer_name``"blocks[0].depthwise_conv.conv"`,具体效果可以参考下方代码案例进行尝试。
```python ```python
# cd <root-path-to-PaddleClas> or pip install paddleclas to import paddleclas
import paddleclas import paddleclas
net = paddleclas.MobileNetV1() net = paddleclas.MobileNetV1()
...@@ -168,7 +167,6 @@ def update_res( ...@@ -168,7 +167,6 @@ def update_res(
import numpy as np import numpy as np
import paddle import paddle
# cd <root-path-to-PaddleClas> or pip install paddleclas to import paddleclas
import paddleclas import paddleclas
np_input = np.zeros((1, 3, 224, 224)) np_input = np.zeros((1, 3, 224, 224))
...@@ -186,8 +184,8 @@ print("The result returned by update_res(): ", res) ...@@ -186,8 +184,8 @@ print("The result returned by update_res(): ", res)
output = net(pd_input) output = net(pd_input)
print("The output's keys of processed net: ", output.keys()) print("The output's keys of processed net: ", output.keys())
# The output's keys of net: dict_keys(['output', 'blocks[0]', 'blocks[2]', 'blocks[4]', 'blocks[10]']) # The output's keys of net: dict_keys(['logits', 'blocks[0]', 'blocks[2]', 'blocks[4]', 'blocks[10]'])
# 网络前向输出 output 为 dict 类型对象,其中,output["output"] 为网络最终输出,output["blocks[0]"] 等为网络中间层输出结果 # 网络前向输出 output 为 dict 类型对象,其中,output["logits"] 为网络最终输出,output["blocks[0]"] 等为网络中间层输出结果
``` ```
除了通过调用方法 `update_res()` 的方式之外,也同样可以在实例化网络对象时,通过指定参数 `return_patterns` 实现相同效果: 除了通过调用方法 `update_res()` 的方式之外,也同样可以在实例化网络对象时,通过指定参数 `return_patterns` 实现相同效果:
...@@ -241,7 +239,6 @@ def upgrade_sublayer(self, ...@@ -241,7 +239,6 @@ def upgrade_sublayer(self,
```python ```python
from paddle import nn from paddle import nn
# cd <root-path-to-PaddleClas> or pip install paddleclas to import paddleclas
import paddleclas import paddleclas
# 该函数必须有两个形参 # 该函数必须有两个形参
......
...@@ -354,24 +354,24 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 ...@@ -354,24 +354,24 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------| |------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------|
| ViT_small_<br/>patch16_224 | 0.7769 | 0.9342 | 3.71 | 9.05 | 16.72 | 9.41 | 48.60 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_small_patch16_224_infer.tar) | | ViT_small_<br/>patch16_224 | 0.7553 | 0.9211 | 3.71 | 9.05 | 16.72 | 9.41 | 48.60 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_small_patch16_224_infer.tar) |
| ViT_base_<br/>patch16_224 | 0.8195 | 0.9617 | 6.12 | 14.84 | 28.51 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) | | ViT_base_<br/>patch16_224 | 0.8187 | 0.9618 | 6.12 | 14.84 | 28.51 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) |
| ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | 14.15 | 48.38 | 95.06 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_384_infer.tar) | | ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | 14.15 | 48.38 | 95.06 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_384_infer.tar) |
| ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | 4.94 | 13.43 | 24.08 | 12.66 | 88.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch32_384_infer.tar) | | ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | 4.94 | 13.43 | 24.08 | 12.66 | 88.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch32_384_infer.tar) |
| ViT_large_<br/>patch16_224 | 0.8323 | 0.9650 | 15.53 | 49.50 | 94.09 | 59.65 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_224_infer.tar) | | ViT_large_<br/>patch16_224 | 0.8303 | 0.9655 | 15.53 | 49.50 | 94.09 | 59.65 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_224_infer.tar) |
|ViT_large_<br/>patch16_384| 0.8513 | 0.9736 | 39.51 | 152.46 | 304.06 | 174.70 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_384_infer.tar) | |ViT_large_<br/>patch16_384| 0.8513 | 0.9736 | 39.51 | 152.46 | 304.06 | 174.70 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_384_infer.tar) |
|ViT_large_<br/>patch32_384| 0.8153 | 0.9608 | 11.44 | 36.09 | 70.63 | 44.24 | 306.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch32_384_infer.tar) | |ViT_large_<br/>patch32_384| 0.8153 | 0.9608 | 11.44 | 36.09 | 70.63 | 44.24 | 306.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch32_384_infer.tar) |
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------| |------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------|
| DeiT_tiny_<br>patch16_224 | 0.718 | 0.910 | 3.61 | 3.94 | 6.10 | 1.07 | 5.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_patch16_224_infer.tar) | | DeiT_tiny_<br>patch16_224 | 0.7208 | 0.9112 | 3.61 | 3.94 | 6.10 | 1.07 | 5.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_patch16_224_infer.tar) |
| DeiT_small_<br>patch16_224 | 0.796 | 0.949 | 3.61 | 6.24 | 10.49 | 4.24 | 21.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_patch16_224_infer.tar) | | DeiT_small_<br>patch16_224 | 0.7982 | 0.9495 | 3.61 | 6.24 | 10.49 | 4.24 | 21.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_patch16_224_infer.tar) |
| DeiT_base_<br>patch16_224 | 0.817 | 0.957 | 6.13 | 14.87 | 28.50 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_224_infer.tar) | | DeiT_base_<br>patch16_224 | 0.8180 | 0.9558 | 6.13 | 14.87 | 28.50 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_224_infer.tar) |
| DeiT_base_<br>patch16_384 | 0.830 | 0.962 | 14.12 | 48.80 | 97.60 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_384_infer.tar) | | DeiT_base_<br>patch16_384 | 0.8289 | 0.9624 | 14.12 | 48.80 | 97.60 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_384_infer.tar) |
| DeiT_tiny_<br>distilled_patch16_224 | 0.741 | 0.918 | 3.51 | 4.05 | 6.03 | 1.08 | 5.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_distilled_patch16_224_infer.tar) | | DeiT_tiny_<br>distilled_patch16_224 | 0.7449 | 0.9192 | 3.51 | 4.05 | 6.03 | 1.08 | 5.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_distilled_patch16_224_infer.tar) |
| DeiT_small_<br>distilled_patch16_224 | 0.809 | 0.953 | 3.70 | 6.20 | 10.53 | 4.26 | 22.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_distilled_patch16_224_infer.tar) | | DeiT_small_<br>distilled_patch16_224 | 0.8117 | 0.9538 | 3.70 | 6.20 | 10.53 | 4.26 | 22.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_distilled_patch16_224_infer.tar) |
| DeiT_base_<br>distilled_patch16_224 | 0.831 | 0.964 | 6.17 | 14.94 | 28.58 | 16.93 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_224_infer.tar) | | DeiT_base_<br>distilled_patch16_224 | 0.8330 | 0.9647 | 6.17 | 14.94 | 28.58 | 16.93 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_224_infer.tar) |
| DeiT_base_<br>distilled_patch16_384 | 0.851 | 0.973 | 14.12 | 48.76 | 97.09 | 49.43 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_384_infer.tar) | | DeiT_base_<br>distilled_patch16_384 | 0.8520 | 0.9720 | 14.12 | 48.76 | 97.09 | 49.43 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_384_infer.tar) |
<a name="RepVGG"></a> <a name="RepVGG"></a>
...@@ -426,14 +426,14 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 ...@@ -426,14 +426,14 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 6.59 | 9.68 | 16.32 | 4.35 | 28.26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar) | | SwinTransformer_tiny_patch4_window7_224 | 0.8110 | 0.9549 | 6.59 | 9.68 | 16.32 | 4.35 | 28.26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar) |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 12.54 | 17.07 | 28.08 | 8.51 | 49.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_small_patch4_window7_224_infer.tar) | | SwinTransformer_small_patch4_window7_224 | 0.8321 | 0.9622 | 12.54 | 17.07 | 28.08 | 8.51 | 49.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_small_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 13.37 | 23.53 | 39.11 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | | SwinTransformer_base_patch4_window7_224 | 0.8337 | 0.9643 | 13.37 | 23.53 | 39.11 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 19.52 | 64.56 | 123.30 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | | SwinTransformer_base_patch4_window12_384 | 0.8417 | 0.9674 | 19.52 | 64.56 | 123.30 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | 13.53 | 23.46 | 39.13 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | | SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8516 | 0.9748 | 13.53 | 23.46 | 39.13 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | 19.65 | 64.72 | 123.42 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | | SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8634 | 0.9798 | 19.65 | 64.72 | 123.42 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | 15.74 | 38.57 | 71.49 | 34.02 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window7_224_22kto1k_infer.tar) | | SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8619 | 0.9788 | 15.74 | 38.57 | 71.49 | 34.02 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window7_224_22kto1k_infer.tar) |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | 32.61 | 116.59 | 223.23 | 99.97 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window12_384_22kto1k_infer.tar) | | SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8706 | 0.9814 | 32.61 | 116.59 | 223.23 | 99.97 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window12_384_22kto1k_infer.tar) |
[1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。 [1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。
...@@ -446,7 +446,7 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 ...@@ -446,7 +446,7 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(M) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(M) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| LeViT_128S | 0.7598 | 0.9269 | | | | 281 | 7.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128S_infer.tar) | | LeViT_128S | 0.7598 | 0.9269 | | | | 281 | 7.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128S_infer.tar) |
| LeViT_128 | 0.7810 | 0.9371 | | | | 365 | 8.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128_infer.tar) | | LeViT_128 | 0.7810 | 0.9372 | | | | 365 | 8.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128_infer.tar) |
| LeViT_192 | 0.7934 | 0.9446 | | | | 597 | 10.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_192_infer.tar) | | LeViT_192 | 0.7934 | 0.9446 | | | | 597 | 10.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_192_infer.tar) |
| LeViT_256 | 0.8085 | 0.9497 | | | | 1049 | 18.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_256_infer.tar) | | LeViT_256 | 0.8085 | 0.9497 | | | | 1049 | 18.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_256_infer.tar) |
| LeViT_384 | 0.8191 | 0.9551 | | | | 2234 | 38.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_384_infer.tar) | | LeViT_384 | 0.8191 | 0.9551 | | | | 2234 | 38.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_384_infer.tar) |
...@@ -461,12 +461,12 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 ...@@ -461,12 +461,12 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| pcpvt_small | 0.8082 | 0.9552 | 7.32 | 10.51 | 15.27 |3.67 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_small_infer.tar) | | pcpvt_small | 0.8115 | 0.9567 | 7.32 | 10.51 | 15.27 |3.67 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_small_infer.tar) |
| pcpvt_base | 0.8242 | 0.9619 | 12.20 | 16.22 | 23.16 | 6.44 | 43.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_base_infer.tar) | | pcpvt_base | 0.8268 | 0.9627 | 12.20 | 16.22 | 23.16 | 6.44 | 43.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_base_infer.tar) |
| pcpvt_large | 0.8273 | 0.9650 | 16.47 | 22.90 | 32.73 | 9.50 | 60.99 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_large_infer.tar) | | pcpvt_large | 0.8306 | 0.9659 | 16.47 | 22.90 | 32.73 | 9.50 | 60.99 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_large_infer.tar) |
| alt_gvt_small | 0.8140 | 0.9546 | 6.94 | 9.01 | 12.27 |2.81 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_small_infer.tar) | | alt_gvt_small | 0.8177 | 0.9557 | 6.94 | 9.01 | 12.27 |2.81 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_small_infer.tar) |
| alt_gvt_base | 0.8294 | 0.9621 | 9.37 | 15.02 | 24.54 | 8.34 | 56.07 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_base_infer.tar) | | alt_gvt_base | 0.8315 | 0.9629 | 9.37 | 15.02 | 24.54 | 8.34 | 56.07 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_base_infer.tar) |
| alt_gvt_large | 0.8331 | 0.9642 | 11.76 | 22.08 | 35.12 | 14.81 | 99.27 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_large_infer.tar) | | alt_gvt_large | 0.8364 | 0.9651 | 11.76 | 22.08 | 35.12 | 14.81 | 99.27 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_large_infer.tar) |
**注**:与 Reference 的精度差异源于数据预处理不同。 **注**:与 Reference 的精度差异源于数据预处理不同。
...@@ -551,13 +551,13 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 ...@@ -551,13 +551,13 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| PVT_V2_B0 | 0.705 | 0.902 | - | - | - | 0.53 | 3.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B0_infer.tar) | | PVT_V2_B0 | 0.7052 | 0.9016 | - | - | - | 0.53 | 3.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B0_infer.tar) |
| PVT_V2_B1 | 0.787 | 0.945 | - | - | - | 2.0 | 14.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B1_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B1_infer.tar) | | PVT_V2_B1 | 0.7869 | 0.9450 | - | - | - | 2.0 | 14.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B1_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B1_infer.tar) |
| PVT_V2_B2 | 0.821 | 0.960 | - | - | - | 3.9 | 25.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_infer.tar) | | PVT_V2_B2 | 0.8206 | 0.9599 | - | - | - | 3.9 | 25.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_infer.tar) |
| PVT_V2_B2_Linear | 0.821 | 0.961 | - | - | - | 3.8 | 22.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_Linear_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_Linear_infer.tar) | | PVT_V2_B2_Linear | 0.8205 | 0.9605 | - | - | - | 3.8 | 22.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_Linear_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_Linear_infer.tar) |
| PVT_V2_B3 | 0.831 | 0.965 | - | - |- | 6.7 | 45.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B3_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B3_infer.tar) | | PVT_V2_B3 | 0.8310 | 0.9648 | - | - |- | 6.7 | 45.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B3_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B3_infer.tar) |
| PVT_V2_B4 | 0.836 | 0.967 | - | - | - | 9.8 | 62.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B4_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B4_infer.tar) | | PVT_V2_B4 | 0.8361 | 0.9666 | - | - | - | 9.8 | 62.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B4_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B4_infer.tar) |
| PVT_V2_B5 | 0.837 | 0.966 | - | - | - | 11.4 | 82.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B5_infer.tar) | | PVT_V2_B5 | 0.8374 | 0.9662 | - | - | - | 11.4 | 82.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B5_infer.tar) |
<a name="MobileViT"></a> <a name="MobileViT"></a>
......
# Deep Hashing算法介绍
----
## 目录
* [1. 简介](#1)
* [2. 算法介绍](#2)
* [2.1 DCH](#2.1)
* [2.2 DSHSD](#2.2)
* [2.3 LCDSH](#2.3)
* [3. 快速体验](#3)
* [4. 总结及建议](#4)
<a name='1'></a>
## 1. 简介
最近邻搜索是指在数据库中查找与查询数据距离最近的点,在计算机视觉、推荐系统、机器学习等领域中广泛使用。在PP-ShiTu中,输入图像经过主体检测模型去掉背景后,再经过特征提取模型提取特征,之后经过检索得到输入图像的类别。在这个过程中,一般来说,提取的特征是float32数据类型。当离线特征库中存储的feature比较多时,就占用较大的存储空间,同时检索过程也会变慢。如果利用哈希编码将特征由float32转成0或者1表示的二值特征,那么不仅能降低存储空间,同时也能大大加快检索速度。
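下面用一段简单的 numpy 代码示意"float32 特征二值化后再做汉明距离检索"带来的存储与计算差异;其中特征维度与数据均为假设值,仅用于说明原理:

```python
import numpy as np

dim = 512
gallery = np.random.randn(10000, dim).astype("float32")  # float32 底库特征(示意数据)
query = np.random.randn(dim).astype("float32")           # 查询特征(示意数据)

# 二值化:以 0 为阈值把特征映射为 0/1,再打包成 bit,存储空间约为 float32 的 1/32
gallery_bin = np.packbits((gallery > 0).astype(np.uint8), axis=1)
query_bin = np.packbits((query > 0).astype(np.uint8))

# 汉明距离:按位异或后统计 1 的个数,计算开销远小于 float32 的欧氏距离/内积
hamming = np.unpackbits(np.bitwise_xor(gallery_bin, query_bin), axis=1).sum(axis=1)
top5 = np.argsort(hamming)[:5]
print("top-5 index:", top5, "hamming:", hamming[top5])
```

上述代码只是二值特征检索的原理示意,具体的哈希特征学习方法见下文各算法介绍。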
<a name='2'></a>
## 2. 算法介绍
目前PaddleClas中,主要复现了三种DeepHash的方法,分别是:[DCH](http://ise.thss.tsinghua.edu.cn/~mlong/doc/deep-cauchy-hashing-cvpr18.pdf)[DSHSD](https://ieeexplore.ieee.org/document/8648432/), [LCDSH](https://www.ijcai.org/Proceedings/2017/0499.pdf)。以下做简要介绍。
<a name='2.1'></a>
## 2.1 DCH
此方法基于柯西分布,提出一种成对的交叉熵损失函数,能够较好地得到紧凑的 Hamming 特征,并在多个数据集上取得了较好的结果。详见[论文](http://ise.thss.tsinghua.edu.cn/~mlong/doc/deep-cauchy-hashing-cvpr18.pdf)。方法示意图如下:
<div align="center">
<img src="../../images/deep_hash/DCH.png" width = "400" />
</div>
<a name='2.2'></a>
## 2.2 DSHSD
DSHSD 的主要创新点在于,在保证分布一致性的情况下消除差异。首先,作者利用平滑投影函数来放松离散约束,而不是使用任何量化正则化器,其中平滑量是可调整的。其次,在平滑投影和特征分布之间建立数学联系,以保持分布的一致性。在此基础上提出了一种多语义信息融合方法,使 hash 码在学习后能够保留更多的语义信息,从而加快训练收敛速度。该方法在 CIFAR-10、NUS-WIDE 和 ImageNet 数据集上的大量实验中表现良好。具体可查看[论文](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8648432)。
<div align="center">
<img src="../../images/deep_hash/DSHSD.png" width = "400" />
</div>
<a name='2.3'></a>
## 2.3 LCDSH
LCDSH 是一种局部约束深度监督哈希算法。该方案通过学习图像对之间的相似特征,使哈希码保持了 DCNN 特征的分布,从而有利于准确的图像检索。具体可查看[论文](https://www.ijcai.org/Proceedings/2017/0499.pdf)。
<div align="center">
<img src="../../images/deep_hash/LCDSH.png" width = "400" />
</div>
<a name='3'></a>
## 3. 快速体验
这三个哈希算法的配置文件位置如下:
- `DCH`: `ppcls/configs/DeepHash/DCH.yaml`
- `DSHSD`: `ppcls/configs/DeepHash/DSHSD.yaml`
- `LCDSH`: `ppcls/configs/DeepHash/LCDSH.yaml`
具体训练方法,请参考[分类模型训练文档](../models_training/classification.md)
<a name='4'></a>
## 4. 总结及建议
不同的 Deep Hashing 方法具有不同的特性,可以分别对不同的哈希方法进行尝试,选取最适合自己数据集的方法。
...@@ -344,7 +344,7 @@ PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模 ...@@ -344,7 +344,7 @@ PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模
#### 5.1 方法总结与对比 #### 5.1 方法总结与对比
上述算法能快速地迁移至多数的ReID模型中,能进一步提升ReID模型的性能。 上述算法能快速地迁移至多数的ReID模型中(参考 [PP-ShiTuV2](../PPShiTu/PPShiTuV2_introduction.md) ),能进一步提升ReID模型的性能,
#### 5.2 使用建议/FAQ #### 5.2 使用建议/FAQ
......
# 哈希编码
最近邻搜索是指在数据库中查找与查询数据距离最近的点,在计算机视觉、推荐系统、机器学习等领域中广泛使用。在`PP-ShiTu`中,输入图像经过主体检测模型去掉背景后,再经过特征提取模型提取特征,之后经过检索得到输入图像的类别。在这个过程中,一般来说,提取的特征是`float32`数据类型。当离线特征库中存储的`feature`比较多时,就占用较大的存储空间,同时检索过程也会变慢。如果利用`哈希编码`将特征由`float32`转成`0`或者`1`表示的二值特征,那么不仅降低存储空间,同时也能大大加快检索速度。
哈希编码,主要用在`PP-ShiTu`**特征提取模型**部分,将模型输出特征直接二值化。即训练特征提取模型时,将模型的输出映射到二值空间。
注意,由于使用二值特征表示图像特征,精度可能会下降,请根据实际情况,酌情使用。
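“将模型输出映射到二值空间”的常见做法可以用下面的极简示意来理解(仅为概念示意,并非 PaddleClas 中的具体实现):
```python
import paddle

feat = paddle.randn([4, 512])           # 假设的特征提取模型输出
relaxed = paddle.tanh(feat)             # 训练阶段:用 tanh 等平滑函数放松离散约束,保证可反向传播
binary = (relaxed > 0).astype("int32")  # 推理阶段:按符号二值化,得到 0/1 表示的二值特征
```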
## 目录
- [1. 特征模型二值特征训练](#1)
- [1.1 PP-ShiTu特征提取模型二值训练](#1.1)
- [1.2 其他特征模型二值训练](#1.2)
- [2. 检索算法配置](#2)
<a name="1"></a>
## 1. 特征模型二值特征训练
<a name="1.1"></a>
注意,此模块目前只支持`PP-ShiTuV1`,`PP-ShiTuV2`暂未适配。
### 1.1 PP-ShiTu特征提取模型二值训练
PP-ShiTu特征提取模型的二值特征版本,其配置文件位于`ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml`,相关训练方法如下。
```shell
# 单卡 GPU
python3.7 tools/train.py \
-c ./ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml \
-o Arch.Backbone.pretrained=True \
-o Global.device=gpu
# 多卡 GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml \
-o Arch.Backbone.pretrained=True \
-o Global.device=gpu
```
其中`数据准备``模型评估`等,请参考[此文档](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/models_training/recognition.md)
<a name="1.2"></a>
### 1.2 其他特征模型二值训练
其他二值特征训练模型的配置文件位于`ppcls/configs/DeepHash/`文件夹下,此文件夹下的配置文件主要用于复现相关`deep hashing`算法,包括:`DCH, DSHSD, LCDSH`三种算法。这三种算法的相关介绍,详见[Deep Hashing相关算法介绍](../algorithm_introduction/deep_hashing_introduction.md)。
相关训练方法,请参考[分类模型训练文档](../models_training/classification.md)
<a name="2"></a>
## 2. 检索算法配置
在PP-ShiTu中使用二值特征,部署及离线推理配置请参考`deploy/configs/inference_general_binary.yaml`。配置文件中相关参数介绍请参考[向量检索文档](./vector_search.md)。
其中值得注意的是,二值检索相关配置应设置如下:
```yaml
IndexProcess:
index_method: "FLAT" # supported: HNSW32, IVF, Flat
delimiter: "\t"
dist_type: "hamming"
hamming_radius: 100
```
其中`hamming_radius`可以根据自己实际精度要求,适当调节。
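如果想在 Python 端快速验证二值特征的 hamming 检索与 `hamming_radius` 过滤的效果,可以参考下面基于 faiss 的示意代码(底库与查询特征均为随机生成的假设数据,实际位数应与识别模型输出保持一致):
```python
import numpy as np
import faiss  # 需要已安装 faiss

dim_bits = 512                                                # 二值特征位数,需为 8 的倍数
gallery = np.random.randint(0, 256, (1000, dim_bits // 8), dtype=np.uint8)
query = np.random.randint(0, 256, (1, dim_bits // 8), dtype=np.uint8)

index = faiss.IndexBinaryFlat(dim_bits)                       # 对应配置中的 index_method: "FLAT"
index.add(gallery)

hamming_radius = 100
dist, idx = index.search(query, 5)                            # 返回 hamming 距离与底库下标
keep = idx[0][dist[0] <= hamming_radius]                      # 按 hamming_radius 过滤检索结果
```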
简体中文|[English](../../en/image_recognition_pipeline/feature_extraction_en.md) 简体中文 | [English](../../en/image_recognition_pipeline/feature_extraction_en.md)
# 特征提取 # 特征提取
## 目录 ## 目录
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
- [3.2 Neck](#32-neck) - [3.2 Neck](#32-neck)
- [3.3 Head](#33-head) - [3.3 Head](#33-head)
- [3.4 Loss](#34-loss) - [3.4 Loss](#34-loss)
- [3.5 Data Augmentation](#35-data-augmentation)
- [4. 实验部分](#4-实验部分) - [4. 实验部分](#4-实验部分)
- [5. 自定义特征提取](#5-自定义特征提取) - [5. 自定义特征提取](#5-自定义特征提取)
- [5.1 数据准备](#51-数据准备) - [5.1 数据准备](#51-数据准备)
...@@ -35,56 +36,76 @@ ...@@ -35,56 +36,76 @@
![](../../images/feature_extraction_framework.png) ![](../../images/feature_extraction_framework.png)
图中各个模块的功能为: 图中各个模块的功能为:
- **Backbone**: 用于提取输入图像初步特征的骨干网络,一般由配置文件中的 [`Backbone`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L26-L29) 以及 [`BackboneStopLayer`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L30-L31) 字段共同指定。 - **Backbone**: 用于提取输入图像初步特征的骨干网络,一般由配置文件中的 [Backbone](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L33-L37) 以及 [BackboneStopLayer](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L38-L39) 字段共同指定。
- **Neck**: 用以特征增强及特征维度变换。可以是一个简单的 FC Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强,一般由配置文件中的 [`Neck`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L32-L35)字段指定。 - **Neck**: 用以特征增强及特征维度变换。可以是一个简单的 FC Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强,一般由配置文件中的 [Neck](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L40-L51) 字段指定。
- **Head**: 用来将 feature 转化为 logits,让模型在训练阶段能以分类任务的形式进行训练。除了常用的 FC Layer 外,还可以替换为 cosmargin, arcmargin, circlemargin 等模块,一般由配置文件中的 [`Head`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L36-L41)字段指定。 - **Head**: 用来将 `Neck` 的输出 feature 转化为 logits,让模型在训练阶段能以分类任务的形式进行训练。除了常用的 FC Layer 外,还可以替换为 [CosMargin](../../../ppcls/arch/gears/cosmargin.py), [ArcMargin](../../../ppcls/arch/gears/arcmargin.py), [CircleMargin](../../../ppcls/arch/gears/circlemargin.py) 等模块,一般由配置文件中的 [Head](`../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L52-L60) 字段指定。
- **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Metric learning Loss 组合在一起,一般由配置文件中的 [`Loss`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L44-L50)字段指定。 - **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Metric learning Loss 组合在一起,一般由配置文件中的 [Loss](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77) 字段指定。
<a name="3"></a> <a name="3"></a>
## 3. 方法 ## 3. 方法
### 3.1 Backbone #### 3.1 Backbone
Backbone 部分采用了 [PP_LCNet_x2_5](../models/PP-LCNet.md),其针对Intel CPU端的性能优化探索了多个有效的结构设计方案,最终实现了在不增加推理时间的情况下,进一步提升模型的性能,最终大幅度超越现有的 SOTA 模型 Backbone 部分采用了 [PP-LCNetV2_base](../models/PP-LCNetV2.md),其在 `PPLCNet_V1` 的基础上,加入了包括Rep 策略、PW 卷积、Shortcut、激活函数改进、SE 模块改进等多个优化点,使得最终分类精度与 `PPLCNet_x2_5` 相近,且推理延时减少了40%<sup>*</sup>。在实验过程中我们对 `PPLCNetV2_base` 进行了适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 `PPLCNetV2_base` 末尾的 `ReLU``FC`、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1
### 3.2 Neck
Neck 部分采用了 [FC Layer](../../../ppcls/arch/gears/fc.py),对 Backbone 抽取得到的特征进行降维,减少了特征存储的成本与计算量 **注:** <sup>*</sup>推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台
### 3.3 Head #### 3.2 Neck
Head 部分选用 [ArcMargin](../../../ppcls/arch/gears/arcmargin.py),在训练时通过指定margin,增大同类特征之间的角度差异再进行分类,进一步提升抽取特征的表征能力 Neck 部分采用了 [BN Neck](../../../ppcls/arch/gears/bnneck.py),对 Backbone 抽取得到的特征的每个维度进行标准化操作,减少了同时优化度量学习损失函数和分类损失函数的难度,加快收敛速度
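下面给出 BN Neck 思路的一个极简示意(非 PaddleClas 官方实现,仅展示“对特征逐维做 BatchNorm 且不使用可学习 bias”的核心做法):
```python
import paddle
import paddle.nn as nn

class BNNeckSketch(nn.Layer):
    """BN Neck 的概念示意:对 Backbone 输出的 embedding 逐维标准化。"""
    def __init__(self, embedding_size):
        super().__init__()
        # 假设:bias 固定为 0,与常见 ReID 中 BNNeck 的做法一致
        self.bn = nn.BatchNorm1D(embedding_size, bias_attr=False)

    def forward(self, x):  # x: [batch_size, embedding_size]
        return self.bn(x)

neck = BNNeckSketch(512)
feat = neck(paddle.randn([8, 512]))
```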
### 3.4 Loss #### 3.3 Head
Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训练时以分类任务的损失函数来指导网络进行优化。详细的配置文件见[通用识别配置文件](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml) Head 部分选用 [FC Layer](../../../ppcls/arch/gears/fc.py),使用分类头将 feature 转换成 logits 供后续计算分类损失。
#### 3.4 Loss
Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py)[TripletAngularMarginLoss](../../../ppcls/loss/tripletangularmarginloss.py),在训练时以分类损失和基于角度的三元组损失来指导网络进行优化。我们基于原始的 TripletLoss (困难三元组损失)进行了改进,将优化目标从 L2 欧几里得空间更换成余弦空间,并加入了 anchor 与 positive/negtive 之间的硬性距离约束,让训练与测试的目标更加接近,提升模型的泛化能力。详细的配置文件见 [GeneralRecognitionV2_PPLCNetV2_base.yaml](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-77)
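“余弦空间 + anchor 与 positive/negative 之间硬性约束”这一思路可以用下面的极简示意来理解(并非 `TripletAngularMarginLoss` 的官方实现,margin 与各阈值均为假设值):
```python
import paddle
import paddle.nn.functional as F

def cosine_triplet_sketch(anchor, positive, negative,
                          margin=0.5, ap_bound=0.8, an_bound=0.4):
    # 特征归一化后,用余弦相似度替代 L2 欧氏距离
    a, p, n = (F.normalize(x, axis=1) for x in (anchor, positive, negative))
    sim_ap = (a * p).sum(axis=1)
    sim_an = (a * n).sum(axis=1)
    loss = F.relu(sim_an - sim_ap + margin)  # 余弦空间的三元组 margin 约束
    loss += F.relu(ap_bound - sim_ap)        # anchor-positive 的硬性约束(假设阈值)
    loss += F.relu(sim_an - an_bound)        # anchor-negative 的硬性约束(假设阈值)
    return loss.mean()
```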
#### 3.5 Data Augmentation
考虑到实际相机拍摄时目标主体可能出现一定的旋转而不一定能保持正立状态,我们在数据增强中加入了适当的 [随机旋转增强](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117),以提升模型在真实场景中的检索能力。
<a name="4"></a> <a name="4"></a>
## 4. 实验部分 ## 4. 实验部分
训练数据为如下 7 个公开数据集的汇总: 我们对原有的训练数据进行了合理扩充与优化,最终使用如下 17 个公开数据集的汇总:
| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | | 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 |
| :----------: | :-----: | :------: | :------: | :--------------------------------------------------------------------------: | | :--------------------- | :-----: | :------: | :---: | :----------------------------------------------------------------------------------: |
| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | | Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) |
| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | | GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) |
| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) | | VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) |
| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | | LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| iCartoonFace | 389678 | 5013 | 动漫人物 | [地址](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) |
| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | | SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) |
| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | | Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) |
| **Total** | **5M** | **185K** | ---- | ---- | | bird400 | 58388 | 400 | 鸟类 | [地址](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) |
| 104flows | 12753 | 104 | 花类 | [地址](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) |
最终的模型效果如下表所示: | Cars | 58315 | 112 | 车辆 | [地址](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) |
| Fashion Product Images | 44441 | 47 | 商品 | [地址](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) |
| 模型 | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) | | flowerrecognition | 24123 | 59 | 花类 | [地址](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) |
| :-----------------------------: | :--------: | :-------: | :--------: | :----------: | :---: | :----: | :---------: | | food-101 | 101000 | 101 | 食物 | [地址](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) |
| GeneralRecognition_PPLCNet_x2_5 | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 | | fruits-262 | 225639 | 262 | 水果 | [地址](https://www.kaggle.com/datasets/aelchimminut/fruits262) |
| inaturalist | 265213 | 1010 | 自然 | [地址](https://github.com/visipedia/inat_comp/tree/master/2017) |
* 预训练模型地址:[通用识别预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) | indoor-scenes | 15588 | 67 | 室内 | [地址](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) |
* 采用的评测指标为:`Recall@1` | Products-10k | 141931 | 9691 | 商品 | [地址](https://products-10k.github.io/) |
| CompCars | 16016 | 431 | 车辆 | [地址](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) |
| **Total** | **6M** | **192K** | - | - |
最终的模型精度指标如下表所示:
| 模型 | 延时(ms) | 存储(MB) | product<sup>*</sup> | | Aliproduct | | VeRI-Wild | | LogoDet-3k | | iCartoonFace | | SOP | | Inshop | | gldv2 | | imdb_face | | iNat | | instre | | sketch | | sop | |
| :--------------------- | :------- | :------- | :------------------ | :--- | ---------- | ---- | --------- | ---- | ---------- | ---- | ------------ | ---- | -------- | ---- | -------- | ---- | -------- | ---- | --------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- |
| | | | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP |
| PP-ShiTuV1_general_rec | 5.0 | 34 | 65.9 | 54.3 | 83.9 | 83.2 | 88.7 | 60.1 | 86.1 | 73.6 | 84.1 | 72.3 | 79.7 | 58.6 | 89.1 | 69.4 | 98.2 | 91.6 | 28.8 | 8.42 | 12.6 | 6.1 | 72.0 | 50.4 | 27.9 | 9.5 | 97.6 | 90.3 |
| PP-ShiTuV2_general_rec | 6.1 | 19 | 73.7 | 61.0 | 84.2 | 83.3 | 87.8 | 68.8 | 88.0 | 63.2 | 53.6 | 27.5 | 77.6 | 55.3 | 90.8 | 74.3 | 98.1 | 90.5 | 35.9 | 11.2 | 38.6 | 23.9 | 87.7 | 71.4 | 39.3 | 15.6 | 98.3 | 90.9 |
* product数据集是为了验证PP-ShiTu的泛化性能而制作的数据集,所有的数据都没有在训练和测试集中出现。该数据包含7个大类(化妆品、地标、红酒、手表、车、运动鞋、饮料),250个小类。测试时,使用250个小类的标签进行测试;sop数据集来自[GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122),可视为“SOP”数据集的子集。
* 预训练模型地址:[general_PPLCNetV2_base_pretrained_v1.0.pdparams](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams)
* 采用的评测指标为:`Recall@1``mAP`
* 速度评测机器的 CPU 具体信息为:`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz` * 速度评测机器的 CPU 具体信息为:`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`
* 速度指标的评测条件为: 开启 MKLDNN, 线程数设置为 10 * 速度指标的评测条件为: 开启 MKLDNN, 线程数设置为 10
...@@ -94,31 +115,36 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -94,31 +115,36 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
自定义特征提取,是指依据自己的任务,重新训练特征提取模型。 自定义特征提取,是指依据自己的任务,重新训练特征提取模型。
下面基于`GeneralRecognition_PPLCNet_x2_5.yaml`配置文件,介绍主要的四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理 下面基于 `GeneralRecognitionV2_PPLCNetV2_base.yaml` 配置文件,介绍主要的四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理
<a name="5.1"></a> <a name="5.1"></a>
### 5.1 数据准备 ### 5.1 数据准备
首先需要基于任务定制自己的数据集。数据集格式与文件结构详见[数据集格式说明](../data_preparation/recognition_dataset.md) 首先需要基于任务定制自己的数据集。数据集格式与文件结构详见 [数据集格式说明](../data_preparation/recognition_dataset.md)
准备完毕之后还需要在配置文件中修改数据配置相关的内容, 主要包括数据集的地址以及类别数量。对应到配置文件中的位置如下所示: 准备完毕之后还需要在配置文件中修改数据配置相关的内容, 主要包括数据集的地址以及类别数量。对应到配置文件中的位置如下所示:
- 修改类别数: - 修改类别数:
```yaml ```yaml
Head: Head:
name: ArcMargin name: FC
embedding_size: 512 embedding_size: *feat_dim
class_num: 185341 # 此处表示类别数 class_num: 192612 # 此处表示类别数
weight_attr:
initializer:
name: Normal
std: 0.001
bias_attr: False
``` ```
- 修改训练数据集配置: - 修改训练数据集配置:
```yaml ```yaml
Train: Train:
dataset: dataset:
name: ImageNetDataset name: ImageNetDataset
image_root: ./dataset/ # 此处表示train数据所在的目录 image_root: ./dataset/ # 此处表示train数据集所在的目录
cls_label_path: ./dataset/train_reg_all_data.txt # 此处表示train数据集label文件的地址 cls_label_path: ./dataset/train_reg_all_data_v2.txt # 此处表示train数据集对应标注文件的地址
relabel: True
``` ```
- 修改评估数据集中query数据配置: - 修改评估数据集中query数据配置:
```yaml ```yaml
...@@ -126,7 +152,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -126,7 +152,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
dataset: dataset:
name: VeriWild name: VeriWild
image_root: ./dataset/Aliproduct/ # 此处表示query数据集所在的目录 image_root: ./dataset/Aliproduct/ # 此处表示query数据集所在的目录
cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示query数据集label文件的地址 cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示query数据集对应标注文件的地址
``` ```
- 修改评估数据集中gallery数据配置: - 修改评估数据集中gallery数据配置:
```yaml ```yaml
...@@ -134,7 +160,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -134,7 +160,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
dataset: dataset:
name: VeriWild name: VeriWild
image_root: ./dataset/Aliproduct/ # 此处表示gallery数据集所在的目录 image_root: ./dataset/Aliproduct/ # 此处表示gallery数据集所在的目录
cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示gallery数据集label文件的地址 cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示gallery数据集对应标注文件的地址
``` ```
<a name="5.2"></a> <a name="5.2"></a>
...@@ -147,14 +173,14 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -147,14 +173,14 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
```shell ```shell
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \ python3.7 tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
``` ```
- 单机多卡训练 - 单机多卡训练
```shell ```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \ python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
--gpus="0,1,2,3" tools/train.py \ tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
``` ```
**注意:** **注意:**
配置文件中默认采用`在线评估`的方式,如果你想加快训练速度,可以关闭`在线评估`功能,只需要在上述命令的后面,增加 `-o Global.eval_during_train=False` 配置文件中默认采用`在线评估`的方式,如果你想加快训练速度,可以关闭`在线评估`功能,只需要在上述命令的后面,增加 `-o Global.eval_during_train=False`
...@@ -165,15 +191,15 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -165,15 +191,15 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
```shell ```shell
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \ python3.7 tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoint="output/RecModel/latest" -o Global.checkpoint="output/RecModel/latest"
``` ```
- 单机多卡断点恢复训练 - 单机多卡断点恢复训练
```shell ```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \ python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
--gpus="0,1,2,3" tools/train.py \ tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoint="output/RecModel/latest" -o Global.checkpoint="output/RecModel/latest"
``` ```
...@@ -187,16 +213,16 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -187,16 +213,16 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
```shell ```shell
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
python3.7 tools/eval.py \ python3.7 tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model" -o Global.pretrained_model="output/RecModel/best_model"
``` ```
- 多卡评估 - 多卡评估
```shell ```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \ python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
--gpus="0,1,2,3" tools/eval.py \ tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model" -o Global.pretrained_model="output/RecModel/best_model"
``` ```
**注:** 建议使用多卡评估。该方式可以利用多卡并行计算快速得到全部数据的特征,能够加速评估的过程。 **注:** 建议使用多卡评估。该方式可以利用多卡并行计算快速得到全部数据的特征,能够加速评估的过程。
...@@ -212,7 +238,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ...@@ -212,7 +238,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训
首先需要将 `*.pdparams` 模型文件转换成 inference 格式,转换命令如下。 首先需要将 `*.pdparams` 模型文件转换成 inference 格式,转换命令如下。
```shell ```shell
python3.7 tools/export_model.py \ python3.7 tools/export_model.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model" -o Global.pretrained_model="output/RecModel/best_model"
``` ```
生成的推理模型默认位于 `PaddleClas/inference` 目录,里面包含三个文件,分别为 `inference.pdmodel``inference.pdiparams``inference.pdiparams.info` 生成的推理模型默认位于 `PaddleClas/inference` 目录,里面包含三个文件,分别为 `inference.pdmodel``inference.pdiparams``inference.pdiparams.info`
...@@ -228,10 +254,18 @@ python3.7 python/predict_rec.py \ ...@@ -228,10 +254,18 @@ python3.7 python/predict_rec.py \
-c configs/inference_rec.yaml \ -c configs/inference_rec.yaml \
-o Global.rec_inference_model_dir="../inference" -o Global.rec_inference_model_dir="../inference"
``` ```
得到的特征输出格式如下图所示: 得到的特征输出格式如下所示:
![](../../images/feature_extraction_output.png)
```log
wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02
4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02
5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02
-2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03
...
-3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02]
```
在实际使用过程中,仅仅得到特征可能并不能满足业务需求。如果想进一步通过特征检索来进行图像识别,可以参照文档[向量检索](./vector_search.md) 在实际使用过程中,仅仅得到特征可能并不能满足业务需求。如果想进一步通过特征检索来进行图像识别,可以参照文档 [向量检索](./vector_search.md)
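在接入向量检索之前,也可以先用下面的极简示意快速验证提取特征的可用性(直接用 numpy 计算余弦相似度,底库与查询特征均为随机生成的假设数据):
```python
import numpy as np

query = np.random.rand(512).astype("float32")            # 假设:单张查询图的 512 维特征
gallery = np.random.rand(1000, 512).astype("float32")    # 假设:底库中的 1000 条特征

query /= np.linalg.norm(query)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

scores = gallery @ query                                  # 余弦相似度
top5 = np.argsort(-scores)[:5]                            # 相似度最高的 5 条底库特征下标
```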
<a name="6"></a> <a name="6"></a>
...@@ -244,4 +278,4 @@ python3.7 python/predict_rec.py \ ...@@ -244,4 +278,4 @@ python3.7 python/predict_rec.py \
## 7. 参考文献 ## 7. 参考文献
1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf) 1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf)
2. [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698) 2. [Bag of Tricks and A Strong Baseline for Deep Person Re-identification](https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf)
...@@ -12,7 +12,6 @@ ...@@ -12,7 +12,6 @@
- [1. 数据集](#1) - [1. 数据集](#1)
- [2. 模型选择](#2) - [2. 模型选择](#2)
- [2.1 轻量级主体检测模型](#2.1) - [2.1 轻量级主体检测模型](#2.1)
- [2.2 服务端主体检测模型](#2.2)
- [3. 模型训练](#3) - [3. 模型训练](#3)
- [3.1 环境准备](#3.1) - [3.1 环境准备](#3.1)
- [3.2 数据准备](#3.2) - [3.2 数据准备](#3.2)
...@@ -45,14 +44,13 @@ ...@@ -45,14 +44,13 @@
## 2. 模型选择 ## 2. 模型选择
目标检测方法种类繁多,比较常用的有两阶段检测器(如 FasterRCNN 系列等);单阶段检测器(如 YOLO、SSD 等);anchor-free 检测器(如 PicoDet、FCOS 等)。PaddleDetection 中针对服务端使用场景,自研了 PP-YOLO 系列模型;针对端侧(CPU 和移动端等)使用场景,自研了 PicoDet 系列模型,在服务端和端侧均处于业界较为领先的水平。 目标检测方法种类繁多,比较常用的有两阶段检测器(如 FasterRCNN 系列等);单阶段检测器(如 YOLO、SSD 等);anchor-free 检测器(如 PicoDet、FCOS 等)。在主体检测中,我们使用[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/picodet)系列模型,其在CPU端与移动端,速度较快、精度较好,处于较为领先的业界水平。
基于上述研究,PaddleClas 中提供了 2 个通用主体检测模型,为轻量级与服务端主体检测模型,分别适用于端侧场景以及服务端场景。下面的表格中给出了在上述 5 个数据集上的平均 mAP 以及它们的模型大小、预测速度对比信息。 基于上述研究,PaddleClas 中提供了 1 个通用主体检测模型,即轻量级主体检测模型,适用于端侧与服务端场景。下面的表格中给出了其在上述 5 个数据集上的平均 mAP 以及模型大小信息。
| 模型 | 模型结构 | 预训练模型下载地址 | inference 模型下载地址 | mAP | inference 模型大小(MB) | 单张图片预测耗时(不包含预处理)(ms) | | 模型 | 模型结构 | 预训练模型下载地址 | inference 模型下载地址 | mAP | inference 模型大小(MB) |
| ------------------ | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------- | ---------------------------------- | | ------------------ | -------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------- |
| 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | 40.1% | 30.1 | 29.8 | | 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | 41.5% | 30.1 |
| 服务端主体检测模型 | PP-YOLOv2 | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.zip) | 42.5% | 210.5 | 466.6 |
* 注意 * 注意
* 由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 格式文件并解压。`tar` 格式文件建议使用命令 `tar xf xxx.tar` 解压。 * 由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 格式文件并解压。`tar` 格式文件建议使用命令 `tar xf xxx.tar` 解压。
...@@ -65,37 +63,16 @@ ...@@ -65,37 +63,16 @@
PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,是一个适用于 CPU 或者移动端场景的目标检测算法。具体地,它融合了下面一系列优化算法。 PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,是一个适用于 CPU 或者移动端场景的目标检测算法。具体地,它融合了下面一系列优化算法。
- [ATSS](https://arxiv.org/abs/1912.02424) - [VFL](https://arxiv.org/abs/2008.13367) + [GFL](https://arxiv.org/abs/2006.04388)
- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388) - 新的PAN Neck结构
- 余弦学习率策略 - 余弦学习率策略
- Cycle-EMA - Cycle-EMA
- 轻量级检测 head - [ATSS](https://arxiv.org/abs/1912.02424)[SimOTA](https://arxiv.org/abs/2107.08430) 标签分配策略
更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md) 更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet)
在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640,其余配置与 [picodet_lcnet_1_5x_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/more_config/picodet_lcnet_1_5x_416_coco.yml) 完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640,其余配置与 [picodet_l_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/picodet/picodet_l_416_coco.yml) 完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
<a name="2.2"></a>
### 2.2 服务端主体检测模型
PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,从骨干网络、数据增广、正则化策略、损失函数、后处理等多个角度对 yolov3 模型进行深度优化,最终在“速度-精度”方面达到了业界领先的水平。具体地,优化的策略如下。
- 更优的骨干网络: ResNet50vd-DCN
- 更大的训练 batch size: 8 GPUs,每 GPU batch_size=24,对应调整学习率和迭代轮数
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- 更优的预训练模型
更多关于 PP-YOLO 的详细介绍可以参考:[PP-YOLO 模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md)
在服务端主体检测任务中,为了保证检测效果,我们使用 ResNet50vd-DCN 作为检测模型的骨干网络,使用配置文件 [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml),更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
<a name="3"></a> <a name="3"></a>
...@@ -112,19 +89,20 @@ PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) ...@@ -112,19 +89,20 @@ PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
```shell ```shell
cd <path/to/clone/PaddleDetection> cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection cd PaddleDetection
# 切换到2.3分支
git checkout release/2.3
# 安装其他依赖 # 安装其他依赖
pip install -r requirements.txt pip install -r requirements.txt
``` ```
更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md) 更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/docs/tutorials/INSTALL_cn.md)
<a name="3.2"></a> <a name="3.2"></a>
### 3.2 数据准备 ### 3.2 数据准备
对于自定义数据集,首先需要将自己的数据集修改为 COCO 格式,可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md)制作 COCO 格式的数据集。 对于自定义数据集,首先需要将自己的数据集修改为 COCO 格式,可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/docs/tutorials/PrepareDataSet.md)制作 COCO 格式的数据集。
主体检测任务中,所有的检测框均属于前景,在这里需要将标注文件中,检测框的 `category_id` 修改为 1,同时将整个标注文件中的 `categories` 映射表修改为下面的格式,即整个类别映射表中只包含`前景`类别。 主体检测任务中,所有的检测框均属于前景,在这里需要将标注文件中,检测框的 `category_id` 修改为 1,同时将整个标注文件中的 `categories` 映射表修改为下面的格式,即整个类别映射表中只包含`前景`类别。
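将已有 COCO 标注改写为单一前景类别,可以参考下面的示意脚本(文件路径与类别名称均为假设,请按实际数据集修改):
```python
import json

ann_path = "annotations/instances_train.json"             # 假设的 COCO 标注文件路径

with open(ann_path, "r") as f:
    coco = json.load(f)

for ann in coco["annotations"]:
    ann["category_id"] = 1                                 # 所有检测框均改为前景类别

# 类别映射表只保留"前景"一类
coco["categories"] = [{"supercategory": "foreground", "id": 1, "name": "foreground"}]

with open("annotations/instances_train_mainbody.json", "w") as f:
    json.dump(coco, f)
```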
...@@ -136,22 +114,20 @@ pip install -r requirements.txt ...@@ -136,22 +114,20 @@ pip install -r requirements.txt
### 3.3 配置文件改动和说明 ### 3.3 配置文件改动和说明
我们使用 `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` 配置进行训练,配置文件摘要如下: 我们使用 [mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml) 配置进行训练,配置文件摘要如下:
![](../../images/det/PaddleDetection_config.png) ![](../../images/det/PaddleDetection_config.png)
从上图看到 `ppyolov2_r50vd_dcn_365e_coco.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下: 从上图看到 `mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下:
``` ```
coco_detection.yml:主要说明了训练数据和验证数据的路径
runtime.yml:主要说明了公共的运行参数,比如是否使用 GPU、每多少个 epoch 存储 checkpoint 等 runtime.yml:主要说明了公共的运行参数,比如是否使用 GPU、每多少个 epoch 存储 checkpoint 等
optimizer_365e.yml:主要说明了学习率和优化器的配置 optimizer_100e.yml:主要说明了学习率和优化器的配置
ppyolov2_r50vd_dcn.yml:主要说明模型和主干网络的情况 picodet_esnet.yml:主要说明模型和主干网络的情况
ppyolov2_reader.yml:主要说明数据读取器配置,如 batch size,并发加载子进程数等,同时包含读取后预处理操作,如 resize、数据增强等等 picodet_640_reader.yml:主要说明数据读取器配置,如 batch size,并发加载子进程数等,同时包含读取后预处理操作,如 resize、数据增强等等
``` ```
在主体检测任务中,需要将 `datasets/coco_detection.yml` 中的 `num_classes` 参数修改为 1(只有 1 个前景类别),同时将训练集和测试集的路径修改为自定义数据集的路径。 在主体检测任务中,需要将 `datasets/coco_detection.yml` 中的 `num_classes` 参数修改为 1(只有 1 个前景类别),同时将训练集和测试集的路径修改为自定义数据集的路径。
...@@ -169,14 +145,14 @@ PaddleDetection 提供了单卡/多卡训练模式,满足用户多种训练需 ...@@ -169,14 +145,14 @@ PaddleDetection 提供了单卡/多卡训练模式,满足用户多种训练需
```bash ```bash
# windows 和 Mac 下不需要执行该命令 # windows 和 Mac 下不需要执行该命令
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml python tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml
``` ```
* GPU 多卡训练 * GPU 多卡训练
```bash ```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/legacy_model/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --eval
``` ```
--eval:表示边训练边验证。 --eval:表示边训练边验证。
...@@ -188,7 +164,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy ...@@ -188,7 +164,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy
```bash ```bash
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
# 指定 pretrain_weights 参数,加载通用的主体检测预训练模型 # 指定 pretrain_weights 参数,加载通用的主体检测预训练模型
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams python tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams
``` ```
* 模型恢复训练 * 模型恢复训练
...@@ -197,10 +173,14 @@ python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pret ...@@ -197,10 +173,14 @@ python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pret
```bash ```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000 python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --eval -r output/picodet_lcnet_x2_5_640_mainbody/20
``` ```
注意:如果遇到 "`Out of memory error`" 问题, 尝试在 `ppyolov2_reader.yml` 文件中调小 `batch_size`,同时等比例调小学习率。 注意:
- `-r` 参数中最后的 `20` 表示从第 20 个 epoch 保存的权重恢复训练,使用时请确保 `20.pdparams`、`20.pdopt` 文件存在,并根据实际情况自行修改。
- 如果遇到 "`Out of memory error`" 问题, 尝试在 `picodet_640_reader.yml` 文件中调小 `batch_size`,同时等比例调小学习率。
<a name="3.5"></a> <a name="3.5"></a>
...@@ -210,12 +190,13 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy ...@@ -210,12 +190,13 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy
```bash ```bash
export CUDA_VISIBLE_DEVICES=0 export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final python tools/infer.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/picodet_lcnet_x2_5_640_mainbody/model_final
``` ```
`--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 `keep_top_k` 表示设置输出目标的最大数量,默认值为 100,用户可以根据自己的实际情况进行设定。 `--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 `keep_top_k` 表示设置输出目标的最大数量,默认值为 100,用户可以根据自己的实际情况进行设定。
<a name="4"></a> <a name="4"></a>
## 4. 模型推理部署 ## 4. 模型推理部署
<a name="4.1"></a> <a name="4.1"></a>
...@@ -224,16 +205,16 @@ python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer ...@@ -224,16 +205,16 @@ python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer
执行导出模型脚本: 执行导出模型脚本:
```bash ```bash
python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --output_dir=./inference -o weights=output/picodet_lcnet_x2_5_640_mainbody/model_final.pdparams
``` ```
预测模型会导出到 `inference/ppyolov2_r50vd_dcn_365e_coco` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel` 预测模型会导出到 `inference/picodet_lcnet_x2_5_640_mainbody` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`
注意: `PaddleDetection` 导出的 inference 模型的文件格式为 `model.xxx`,这里如果希望与 PaddleClas 的 inference 模型文件格式保持一致,需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。 注意: `PaddleDetection` 导出的 inference 模型的文件格式为 `model.xxx`,这里如果希望与 PaddleClas 的 inference 模型文件格式保持一致,需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。
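重命名操作可以参考下面的示意脚本(导出目录为假设值,请替换为实际路径):
```python
import os

export_dir = "inference/picodet_lcnet_x2_5_640_mainbody"   # 假设的导出目录
for name in ("model.pdmodel", "model.pdiparams", "model.pdiparams.info"):
    src = os.path.join(export_dir, name)
    if os.path.exists(src):  # 逐个重命名为 inference.*
        os.rename(src, os.path.join(export_dir, name.replace("model.", "inference.", 1)))
```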
更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md) 更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/EXPORT_MODEL.md)
最终,目录 `inference/ppyolov2_r50vd_dcn_365e_coco` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件,`inference.pdmodel` 为保存的 inference 模型结构文件。 最终,目录 `inference/picodet_lcnet_x2_5_640_mainbody` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件,`inference.pdmodel` 为保存的 inference 模型结构文件。
<a name="4.2"></a> <a name="4.2"></a>
### 4.2 基于python预测引擎推理 ### 4.2 基于python预测引擎推理
...@@ -244,7 +225,7 @@ python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml ...@@ -244,7 +225,7 @@ python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
<a name="4.3"></a> <a name="4.3"></a>
### 4.3 其他推理方式 ### 4.3 其他推理方式
其他推理方法,如C++推理部署、PaddleServing部署等请参考[检测模型推理部署](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/README.md) 其他推理方法,如C++推理部署、PaddleServing部署等请参考[检测模型推理部署](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/deploy/README.md)
### FAQ ### FAQ
......
...@@ -46,7 +46,7 @@ python tools/export_model.py \ ...@@ -46,7 +46,7 @@ python tools/export_model.py \
<a name="3"></a> <a name="3"></a>
## 3. 主体检测模型导出 ## 3. 主体检测模型导出
主体检测模型的导出,可以参考[检测介绍](../image_recognition_pipeline/mainbody_detection.md) 主体检测模型的导出,可以参考[检测介绍](../image_recognition_pipeline/mainbody_detection.md)
<a name="4"></a> <a name="4"></a>
## 4. 识别模型导出 ## 4. 识别模型导出
......
# PP-ShiTu在Paddle-Lite端侧部署
本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleClas PP-ShiTu模型的详细步骤。
Paddle Lite是飞桨轻量化推理引擎,为手机、IoT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。
## 目录
- [1. 环境准备](#1)
- [1.1 准备交叉编译环境](#1.1)
- [1.2 准备预测库](#1.2)
- [2. 编译流程](#2)
- [2.1 模型准备](#2.1)
- [2.1.1 使用PaddleClas提供的推理模型](#2.1.1)
- [2.1.2 使用其他模型](#2.1.2)
- [2.1.2.1 安装paddle_lite_opt工具](#2.1.2.1)
- [2.1.2.2 转换示例](#2.1.2.2)
- [2.2 生成新的检索库](#2.2)
- [2.2.1 数据及环境配置](#2.2.1)
- [2.2.2 生成新的index文件](#2.2.2)
- [2.3 将yaml文件转换成json文件](#2.3)
- [2.4 index字典转换](#2.4)
- [2.5 与手机联调](#2.5)
- [FAQ](#FAQ)
<a name="1"></a>
## 1. 环境准备
### 运行准备
- 电脑(编译Paddle Lite)
- 安卓手机(armv7或armv8)
<a name="1.1"></a>
### 1.1 准备交叉编译环境
交叉编译环境用于编译 Paddle Lite 和 PaddleClas 的PP-ShiTu Lite demo。
支持多种开发环境,不同开发环境的编译流程请参考对应文档,请确保安装完成Java jdk、Android NDK(R17以上)。
1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker)
2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux)
3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os)
```shell
# 配置完成交叉编译环境后,更新环境变量
# for docker、Linux
source ~/.bashrc
# for Mac OS
source ~/.bash_profile
```
<a name="1.2"></a>
### 1.2 准备预测库
预测库有两种获取方式:
1. [**建议**]直接下载,预测库下载链接如下:
|平台| 架构 | 预测库下载链接|
|-|-|-|
|Android| arm7 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv7.clang.c++_static.with_extra.with_cv.tar.gz) |
| Android | arm8 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv.tar.gz) |
| Android | arm8(FP16) | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip) |
**注意**:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo)
2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下:
```shell
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
# 如果使用编译方式,建议使用develop分支编译预测库
git checkout develop
# FP32
./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON
# FP16
./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON
```
**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html)
直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。
预测库的文件目录如下:
```
inference_lite_lib.android.armv8/
|-- cxx C++ 预测库和头文件
| |-- include C++ 头文件
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++预测库
| |-- libpaddle_api_light_bundled.a C++静态库
| `-- libpaddle_light_api_shared.so C++动态库
|-- java Java预测库
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++和Java示例代码
| |-- cxx C++ 预测库demo
| `-- java Java 预测库demo
```
<a name="2"></a>
## 2. 编译流程
<a name="2.1"></a>
### 2.1 模型准备
PaddleClas 提供了转换并优化后的推理模型,可以直接参考下方 2.1.1 小节进行下载。如果需要使用其他模型,请参考后续 2.1.2 小节自行转换并优化模型。
<a name="2.1.1"></a>
#### 2.1.1 使用PaddleClas提供的推理模型
```shell
# 进入lite_shitu目录
cd $PaddleClas/deploy/lite_shitu
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/lite/ppshitu_lite_models_v1.2.tar
tar -xf ppshitu_lite_models_v1.2.tar
rm -f ppshitu_lite_models_v1.2.tar
```
<a name="2.1.2"></a>
#### 2.1.2 使用其他模型
Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。
**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。
<a name="2.1.2.1"></a>
##### 2.1.2.1 安装paddle_lite_opt工具
安装`paddle_lite_opt`工具有如下两种方法:
1. [**建议**]pip安装paddlelite并进行转换
```shell
pip install paddlelite==2.10rc
```
2. 源码编译Paddle-Lite生成`paddle_lite_opt`工具
模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下:
```shell
# 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout develop
# 启动编译
./lite/tools/build.sh build_optimize_tool
```
编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式;
```shell
cd build.opt/lite/api/
./opt
```
`opt` 工具的使用方式及参数与 `paddle_lite_opt` 完全一致。
之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下:
|选项|说明|
|-|-|
|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径|
|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径|
|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer|
|--optimize_out|优化模型的输出路径|
|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm|
更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html)
`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`--optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。
<a name="2.1.2.2"></a>
##### 2.1.2.2 转换示例
下面介绍使用`paddle_lite_opt`完成主体检测模型和识别模型的预训练模型,转成inference模型,最终转换成Paddle-Lite的优化模型的过程。
1. 转换主体检测模型
```shell
# 当前目录为 $PaddleClas/deploy/lite_shitu
# $code_path需替换成相应的运行目录,可以根据需要,将$code_path设置成需要的目录
export code_path=~
cd $code_path
git clone https://github.com/PaddlePaddle/PaddleDetection.git
# 进入PaddleDetection根目录
cd PaddleDetection
# 切换到2.3分支
git checkout release/2.3
# 将预训练模型导出为inference模型
python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams export_post_process=False --output_dir=inference
# 将inference模型转化为Paddle-Lite优化模型
paddle_lite_opt --model_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdmodel --param_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdiparams --optimize_out=inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det
# 将转好的模型复制到lite_shitu目录下
cd $PaddleClas/deploy/lite_shitu
mkdir models
cp $code_path/PaddleDetection/inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det.nb $PaddleClas/deploy/lite_shitu/models
```
2. 转换识别模型
```shell
# 识别模型下载
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
# 解压模型
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# 转换为Paddle-Lite模型
paddle_lite_opt --model_file=general_PPLCNet_x2_5_lite_v1.0_infer/inference.pdmodel --param_file=general_PPLCNet_x2_5_lite_v1.0_infer/inference.pdiparams --optimize_out=general_PPLCNet_x2_5_lite_v1.0_infer/rec
# 将模型文件拷贝到lite_shitu下
cp general_PPLCNet_x2_5_lite_v1.0_infer/rec.nb deploy/lite_shitu/models/
```
**注意**`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb``--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。
<a name="2.2"></a>
### 2.2 生成新的检索库
由于 Lite 版本的检索库使用的是 `faiss1.5.3` 版本,与新版本不兼容,因此需要重新生成 index 库。
<a name="2.2.1"></a>
#### 2.2.1 数据及环境配置
```shell
# 进入PaddleClas根目录
cd $PaddleClas
# 安装PaddleClas
python setup.py install
cd deploy
# 下载瓶装饮料数据集
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
rm -rf drink_dataset_v1.0.tar
rm -rf drink_dataset_v1.0/index
# 安装1.5.3版本的faiss
pip install faiss-cpu==1.5.3
# 下载通用识别模型,可替换成自己的inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
rm -rf general_PPLCNet_x2_5_lite_v1.0_infer.tar
```
<a name="2.2.2"></a>
#### 2.2.2 生成新的index文件
```shell
# 生成新的index库,注意指定好识别模型的路径,同时将index_method修改成Flat,HNSW32和IVF在此版本中可能存在bug,请慎重使用。
# 如果使用自己的识别模型,对应的修改inference model的目录
python python/build_gallery.py -c configs/inference_drink.yaml -o Global.rec_inference_model_dir=general_PPLCNet_x2_5_lite_v1.0_infer -o IndexProcess.index_method=Flat
# 进入到lite_shitu目录
cd lite_shitu
mv ../drink_dataset_v1.0 .
```
<a name="2.3"></a>
### 2.3 将yaml文件转换成json文件
```shell
# 如果测试单张图像,路径使用相对路径
python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_path images/demo.jpeg
# or
# 如果测试多张图像
python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_dir images
# 执行完成后,会在lite_shitu下生成shitu_config.json配置文件
```
<a name="2.4"></a>
### 2.4 index字典转换
由于 Python 的检索库字典使用 `pickle` 进行序列化存储,C++ 读取不方便,因此需要进行转换。
```shell
# 转化id_map.pkl为id_map.txt
python transform_id_map.py -c ../configs/inference_drink.yaml
```
转换成功后,会在`IndexProcess.index_dir`目录下生成`id_map.txt`
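为便于理解该转换的作用,下面给出其等价逻辑的极简示意(键值含义与分隔符为假设,实际以 `transform_id_map.py` 为准):
```python
import os
import pickle

index_dir = "drink_dataset_v1.0/index"                     # 对应配置中的 IndexProcess.index_dir
with open(os.path.join(index_dir, "id_map.pkl"), "rb") as f:
    id_map = pickle.load(f)                                # Python 侧使用 pickle 序列化的字典

with open(os.path.join(index_dir, "id_map.txt"), "w") as f:
    for idx, label in id_map.items():                      # 写成 C++ 侧易于读取的文本格式
        f.write(f"{idx} {label}\n")
```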
<a name="2.5"></a>
### 2.5 与手机联调
首先需要进行一些准备工作。
1. 准备一台arm8的安卓手机,如果编译的预测库是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI=arm7`
2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下:
2.1. MAC电脑安装ADB:
```shell
brew cask install android-platform-tools
```
2.2. Linux安装ADB
```shell
sudo apt update
sudo apt install -y wget adb
```
2.3. Windows安装ADB
Windows上安装需要到谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio)
3. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入:
```shell
adb devices
```
如果有device输出,则表示安装成功,如下所示:
```
List of devices attached
744be294 device
```
4. 编译lite部署代码生成移动端可执行文件
```shell
cd $PaddleClas/deploy/lite_shitu
# ${lite prediction library path}下载的Paddle-Lite库路径
inference_lite_path=${lite prediction library path}/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv/
mkdir $inference_lite_path/demo/cxx/ppshitu_lite
cp -r * $inference_lite_path/demo/cxx/ppshitu_lite
cd $inference_lite_path/demo/cxx/ppshitu_lite
# 执行编译,等待完成后得到可执行文件 pp_shitu
make ARM_ABI=arm8
#如果是arm7,则执行 make ARM_ABI = arm7 (或者在Makefile中修改该项)
```
5. 准备优化后的模型、预测库文件、测试图像。
```shell
mkdir deploy
# 移动的模型路径要和之前生成的json文件中模型路径一致
mv ppshitu_lite_models_v1.2 deploy/
mv drink_dataset_v1.0 deploy/
mv images deploy/
mv shitu_config.json deploy/
cp pp_shitu deploy/
# 将C++预测动态库so文件复制到deploy文件夹中
cp ../../../cxx/lib/libpaddle_light_api_shared.so deploy/
```
执行完成后,deploy文件夹下将有如下文件格式:
```shell
deploy/
|-- ppshitu_lite_models_v1.2/
| |--mainbody_PPLCNet_x2_5_640_v1.2_lite.nb 优化后的主体检测模型文件
| |--general_PPLCNet_x2_5_lite_v1.2_infer.nb 优化后的识别模型文件
|-- images/
| |--demo.jpeg 图片文件
|-- drink_dataset_v1.0/ 瓶装饮料demo数据
| |--index 检索index目录
|-- pp_shitu 生成的移动端执行文件
|-- shitu_config.json 执行时参数配置文件
|-- libpaddle_light_api_shared.so Paddle-Lite库文件
```
**注意:**
* `shitu_config.json` 包含了目标检测的超参数,请按需进行修改
6. 启动调试,上述步骤完成后就可以使用ADB将文件夹 `deploy/` push到手机上运行,步骤如下:
```shell
# 将上述deploy文件夹push到手机上
adb push deploy /data/local/tmp/
adb shell
cd /data/local/tmp/deploy
export LD_LIBRARY_PATH=/data/local/tmp/deploy:$LD_LIBRARY_PATH
# 修改权限为可执行
chmod 777 pp_shitu
# 执行程序
./pp_shitu shitu_config.json
```
如果对代码做了修改,则需要重新编译并push到手机上。
运行效果如下:
```
images/demo.jpeg:
result0: bbox[344, 98, 527, 593], score: 0.811656, label: 红牛-强化型
result1: bbox[0, 0, 600, 600], score: 0.729664, label: 红牛-强化型
```
<a name="FAQ"></a>
## FAQ
Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗?
A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。
Q2:换一个图测试怎么做?
A2:替换 deploy 下的测试图像为你想要测试的图像,并重新生成json配置文件(或者直接修改图像路径),使用 ADB 再次 push 到手机上即可。
# Python 预测推理 # Python 预测推理
---
首先请参考文档[环境准备](../installation/install_paddleclas.md)配置运行环境。 首先请参考文档[环境准备](../installation/install_paddleclas.md)配置运行环境。
## 目录 ## 目录
...@@ -13,47 +11,50 @@ ...@@ -13,47 +11,50 @@
- [2.3 PP-ShiTu PipeLine推理](#2.3) - [2.3 PP-ShiTu PipeLine推理](#2.3)
<a name="1"></a> <a name="1"></a>
## 1. 图像分类推理 ## 1. 图像分类推理
首先请参考文档[模型导出](./export_model.md)准备 inference 模型,然后进入 PaddleClas 的 `deploy` 目录下: 首先请参考文档[模型导出](./export_model.md)准备 inference 模型,然后进入 PaddleClas 的 `deploy` 目录下:
```shell ```shell
cd /path/to/PaddleClas/deploy cd PaddleClas/deploy
``` ```
使用以下命令进行预测: 使用以下命令进行预测:
```shell ```shell
python python/predict_cls.py -c configs/inference_cls.yaml python3.7 python/predict_cls.py -c configs/inference_cls.yaml
``` ```
在配置文件 `configs/inference_cls.yaml` 中有以下字段用于配置预测参数: 在配置文件 `configs/inference_cls.yaml` 中有以下字段用于配置预测参数:
* `Global.infer_imgs`:待预测的图片文件路径; * `Global.infer_imgs`:待预测的图片文件(夹)路径;
* `Global.inference_model_dir`:inference 模型文件所在目录,该目录下需要有文件 `inference.pdmodel``inference.pdiparams` 两个文件; * `Global.inference_model_dir`:inference 模型文件所在文件夹的路径,该文件夹下需要有文件 `inference.pdmodel``inference.pdiparams` 两个文件;
* `Global.use_tensorrt`:是否使用 TesorRT 预测引擎,默认为 `False`
* `Global.use_gpu`:是否使用 GPU 预测,默认为 `True` * `Global.use_gpu`:是否使用 GPU 预测,默认为 `True`
* `Global.enable_mkldnn`:是否启用 `MKL-DNN` 加速库,默认为 `False`。注意 `enable_mkldnn``use_gpu` 同时为 `True` 时,将忽略 `enable_mkldnn`,而使用 GPU 预测; * `Global.enable_mkldnn`:是否启用 `MKL-DNN` 加速库,默认为 `False`。注意 `enable_mkldnn``use_gpu` 同时为 `True` 时,将忽略 `enable_mkldnn`,而使用 GPU 预测;
* `Global.use_fp16`:是否启用 `FP16`,默认为 `False` * `Global.use_fp16`:是否启用 `FP16`,默认为 `False`
* `Global.use_tensorrt`:是否使用 TesorRT 预测引擎,默认为 `False`
* `PreProcess`:用于数据预处理配置; * `PreProcess`:用于数据预处理配置;
* `PostProcess`:用于后处理配置; * `PostProcess`:用于后处理配置;
* `PostProcess.Topk.class_id_map_file`:数据集 label 的映射文件,默认为 `./utils/imagenet1k_label_list.txt`,该文件为 PaddleClas 所使用的 ImageNet 数据集 label 映射文件。 * `PostProcess.Topk.class_id_map_file`:数据集 label 的映射文件,默认为 `../ppcls/utils/imagenet1k_label_list.txt`,该文件为 PaddleClas 所使用的 ImageNet 数据集 label 映射文件。
**注意**: **注意**:
* 如果使用 VisionTransformer 系列模型,如 `DeiT_***_384`, `ViT_***_384` 等,请注意模型的输入数据尺寸,部分模型需要修改参数: `PreProcess.resize_short=384`, `PreProcess.resize=384` * 如果使用 VisionTransformer 系列模型,如 `DeiT_***_384`, `ViT_***_384` 等,请注意模型的输入数据尺寸,该类模型需要修改参数: `PreProcess.resize_short=384`, `PreProcess.resize=384`
* 如果你希望提升评测模型速度,使用 GPU 评测时,建议开启 TensorRT 加速预测,使用 CPU 评测时,建议开启 MKL-DNN 加速预测。 * 如果你希望提升评测模型速度,使用 GPU 评测时,建议开启 TensorRT 加速预测,使用 CPU 评测时,建议开启 MKL-DNN 加速预测。
<a name="2"></a> <a name="2"></a>
## 2. PP-ShiTu模型推理 ## 2. PP-ShiTu模型推理
PP-ShiTu整个Pipeline包含三部分:主体检测、特提取模型、特征检索。其中主体检测、特征模型可以单独推理使用。单独主体检测详见[2.1](#2.1),特征提取模型单独推理详见[2.2](#2.2), PP-ShiTu整体推理详见[2.3](#2.3) PP-ShiTu整个Pipeline包含三部分:主体检测、特征提取模型、特征检索。其中主体检测模型、特征提取模型可以单独推理使用。单独使用主体检测详见[主体检测模型推理](#2.1),特征提取模型单独推理详见[特征提取模型推理](#2.2), PP-ShiTu整体推理详见[PP-ShiTu PipeLine推理](#2.3)
<a name="2.1"></a> <a name="2.1"></a>
### 2.1 主体检测模型推理 ### 2.1 主体检测模型推理
进入 PaddleClas 的 `deploy` 目录下: 进入 PaddleClas 的 `deploy` 目录下:
```shell ```shell
cd /path/to/PaddleClas/deploy cd PaddleClas/deploy
``` ```
准备 PaddleClas 提供的主体检测 inference 模型: 准备 PaddleClas 提供的主体检测 inference 模型:
...@@ -61,28 +62,28 @@ cd /path/to/PaddleClas/deploy ...@@ -61,28 +62,28 @@ cd /path/to/PaddleClas/deploy
```shell ```shell
mkdir -p models mkdir -p models
# 下载通用检测 inference 模型并解压 # 下载通用检测 inference 模型并解压
wget -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar wget -nc -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf ./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar -C ./models/ tar -xf ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar -C ./models/
``` ```
使用以下命令进行预测: 使用以下命令进行预测:
```shell ```shell
python python/predict_det.py -c configs/inference_det.yaml python3.7 python/predict_det.py -c configs/inference_det.yaml
``` ```
在配置文件 `configs/inference_det.yaml` 中有以下字段用于配置预测参数: 在配置文件 `configs/inference_det.yaml` 中有以下字段用于配置预测参数:
* `Global.infer_imgs`:待预测的图片文件路径; * `Global.infer_imgs`:待预测的图片文件路径;
* `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True` * `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True`
<a name="2.2"></a> <a name="2.2"></a>
### 2.2 特征提取模型推理 ### 2.2 特征提取模型推理
下面以商品特征提取为例,介绍特征提取模型推理。首先进入 PaddleClas 的 `deploy` 目录下: 下面以商品图片的特征提取为例,介绍特征提取模型推理。首先进入 PaddleClas 的 `deploy` 目录下:
```shell ```shell
cd /path/to/PaddleClas/deploy cd PaddleClas/deploy
``` ```
准备 PaddleClas 提供的商品特征提取 inference 模型: 准备 PaddleClas 提供的商品特征提取 inference 模型:
...@@ -90,13 +91,24 @@ cd /path/to/PaddleClas/deploy ...@@ -90,13 +91,24 @@ cd /path/to/PaddleClas/deploy
```shell ```shell
mkdir -p models mkdir -p models
# 下载商品特征提取 inference 模型并解压 # 下载商品特征提取 inference 模型并解压
wget -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar wget -nc -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
tar -xf ./models/product_ResNet50_vd_aliproduct_v1.0_infer.tar -C ./models/ tar -xf ./models/general_PPLCNetV2_base_pretrained_v1.0_infer.tar -C ./models/
```
使用以下命令进行预测:
```shell
python3.7 python/predict_rec.py -c configs/inference_rec.yaml
``` ```
上述预测命令可以得到一个 512 维的特征向量,直接输出在命令行中。 上述预测命令可以得到一个 512 维的特征向量,直接输出在命令行中。
在配置文件 `configs/inference_rec.yaml` 中有以下字段用于配置预测参数:
* `Global.infer_imgs`:待预测的图片文件路径;
* `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True`
<a name="2.3"></a> <a name="2.3"></a>
### 2.3. PP-ShiTu PipeLine推理 ### 2.3. PP-ShiTu PipeLine推理
主体检测、特征提取和向量检索的串联预测,可以参考图像识别[快速体验](../quick_start/quick_start_recognition.md) 主体检测、特征提取和向量检索的串联预测,可以参考[图像识别快速开始](../quick_start/quick_start_recognition.md)
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
- [4. FAQ](#4-faq) - [4. FAQ](#4-faq)
<a name="1"></a> <a name="1"></a>
## 1. 简介 ## 1. 简介
[Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻松部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端。 [Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻松部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端。
...@@ -21,6 +22,7 @@ ...@@ -21,6 +22,7 @@
该部分以 HTTP 预测服务部署为例,介绍怎样在 PaddleClas 中使用 PaddleServing 部署模型服务。目前只支持 Linux 平台部署,暂不支持 Windows 平台。 该部分以 HTTP 预测服务部署为例,介绍怎样在 PaddleClas 中使用 PaddleServing 部署模型服务。目前只支持 Linux 平台部署,暂不支持 Windows 平台。
<a name="2"></a> <a name="2"></a>
## 2. Serving 安装 ## 2. Serving 安装
Serving 官网推荐使用 docker 安装并部署 Serving 环境。首先需要拉取 docker 环境并创建基于 Serving 的 docker。 Serving 官网推荐使用 docker 安装并部署 Serving 环境。首先需要拉取 docker 环境并创建基于 Serving 的 docker。
...@@ -59,12 +61,12 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -59,12 +61,12 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
* 如果安装速度太慢,可以通过 `-i https://pypi.tuna.tsinghua.edu.cn/simple` 更换源,加速安装过程。 * 如果安装速度太慢,可以通过 `-i https://pypi.tuna.tsinghua.edu.cn/simple` 更换源,加速安装过程。
* 其他环境配置安装请参考:[使用Docker安装Paddle Serving](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md) * 其他环境配置安装请参考:[使用Docker安装Paddle Serving](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
<a name="3"></a> <a name="3"></a>
## 3. 图像识别服务部署 ## 3. 图像识别服务部署
使用 PaddleServing 做图像识别服务化部署时,**需要将保存的多个 inference 模型都转换为 Serving 模型**。 下面以 PP-ShiTu 中的超轻量图像识别模型为例,介绍图像识别服务的部署。 使用 PaddleServing 做图像识别服务化部署时,**需要将保存的多个 inference 模型都转换为 Serving 模型**。 下面以 PP-ShiTu 中的超轻量图像识别模型为例,介绍图像识别服务的部署。
<a name="3.1"></a> <a name="3.1"></a>
### 3.1 模型转换 ### 3.1 模型转换
...@@ -79,8 +81,8 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -79,8 +81,8 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
mkdir models mkdir models
cd models cd models
# 下载并解压通用识别模型 # 下载并解压通用识别模型
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
# 下载并解压通用检测模型 # 下载并解压通用检测模型
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
...@@ -89,37 +91,26 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -89,37 +91,26 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
```shell ```shell
# 转换通用识别模型 # 转换通用识别模型
python3.7 -m paddle_serving_client.convert \ python3.7 -m paddle_serving_client.convert \
--dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \ --dirname ./general_PPLCNetV2_base_pretrained_v1.0_infer/ \
--model_filename inference.pdmodel \ --model_filename inference.pdmodel \
--params_filename inference.pdiparams \ --params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \ --serving_server ./general_PPLCNetV2_base_pretrained_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/ --serving_client ./general_PPLCNetV2_base_pretrained_v1.0_client/
``` ```
上述命令的参数含义与[#3.1 模型转换](#3.1)相同 上述命令的参数含义与[#3.1 模型转换](#3.1)相同
通用识别 inference 模型转换完成后,会在当前文件夹多出 `general_PPLCNet_x2_5_lite_v1.0_serving/``general_PPLCNet_x2_5_lite_v1.0_client/` 的文件夹,具备如下结构: 通用识别 inference 模型转换完成后,会在当前文件夹多出 `general_PPLCNetV2_base_pretrained_v1.0_serving/``general_PPLCNetV2_base_pretrained_v1.0_client/` 的文件夹,具备如下结构:
```shell ```shell
├── general_PPLCNet_x2_5_lite_v1.0_serving/ ├── general_PPLCNetV2_base_pretrained_v1.0_serving/
│ ├── inference.pdiparams │ ├── inference.pdiparams
│ ├── inference.pdmodel │ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt │ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt │ └── serving_server_conf.stream.prototxt
└── general_PPLCNet_x2_5_lite_v1.0_client/ └── general_PPLCNetV2_base_pretrained_v1.0_client/
├── serving_client_conf.prototxt ├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt └── serving_client_conf.stream.prototxt
``` ```
- 转换通用检测 inference 模型为 Serving 模型: 接下来分别修改 `general_PPLCNetV2_base_pretrained_v1.0_serving/``general_PPLCNetV2_base_pretrained_v1.0_client/` 目录下的 `serving_server_conf.prototxt` 中的 `alias` 名字: 将 `fetch_var` 中的 `alias_name` 改为 `features`。修改后的 `serving_server_conf.prototxt` 内容如下
```shell
# 转换通用检测模型
python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
```
上述命令的参数含义与[#3.1 模型转换](#3.1)相同
识别推理模型转换完成后,会在当前文件夹多出 `general_PPLCNet_x2_5_lite_v1.0_serving/``general_PPLCNet_x2_5_lite_v1.0_client/` 的文件夹。分别修改 `general_PPLCNet_x2_5_lite_v1.0_serving/``general_PPLCNet_x2_5_lite_v1.0_client/` 目录下的 `serving_server_conf.prototxt` 中的 `alias` 名字: 将 `fetch_var` 中的 `alias_name` 改为 `features`。 修改后的 `serving_server_conf.prototxt` 内容如下
```log ```log
feed_var { feed_var {
...@@ -132,13 +123,24 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -132,13 +123,24 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
shape: 224 shape: 224
} }
fetch_var { fetch_var {
name: "save_infer_model/scale_0.tmp_1" name: "batch_norm_25.tmp_2"
alias_name: "features" alias_name: "features"
is_lod_tensor: false is_lod_tensor: false
fetch_type: 1 fetch_type: 1
shape: 512 shape: 512
} }
``` ```
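如果不希望手动编辑,也可以参考下面的写法批量完成上述修改。该命令仅为示意:假设每个 prototxt 文件中只有一个 `fetch_var` 块,执行前请先备份文件,实际文件名请以转换生成的目录内容为准。

```shell
# 示意:仅替换 fetch_var 块内的 alias_name,不影响 feed_var
for f in ./general_PPLCNetV2_base_pretrained_v1.0_serving/*.prototxt \
         ./general_PPLCNetV2_base_pretrained_v1.0_client/*.prototxt; do
    sed -i '/fetch_var/,/}/ s/alias_name: ".*"/alias_name: "features"/' "$f"
done
```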
- 转换通用检测 inference 模型为 Serving 模型:
```shell
# 转换通用检测模型
python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
```
上述命令的参数含义与[#3.1 模型转换](#3.1)相同
通用检测 inference 模型转换完成后,会在当前文件夹多出 `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/``picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` 的文件夹,具备如下结构: 通用检测 inference 模型转换完成后,会在当前文件夹多出 `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/``picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` 的文件夹,具备如下结构:
```shell ```shell
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ ├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
...@@ -151,9 +153,9 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -151,9 +153,9 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
├── serving_client_conf.prototxt ├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt └── serving_client_conf.stream.prototxt
``` ```
上述命令中参数具体含义如下表所示 上述转换命令的参数具体含义如下表所示
| 参数 | 类型 | 默认值 | 描述 | | 参数 | 类型 | 默认值 | 描述 |
| ----------------- | ---- | ------------------ | ------------------------------------------------------------ | | ----------------- | ---- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。 | | `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。 |
| `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 | | `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 |
| `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保>存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None | | `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保>存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None |
...@@ -165,11 +167,13 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -165,11 +167,13 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
# 回到deploy目录 # 回到deploy目录
cd ../ cd ../
# 下载构建完成的检索库 index # 下载构建完成的检索库 index
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar
# 解压构建完成的检索库 index # 解压构建完成的检索库 index
tar -xf drink_dataset_v1.0.tar tar -xf drink_dataset_v2.0.tar
``` ```
<a name="3.2"></a> <a name="3.2"></a>
### 3.2 服务部署和请求 ### 3.2 服务部署和请求
**注意:** 识别服务涉及到多个模型,出于性能考虑采用 PipeLine 部署方式。Pipeline 部署方式当前不支持 windows 平台。 **注意:** 识别服务涉及到多个模型,出于性能考虑采用 PipeLine 部署方式。Pipeline 部署方式当前不支持 windows 平台。
...@@ -190,6 +194,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -190,6 +194,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
``` ```
<a name="3.2.1"></a> <a name="3.2.1"></a>
#### 3.2.1 Python Serving #### 3.2.1 Python Serving
- 启动服务: - 启动服务:
...@@ -204,30 +209,32 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -204,30 +209,32 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
``` ```
成功运行后,模型预测的结果会打印在客户端中,如下所示: 成功运行后,模型预测的结果会打印在客户端中,如下所示:
```log ```log
{'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': '红牛-强化型', 'rec_scores': 0.79903316}]"], 'tensors': []} {'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [438, 71, 660, 712], 'rec_docs': '元气森林', 'rec_scores': 0.7581642}, {'bbox': [220, 72, 449, 689], 'rec_docs': '元气森林', 'rec_scores': 0.68961805}, {'bbox': [794, 104, 978, 652], 'rec_docs': '元气森林', 'rec_scores': 0.63075215}]"], 'tensors': []}
``` ```
<a name="3.2.2"></a> <a name="3.2.2"></a>
#### 3.2.2 C++ Serving #### 3.2.2 C++ Serving
与Python Serving不同,C++ Serving客户端调用 C++ OP来预测,因此在启动服务之前,需要编译并安装 serving server包,并设置 `SERVING_BIN` 与Python Serving不同,C++ Serving客户端调用 C++ OP来预测,因此在启动服务之前,需要编译并安装 serving server包,并设置 `SERVING_BIN`
- 编译并安装Serving server包 - 编译并安装Serving server包
```shell ```shell
# 进入工作目录 # 进入工作目录
cd PaddleClas/deploy/paddleserving cd ./deploy/paddleserving
# 一键编译安装Serving server、设置 SERVING_BIN # 一键编译安装Serving server、设置 SERVING_BIN
source ./build_server.sh python3.7 source ./build_server.sh python3.7
``` ```
**注:**[build_server.sh](../build_server.sh#L55-L62)所设定的路径可能需要根据实际机器上的环境如CUDA、python版本等作一定修改,然后再编译;如果执行`build_server.sh`过程中遇到非网络原因的报错,则可以手动将脚本中的命令逐条复制到终端执行。 **注:** [build_server.sh](../build_server.sh#L55-L62) 所设定的路径可能需要根据实际机器上的环境如CUDA、python版本等作一定修改,然后再编译;如果执行 `build_server.sh` 过程中遇到非网络原因的报错,则可以手动将脚本中的命令逐条复制到终端执行。
- C++ Serving 使用的输入输出格式与 Python 不同,因此需要执行以下命令,用下面路径中的 4 个 prototxt 文件覆盖 [3.1](#31-模型转换) 中转换得到的文件夹内对应的 4 个 prototxt 文件。
```shell ```shell
# 进入PaddleClas/deploy目录 # 回到deploy目录
cd PaddleClas/deploy/ cd ../
# 覆盖prototxt文件 # 覆盖prototxt文件
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_serving/ \cp ./paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_serving/*.prototxt ./models/general_PPLCNetV2_base_pretrained_v1.0_serving/
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_client/ \cp ./paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_client/*.prototxt ./models/general_PPLCNetV2_base_pretrained_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
``` ```
...@@ -235,7 +242,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -235,7 +242,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
- 启动服务: - 启动服务:
```shell ```shell
# 进入工作目录 # 进入工作目录
cd PaddleClas/deploy/paddleserving/recognition cd ./paddleserving/recognition
# 端口号默认为9400;运行日志默认保存在 log_PPShiTu.txt 中 # 端口号默认为9400;运行日志默认保存在 log_PPShiTu.txt 中
# CPU部署 # CPU部署
...@@ -252,9 +259,9 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -252,9 +259,9 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
成功运行后,模型预测的结果会打印在客户端中,如下所示: 成功运行后,模型预测的结果会打印在客户端中,如下所示:
```log ```log
WARNING: Logging before InitGoogleLogging() is written to STDERR WARNING: Logging before InitGoogleLogging() is written to STDERR
I0614 03:01:36.273097 6084 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9400"): added 1 I0903 16:03:20.020586 35600 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9400"): added 1
I0614 03:01:37.393564 6084 general_model.cpp:490] [client]logid=0,client_cost=1107.82ms,server_cost=1101.75ms. I0903 16:03:21.346057 35600 general_model.cpp:490] [client]logid=0,client_cost=1306.26ms,server_cost=1293.65ms.
[{'bbox': [345, 95, 524, 585], 'rec_docs': '红牛-强化型', 'rec_scores': 0.8073724}] [{'bbox': [437, 71, 660, 727], 'rec_docs': '元气森林', 'rec_scores': 0.76902336}, {'bbox': [222, 72, 449, 700], 'rec_docs': '元气森林', 'rec_scores': 0.69347066}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305151}]
``` ```
- 关闭服务 - 关闭服务
...@@ -265,6 +272,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ...@@ -265,6 +272,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD
执行完毕后出现`Process stopped`信息表示成功关闭服务。 执行完毕后出现`Process stopped`信息表示成功关闭服务。
<a name="4"></a> <a name="4"></a>
## 4. FAQ ## 4. FAQ
**Q1**: 发送请求后没有结果返回或者提示输出解码报错 **Q1**: 发送请求后没有结果返回或者提示输出解码报错
...@@ -276,6 +284,6 @@ unset http_proxy ...@@ -276,6 +284,6 @@ unset http_proxy
``` ```
**Q2**: 启动服务后没有任何反应 **Q2**: 启动服务后没有任何反应
**A2**: 可以检查`config.yml``model_config`对应的路径是否存在,文件夹命名是否正确 **A2**: 可以检查 `config.yml``model_config` 对应的路径是否存在,文件夹命名是否正确
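例如,可以用下面的命令快速确认这些模型目录是否存在、其中是否包含 prototxt 文件(目录名与相对路径仅为示意,请与 `config.yml` 中的实际配置保持一致):

```shell
# 示意:检查 Serving 模型目录及其中的 prototxt 文件
ls ./models/general_PPLCNetV2_base_pretrained_v1.0_serving/
ls ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
```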
更多的服务部署类型,如 `RPC 预测服务` 等,可以参考 Serving 的[github 官网](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples) 更多的服务部署类型,如 `RPC 预测服务` 等,可以参考 Serving 的[github 官网](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)
# PP-ShiTu 库管理工具
本工具是PP-ShiTu的离线库管理工具,主要功能包括:新建图像库、更改图像库、建立索引库、更新索引库等。此工具旨在让用户能够可视化地管理图像及对应的index库,用户可根据实际情况,灵活地增删改查相应的gallery图像库及索引文件,在提升用户体验的同时,辅助PP-ShiTu在实际应用的过程中达到更好的效果。
目前此工具支持平台包括:
- Mac
- Windows
- Linux(注意,由于linux输入法问题,可能无法支持中文)
## 目录
- [1. 功能介绍](#1)
- [1.1 新建图像库](#1.1)
- [1.2 打开图像库](#1.2)
- [1.3 导入图像](#1.3)
- [1.4 图像操作](#1.3)
- [1.5 其他功能](#1.5)
- [2. 使用说明](#2)
- [2.1 环境安装](#2.1)
- [2.2 模型准备](#2.2)
- [2.3 运行使用](#2.3)
- [3. 生成文件介绍](#3)
- [致谢](#4)
- [FAQ](#FAQ)
<a name="1"></a>
## 1. 功能介绍
此工具主要功能包括:
- 构建`PP-ShiTu`中索引库对应的`gallery`图像库
- 根据构建的`gallery`图像库,生成索引库
-`gallery`图像库进行操作,如增删改查等操作,并更新对应的索引库
其中主界面的按钮如下图所示
<div align="center">
<img src="https://user-images.githubusercontent.com/11568925/188273082-b1ada7ed-e56e-4b6a-9e79-2dda01a3db69.png" width = "600" />
<p>界面按钮展示</p>
</div>
上图中第一行包括:`主要功能按钮``保存按钮``新增类别按钮``删减类别按钮`
第二行包括:`搜索框``搜索确定键``新加图像按钮``删除图像按钮`
下面将进行具体功能介绍,其操作入口,可以点击`主要功能按钮`下拉菜单查看,如下图所示:
<div align="center">
<img src="https://user-images.githubusercontent.com/11568925/188273056-04b376f5-7275-47ac-898b-474a667bc6a7.png" width = "600" />
<p>主要功能展示</p>
</div>
<a name="1.1"></a>
### 1.1 新建图像库
点击新建库功能后,会选择一个**空的存储目录**或者**新建目录**,此时所有的图片及对应的索引库都会存放在此目录下。完成操作后,如下图所示
<div align="center">
<img src="https://user-images.githubusercontent.com/11568925/188273108-8789b8cf-d2ab-49d5-bc82-f0bf7b41c686.png" width = "600" />
<p>新建库</p>
</div>
此时,用户可以新建类别,具体可以点击`新增类别按钮`及`删减类别按钮`。选中类别后,可以添加图像及进行相关操作,具体可以点击`新加图像按钮`及`删除图像按钮`。完成操作后,**注意保存**。
<a name="1.2"></a>
### 1.2 打开图像库
此功能用于打开之前用此工具保存好的库,并进行编辑。注意,**打开库时,请选择新建库时所使用的文件夹路径**。打开库后,示例如下
<div align="center">
<img src="https://user-images.githubusercontent.com/11568925/188273143-00ff0558-ccc9-4b8d-9364-43eef5dce334.png" width = "600" />
<p>打开库</p>
</div>
<a name="1.3"></a>
### 1.3 导入图像
在打开图像库或者新建图像库完成后,可以使用导入图像功能,即导入用户自己生成好的图像库。具体支持以下两种导入格式(示例见下文):
- image_list格式:打开具体的`.txt`文件。`.txt`文件中每一行格式: `image_path label`。跟据文件路径及label导入
- 多文件夹格式:打开`具体文件夹`,此文件夹下存储多个子文件夹,每个子文件夹名字为`label_name`,每个子文件夹中存储对应的图像数据。
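下面给出两种导入格式的简单示例,其中的路径与类别名均为假设,仅用于说明格式:

```shell
# 格式一:image_list 格式,.txt 文件中每行为 "image_path label",例如:
#   gallery/cola/demo_0001.jpg 可乐
#   gallery/milk/demo_0002.jpg 牛奶

# 格式二:多文件夹格式,子文件夹名即 label_name,例如:
gallery/
├── 可乐/
│   ├── demo_0001.jpg
│   └── ...
└── 牛奶/
    ├── demo_0002.jpg
    └── ...
```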
<a name="1.4"></a>
### 1.4 图像操作
选择图像后,鼠标右击可以进行如下操作,可以根据需求选择具体的操作。**注意:修改完图像后,请点击保存按钮进行保存**
<div align="center">
<img src="https://user-images.githubusercontent.com/11568925/188273178-5eff2f2e-7a8b-4a2b-809e-78f99479162d.png" width = "600" />
<p>图像操作</p>
</div>
<a name="1.5"></a>
### 1.5 生成、更新index库
在用户完成图像库的新建、打开或者修改,并完成保存操作后,可以点击`主要功能按钮`中的`新建/重建索引库`、`更新索引库`等功能,进行索引库的新建或者更新,生成`PP-ShiTu`使用的Index库
<a name="2"></a>
## 2. 使用说明
<a name="2.1"></a>
### 2.1 环境安装
安装好`PaddleClas`
```shell
pip install fastapi
pip install uvicorn
pip install pyqt5
```
<a name="2.2"></a>
### 2.2 模型准备
请按照[PP-ShiTu快速体验](../quick_start/quick_start_recognition.md#2.2.1)中的说明下载并准备好 inference model,并修改好`${PaddleClas}/deploy/configs/inference_drink.yaml`中的相关参数。
<a name="2.3"></a>
### 2.3 运行使用
运行方式如下
```shell
cd ${PaddleClas}/deploy/shitu_index_manager
python index_manager.py -c ../configs/inference_drink.yaml
```
<a name="3"></a>
## 3. 生成文件介绍
使用此工具后,会生成如下格式的文件
```shell
index_root/ # 库存储目录
|-- image_list.txt # 图像列表,每行:image_path label。由前端生成及修改,后端只读
|-- images # 图像存储目录,由前端生成及增删查等操作。后端只读
| |-- md5.jpg
| |-- md5.jpg
| |-- ……
|-- features.pkl # 建库之后,保存的embedding向量,后端生成,前端无需操作
|-- index # 真正的生成的index库存储目录,后端生成及操作,前端无需操作。
| |-- vector.index # faiss生成的索引库
| |-- id_map.pkl # 索引文件
```
其中`index_root`是使用此工具时,用户选择的存储目录,库的索引文件存储在`index`文件夹中。
使用`PP-ShiTu`时,索引文件目录需换成`index`文件夹的地址。
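例如,可以通过 `-o` 参数把识别流程的检索目录指向该工具生成的 `index` 文件夹(下面的 `{index_root}` 为占位符,请替换为新建库时选择的存储目录):

```shell
# 示意:在 deploy 目录下,使用库管理工具生成的索引库进行识别
cd ${PaddleClas}/deploy
python3.7 python/predict_system.py \
    -c ./configs/inference_drink.yaml \
    -o IndexProcess.index_dir="{index_root}/index"
```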
<a name="4"></a>
## 致谢
此工具的前端主要由[国内qt论坛](http://www.qtcn.org/)总版主[小熊宝宝](https://github.com/cnhemiya)完成,感谢**小熊宝宝**的大力支持~~
此工具前端原项目地址:https://github.com/cnhemiya/shitu-manager
<a name="FAQ"></a>
## FAQ
- 问题1: 点击新建索引库后,程序假死
答:生成索引库比较耗时,耐心等待一段时间就好
- 问题2: 导入图像是什么格式?
答:目前支持两种格式:1)image_list 格式,list 中每行格式为 `path label`;2)文件夹格式:类似 `ImageNet` 的存储方式。
- 问题3: 生成 index库报错
答:在修改图像后,必须点击保存按钮,保存完成后,再继续生成index库。
- 问题4: 报错 图像与index库不一致
答:可能用户自己修改了image_list.txt,修改完成后,请及时更新index库,保证其一致。
...@@ -23,17 +23,16 @@ PaddleClas 支持 Python Whl 包方式进行预测,目前 Whl 包方式仅支 ...@@ -23,17 +23,16 @@ PaddleClas 支持 Python Whl 包方式进行预测,目前 Whl 包方式仅支
<a name="1"></a> <a name="1"></a>
## 1. 安装 paddleclas ## 1. 安装 paddleclas
* pip 安装 * **[推荐]** 直接 pip 安装:
```bash ```bash
pip3 install paddleclas==2.2.1 pip3 install paddleclas
``` ```
* 本地构建并安装 * 如需使用 PaddleClas develop 分支体验最新功能,或是需要基于 PaddleClas 进行二次开发,请本地构建安装:
```bash ```bash
python3 setup.py bdist_wheel python3 setup.py install
pip3 install dist/*
``` ```
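安装完成后,可以用下面的命令快速验证 whl 包是否可用(模型名与示例图片路径仅为示意,可按需替换;具体参数说明请以本章后续内容为准):

```shell
# 示意:使用 whl 包的命令行入口对一张图片做分类预测
paddleclas --model_name=ResNet50 --infer_imgs="docs/images/inference_deployment/whl_demo.jpg"
```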
<a name="2"></a> <a name="2"></a>
......
...@@ -98,16 +98,16 @@ git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.4 ...@@ -98,16 +98,16 @@ git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.4
### 1.3 安装 PaddleClas 及其 Python 依赖库 ### 1.3 安装 PaddleClas 及其 Python 依赖库
建议直接从 PyPI 安装 PaddleClas: * **[建议]** 直接安装 PaddleClas:
```shell ```shell
pip install paddleclas pip install paddleclas
``` ```
PaddleClas 的 Python 依赖库在 `requirements.txt` 中给出,可通过如下命令安装 * 如需使用 PaddleClas develop 分支体验最新功能,或是需要基于 PaddleClas 进行二次开发,请本地构建安装,命令如下
```shell ```shell
pip install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple python setup.py install
``` ```
<a name='2'></a> <a name='2'></a>
......
# PP-ShiTu应用场景介绍
该文档介绍了PP-ShiTu提供的各种应用场景库的简介、下载链接以及使用方法。
------
## 目录
- [1. 应用场景介绍](#1-应用场景介绍)
- [2. 使用说明](#2-使用说明)
- [2.1 环境配置](#21-环境配置)
- [2.2 下载、解压场景库数据](#22-下载解压场景库数据)
- [2.3 准备模型](#23-准备模型)
- [2.4 场景库识别与检索](#24-场景库识别与检索)
- [2.4.1 识别单张图像](#241-识别单张图像)
- [2.4.2 基于文件夹的批量识别](#242-基于文件夹的批量识别)
<a name="1. 应用场景介绍"></a>
## 1. 应用场景介绍
PP-ShiTu对原数据集进行了`Gallery`库和`Query`库划分,并生成了对应的`Index`索引库,具体应用场景介绍和下载地址如下表所示。
| 场景 |示例图|场景简介|Recall@1|场景库下载地址|原数据集下载地址|
|:---:|:---:|:---:|:---:|:---:|:---:|
| 球类 | <img src="../../images/ppshitu_application_scenarios/Ball.jpg" height = "100"> |各种球类识别 | 0.9769 | [Balls](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Balls.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/balls-image-classification) |
| 狗识别 | <img src="../../images/ppshitu_application_scenarios/DogBreeds.jpg" height = "100"> | 狗细分类识别,包括69种狗的图像 | 0.9606 | [DogBreeds](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/DogBreeds.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/70-dog-breedsimage-data-set) |
| 宝石 | <img src="../../images/ppshitu_application_scenarios/Gemstones.jpg" height = "100"> | 宝石种类识别 | 0.9653 | [Gemstones](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Gemstones.tar) | [原数据下载地址](https://www.kaggle.com/datasets/lsind18/gemstones-images) |
| 动物 | <img src="../../images/ppshitu_application_scenarios/AnimalImageDataset.jpg" height = "100"> |各种动物识别 | 0.9078 | [AnimalImageDataset](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/AnimalImageDataset.tar) | [原数据下载地址](https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals) |
| 鸟类 | <img src="../../images/ppshitu_application_scenarios/Bird400.jpg" height = "100"> |鸟细分类识别,包括400种各种姿态的鸟类图像 | 0.9673 | [Bird400](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Bird400.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) |
| 交通工具 | <img src="../../images/ppshitu_application_scenarios/Vechicles.jpg" height = "100"> |车、船等交通工具粗分类识别 | 0.9307 | [Vechicles](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Vechicles.tar) | [原数据下载地址](https://www.kaggle.com/datasets/rishabkoul1/vechicle-dataset) |
| 花 | <img src="../../images/ppshitu_application_scenarios/104flowers.jpeg" height = "100"> |104种花细分类识别 | 0.9788 | [104flowers](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/104flowrs.tar) | [原数据下载地址](https://www.kaggle.com/datasets/msheriey/104-flowers-garden-of-eden) |
| 运动种类 | <img src="../../images/ppshitu_application_scenarios/100sports.jpg" height = "100"> |100种运动图像识别 | 0.9413 | [100sports](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/100sports.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/sports-classification) |
| 乐器 | <img src="../../images/ppshitu_application_scenarios/MusicInstruments.jpg" height = "100"> |30种不同乐器种类识别 | 0.9467 | [MusicInstruments](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/MusicInstruments.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/musical-instruments-image-classification) |
| 宝可梦 | <img src="../../images/ppshitu_application_scenarios/Pokemon.png" height = "100"> |宝可梦神奇宝贝识别 | 0.9236 | [Pokemon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Pokemon.tar) | [原数据下载地址](https://www.kaggle.com/datasets/lantian773030/pokemonclassification) |
| 船 | <img src="../../images/ppshitu_application_scenarios/Boat.jpg" height = "100"> |船种类识别 |0.9242 | [Boat](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Boat.tar) | [原数据下载地址](https://www.kaggle.com/datasets/imsparsh/dockship-boat-type-classification) |
| 鞋子 | <img src="../../images/ppshitu_application_scenarios/Shoes.jpeg" height = "100"> |鞋子种类识别,包括靴子、拖鞋等 | 0.9000 | [Shoes](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Shoes.tar) | [原数据下载地址](https://www.kaggle.com/datasets/noobyogi0100/shoe-dataset) |
| 巴黎建筑 | <img src="../../images/ppshitu_application_scenarios/Paris.jpg" height = "100"> |巴黎著名建筑景点识别,如:巴黎铁塔、圣母院等 | 1.000 | [Paris](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Paris.tar) | [原数据下载地址](https://www.kaggle.com/datasets/skylord/oxbuildings) |
| 蝴蝶 | <img src="../../images/ppshitu_application_scenarios/Butterfly.jpg" height = "100"> |75种蝴蝶细分类识别 | 0.9360 | [Butterfly](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Butterfly.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/butterfly-images40-species) |
| 野外植物 | <img src="../../images/ppshitu_application_scenarios/WildEdiblePlants.jpg" height = "100"> |野外植物识别 | 0.9758 | [WildEdiblePlants](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/WildEdiblePlants.tar) | [原数据下载地址](https://www.kaggle.com/datasets/ryanpartridge01/wild-edible-plants) |
| 天气 | <img src="../../images/ppshitu_application_scenarios/WeatherImageRecognition.jpg" height = "100"> |各种天气场景识别,如:雨天、打雷、下雪等 | 0.9924 | [WeatherImageRecognition](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/WeatherImageRecognition.tar) | [原数据下载地址](https://www.kaggle.com/datasets/jehanbhathena/weather-dataset) |
| 坚果 | <img src="../../images/ppshitu_application_scenarios/TreeNuts.jpg" height = "100"> |各种坚果种类识别 | 0.9412 | [TreeNuts](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/TreeNuts.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/tree-nuts-image-classification) |
| 时装 | <img src="../../images/ppshitu_application_scenarios/FashionProductsImage.jpg" height = "100"> |首饰、挎包、化妆品等时尚商品识别 | 0.9555 | [FashionProductImageSmall](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/FashionProductImageSmall.tar) | [原数据下载地址](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small) |
| 垃圾 | <img src="../../images/ppshitu_application_scenarios/Garbage12.jpg" height = "100"> |12种垃圾分类识别 | 0.9845 | [Garbage12](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Garbage12.tar) | [原数据下载地址](https://www.kaggle.com/datasets/mostafaabla/garbage-classification) |
| 航拍场景 | <img src="../../images/ppshitu_application_scenarios/AID.jpg" height = "100"> |各种航拍场景识别,如机场、火车站等 | 0.9797 | [AID](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/AID.tar) | [原数据下载地址](https://www.kaggle.com/datasets/jiayuanchengala/aid-scene-classification-datasets) |
| 蔬菜 | <img src="../../images/ppshitu_application_scenarios/Veg200.jpg" height = "100"> |各种蔬菜识别 | 0.8929 | [Veg200](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Veg200.tar) | [原数据下载地址](https://www.kaggle.com/datasets/zhaoyj688/vegfru) |
| 商标 | <img src="../../images/ppshitu_application_scenarios/Logo3K.jpg" height = "100"> |两千多种logo识别 | 0.9313 | [Logo3k](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Logo3k.tar) | [原数据下载地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
<a name="2. 使用说明"></a>
## 2. 使用说明
<a name="2.1 环境配置"></a>
### 2.1 环境配置
- 安装:请先参考文档[环境准备](../installation/install_paddleclas.md)配置PaddleClas运行环境
- 进入`deploy`运行目录,本部分所有内容与命令均需要在`deploy`目录下运行,可以通过下面命令进入`deploy`目录。
```shell
cd deploy
```
<a name="2.2 下载、解压场景库数据"></a>
### 2.2 下载、解压场景库数据
首先创建存放场景库的地址`deploy/datasets`:
```shell
mkdir datasets
```
下载并解压对应场景库到`deploy/datasets`中。
```shell
cd datasets
# 下载并解压场景库数据
wget {场景库下载链接} && tar -xf {压缩包的名称}
```
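例如,下载并解压后文示例用到的动物识别场景库(下载链接见上表),可以执行:

```shell
# 示意:当前目录为 deploy/datasets
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/AnimalImageDataset.tar
tar -xf AnimalImageDataset.tar
```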
`dataset_name`为例,解压完毕后,`datasets/dataset_name`文件夹下应有如下文件结构:
```shell
├── dataset_name/
│   ├── Gallery/
│   ├── Index/
│   ├── Query/
│   ├── gallery_list.txt
│   ├── query_list.txt
│   ├── label_list.txt
├── ...
```
其中,`Gallery`文件夹中存放的是用于构建索引库的原始图像,`Index`表示基于原始图像构建得到的索引库信息,`Query`文件夹存放的是用于检索的图像列表,`gallery_list.txt``query_list.txt`分别为索引库和检索图像的标签文件,`label_list.txt`是标签的中英文对照文件(注意:商标场景库文件不包含中英文对照文件)。
<a name="2.3 准备识别模型"></a>
### 2.3 准备模型
创建存放模型的文件夹`deploy/models`,并下载轻量级主体检测、识别模型,命令如下:
```shell
cd ..
mkdir models
cd models
# 下载主体检测 inference 模型并解压
# wget {检测模型下载链接} && tar -xf {检测模型压缩包名称}
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# 下载识别 inference 模型并解压
# wget {识别模型下载链接} && tar -xf {识别模型压缩包名称}
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
```
解压完成后,`models`文件夹下有如下文件结构:
```
├── inference_model_name
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
└── det_model_name
├── inference.pdiparams
├── inference.pdiparams.info
└── inference.pdmodel
```
<a name="2.4 场景库识别与检索"></a>
### 2.4 场景库识别与检索
`动物识别`场景为例,展示识别和检索过程(如果希望尝试其他场景库的识别与检索效果,在下载解压好对应的场景库数据和模型后,替换对应的配置文件即可完成预测)。
注意,此部分使用了`faiss`作为检索库,安装方法如下:
```shell
pip install faiss-cpu==1.7.1post2
```
若使用时不能正常引用/导入,则先 `uninstall` 再重新 `install`,尤其是在 Windows 下。
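下面是一个示意的重装命令(faiss 版本号与上文安装命令保持一致):

```shell
# 示意:faiss 无法正常 import 时,先卸载再重新安装
pip uninstall -y faiss-cpu
pip install faiss-cpu==1.7.1post2
```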
<a name="2.4.1 识别单张图像"></a>
#### 2.4.1 识别单张图像
假设需要测试`./datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg`这张图像的识别和检索效果。
首先分别修改配置文件`./configs/inference_general.yaml`中的`Global.det_inference_model_dir`和`Global.rec_inference_model_dir`字段为对应的检测和识别模型文件夹,以及修改测试图像地址字段`Global.infer_imgs`,示例如下:
```shell
Global:
  infer_imgs: './datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg'
  det_inference_model_dir: './models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer'
  rec_inference_model_dir: './models/general_PPLCNetV2_base_pretrained_v1.0_infer'
```
并修改配置文件`./configs/inference_general.yaml`中的`IndexProcess.index_dir`字段为对应场景index库地址:
```shell
IndexProcess:
  index_dir: './datasets/AnimalImageDataset/Index/'
```
运行下面的命令,对图像`./datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg`进行识别与检索
```shell
# 使用下面的命令使用 GPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml
# 使用下面的命令使用 CPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False
```
最终输出结果如下:
```
[{'bbox': [609, 70, 1079, 629], 'rec_docs': '羚羊', 'rec_scores': 0.6571544}]
```
其中`bbox`表示检测出的主体所在位置,`rec_docs`表示索引库中与检测框最为相似的类别,`rec_scores`表示对应的置信度。
检测的可视化结果也保存在`output`文件夹下,对于本张图像,识别结果可视化如下所示。
![](../../images/ppshitu_application_scenarios/systerm_result.jpg)
<a name="2.4.2 基于文件夹的批量识别"></a>
#### 2.4.2 基于文件夹的批量识别
如果希望预测文件夹内的图像,可以直接修改配置文件中`Global.infer_imgs`字段,也可以通过下面的`-o`参数修改对应的配置。
```shell
# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./datasets/AnimalImageDataset/Query/羚羊"
```
终端中会输出该文件夹内所有图像的识别结果,如下所示。
```
...
[{'bbox': [0, 0, 1200, 675], 'rec_docs': '羚羊', 'rec_scores': 0.6153812}]
[{'bbox': [0, 0, 275, 183], 'rec_docs': '羚羊', 'rec_scores': 0.77218026}]
[{'bbox': [264, 79, 1088, 850], 'rec_docs': '羚羊', 'rec_scores': 0.81452656}]
[{'bbox': [0, 0, 188, 268], 'rec_docs': '羚羊', 'rec_scores': 0.637074}]
[{'bbox': [118, 41, 235, 161], 'rec_docs': '羚羊', 'rec_scores': 0.67315465}]
[{'bbox': [0, 0, 175, 287], 'rec_docs': '羚羊', 'rec_scores': 0.68271667}]
[{'bbox': [0, 0, 310, 163], 'rec_docs': '羚羊', 'rec_scores': 0.6706451}]
...
```
所有图像的识别结果可视化图像也保存在`output`文件夹内。
...@@ -18,7 +18,7 @@ LeViT 是一种快速推理的、用于图像分类任务的混合神经网络 ...@@ -18,7 +18,7 @@ LeViT 是一种快速推理的、用于图像分类任务的混合神经网络
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(M) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(M) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 | | LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 |
| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 | | LeViT-128 | 0.7810 | 0.9372 | 0.786 | 0.940 | 406 | 9.2 |
| LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 | | LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 |
| LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 | | LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 |
| LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 | | LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 |
......
...@@ -57,7 +57,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ...@@ -57,7 +57,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在
### 1.2.1 Rep 策略 ### 1.2.1 Rep 策略
卷积核的大小决定了卷积层感受野的大小,通过组合使用不同大小的卷积核,能够获取不同尺度的特征,因此 PPLCNetV2 在 Stage3、Stage4 中,在同一层组合使用 kernel size 分别为 5、3、1 的 DW 卷积,同时为了避免对模型效率的影响,使用重参数化(Re parameterization,Rep)策略对同层的 DW 卷积进行融合,如下图所示。 卷积核的大小决定了卷积层感受野的大小,通过组合使用不同大小的卷积核,能够获取不同尺度的特征,因此 PPLCNetV2 在 Stage4、Stage5 中,在同一层组合使用 kernel size 分别为 5、3、1 的 DW 卷积,同时为了避免对模型效率的影响,使用重参数化(Re parameterization,Rep)策略对同层的 DW 卷积进行融合,如下图所示。
![](../../images/PP-LCNetV2/rep.png) ![](../../images/PP-LCNetV2/rep.png)
...@@ -65,7 +65,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ...@@ -65,7 +65,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在
### 1.2.2 PW 卷积 ### 1.2.2 PW 卷积
深度可分离卷积通常由一层 DW 卷积和一层 PW 卷积组成,用以替换标准卷积,为了使深度可分离卷积具有更强的拟合能力,我们尝试使用两层 PW 卷积,同时为了控制模型效率不受影响,两层 PW 卷积设置为:第一个在通道维度对特征图压缩,第二个再通过放大还原特征图通道,如下图所示。通过实验发现,该策略能够显著提高模型性能,同时为了平衡对模型效率带来的影响,PPLCNetV2 仅在 Stage4、Stage5 中使用了该策略。 深度可分离卷积通常由一层 DW 卷积和一层 PW 卷积组成,用以替换标准卷积,为了使深度可分离卷积具有更强的拟合能力,我们尝试使用两层 PW 卷积,同时为了控制模型效率不受影响,两层 PW 卷积设置为:第一个在通道维度对特征图压缩,第二个再通过放大还原特征图通道,如下图所示。通过实验发现,该策略能够显著提高模型性能,同时为了平衡对模型效率带来的影响,PPLCNetV2 仅在 Stage4 中使用了该策略。
![](../../images/PP-LCNetV2/split_pw.png) ![](../../images/PP-LCNetV2/split_pw.png)
...@@ -73,7 +73,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ...@@ -73,7 +73,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在
### 1.2.3 Shortcut ### 1.2.3 Shortcut
残差结构(residual)自提出以来,被诸多模型广泛使用,但在轻量级卷积神经网络中,由于残差结构所带来的元素级(element-wise)加法操作,会对模型的速度造成影响,我们在 PP-LCNetV2 中,以 Stage 为单位实验了 残差结构对模型的影响,发现残差结构的使用并非一定会带来性能的提高,因此 PPLCNetV2 仅在最后一个 Stage 中的使用了残差结构:在 Block 中增加 Shortcut,如下图所示。 残差结构(residual)自提出以来,被诸多模型广泛使用,但在轻量级卷积神经网络中,由于残差结构所带来的元素级(element-wise)加法操作,会对模型的速度造成影响,我们在 PP-LCNetV2 中,以 Stage 为单位实验了残差结构对模型的影响,发现残差结构的使用并非一定会带来性能的提高,因此 PPLCNetV2 仅在最后一个 Stage 中的使用了残差结构:在 Block 中增加 Shortcut,如下图所示。
![](../../images/PP-LCNetV2/shortcut.png) ![](../../images/PP-LCNetV2/shortcut.png)
...@@ -87,7 +87,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ...@@ -87,7 +87,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在
### 1.2.5 SE 模块 ### 1.2.5 SE 模块
虽然 SE 模块能够显著提高模型性能,但其对模型速度的影响同样不可忽视,在 PP-LCNetV1 中,我们发现在模型中后部使用 SE 模块能够获得最大化的收益。在 PP-LCNetV2 的优化过程中,我们以 Stage 为单位对 SE 模块的位置做了进一步实验,并发现在 Stage3 中使用能够取得更好的平衡。 虽然 SE 模块能够显著提高模型性能,但其对模型速度的影响同样不可忽视,在 PP-LCNetV1 中,我们发现在模型中后部使用 SE 模块能够获得最大化的收益。在 PP-LCNetV2 的优化过程中,我们以 Stage 为单位对 SE 模块的位置做了进一步实验,并发现在 Stage4 中使用能够取得更好的平衡。
<a name="1.3"></a> <a name="1.3"></a>
...@@ -110,7 +110,7 @@ PPLCNetV2 目前提供的模型的精度、速度指标及预训练权重链接 ...@@ -110,7 +110,7 @@ PPLCNetV2 目前提供的模型的精度、速度指标及预训练权重链接
| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|
| MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 | | MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 |
| PPLCNetV2_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 | | PPLCNetV1_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 |
| <b>PPLCNetV2_base<b> | <b>6.6<b> | <b>604<b> | <b>77.04<b> | <b>93.27<b> | <b>4.32<b> | | <b>PPLCNetV2_base<b> | <b>6.6<b> | <b>604<b> | <b>77.04<b> | <b>93.27<b> | <b>4.32<b> |
| <b>PPLCNetV2_base_ssld<b> | <b>6.6<b> | <b>604<b> | <b>80.07<b> | <b>94.87<b> | <b>4.32<b> | | <b>PPLCNetV2_base_ssld<b> | <b>6.6<b> | <b>604<b> | <b>80.07<b> | <b>94.87<b> | <b>4.32<b> |
......
...@@ -18,10 +18,10 @@ PVTV2 是 VisionTransformer 系列模型,该模型基于 PVT(Pyramid Vision ...@@ -18,10 +18,10 @@ PVTV2 是 VisionTransformer 系列模型,该模型基于 PVT(Pyramid Vision
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PVT_V2_B0 | 0.705 | 0.902 | 0.705 | - | 0.53 | 3.7 | | PVT_V2_B0 | 0.7052 | 0.9016 | 0.705 | - | 0.53 | 3.7 |
| PVT_V2_B1 | 0.787 | 0.945 | 0.787 | - | 2.0 | 14.0 | | PVT_V2_B1 | 0.7869 | 0.9450 | 0.787 | - | 2.0 | 14.0 |
| PVT_V2_B2 | 0.821 | 0.960 | 0.820 | - | 3.9 | 25.4 | | PVT_V2_B2 | 0.8206 | 0.9599 | 0.820 | - | 3.9 | 25.4 |
| PVT_V2_B3 | 0.831 | 0.965 | 0.831 | - | 6.7 | 45.2 | | PVT_V2_B3 | 0.8310 | 0.9648 | 0.831 | - | 6.7 | 45.2 |
| PVT_V2_B4 | 0.836 | 0.967 | 0.836 | - | 9.8 | 62.6 | | PVT_V2_B4 | 0.8361 | 0.9666 | 0.836 | - | 9.8 | 62.6 |
| PVT_V2_B5 | 0.837 | 0.966 | 0.838 | - | 11.4 | 82.0 | | PVT_V2_B5 | 0.8374 | 0.9662 | 0.838 | - | 11.4 | 82.0 |
| PVT_V2_B2_Linear | 0.821 | 0.961 | 0.821 | - | 3.8 | 22.6 | | PVT_V2_B2_Linear | 0.8205 | 0.9605 | 0.820 | - | 3.8 | 22.6 |
...@@ -33,19 +33,17 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算 ...@@ -33,19 +33,17 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 0.812 | 0.955 | 4.5 | 28 | | SwinTransformer_tiny_patch4_window7_224 | 0.8110 | 0.9549 | 0.812 | 0.955 | 4.5 | 28 |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 0.832 | 0.962 | 8.7 | 50 | | SwinTransformer_small_patch4_window7_224 | 0.8321 | 0.9622 | 0.832 | 0.962 | 8.7 | 50 |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 0.835 | 0.965 | 15.4 | 88 | | SwinTransformer_base_patch4_window7_224 | 0.8337 | 0.9643 | 0.835 | 0.965 | 15.4 | 88 |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 0.845 | 0.970 | 47.1 | 88 | | SwinTransformer_base_patch4_window12_384 | 0.8417 | 0.9674 | 0.845 | 0.970 | 47.1 | 88 |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | 0.852 | 0.975 | 15.4 | 88 | | SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8516 | 0.9748 | 0.852 | 0.975 | 15.4 | 88 |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | 0.864 | 0.980 | 47.1 | 88 | | SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8634 | 0.9798 | 0.864 | 0.980 | 47.1 | 88 |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | 0.863 | 0.979 | 34.5 | 197 | | SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8619 | 0.9788 | 0.863 | 0.979 | 34.5 | 197 |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | 0.873 | 0.982 | 103.9 | 197 | | SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8706 | 0.9814 | 0.873 | 0.982 | 103.9 | 197 |
[1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。 [1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。
**注**:与 Reference 的精度差异源于数据预处理不同。
<a name='3'></a> <a name='3'></a>
### 1.3 Benchmark ### 1.3 Benchmark
...@@ -131,4 +129,3 @@ PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例, ...@@ -131,4 +129,3 @@ PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,
Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX) Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)
PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](@shuilong)来完成相应的部署工作。 PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](@shuilong)来完成相应的部署工作。
...@@ -17,14 +17,12 @@ Twins 网络包括 Twins-PCPVT 和 Twins-SVT,其重点对空间注意力机制 ...@@ -17,14 +17,12 @@ Twins 网络包括 Twins-PCPVT 和 Twins-SVT,其重点对空间注意力机制
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 | | pcpvt_small | 0.8115 | 0.9567 | 0.812 | - | 3.7 | 24.1 |
| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 | | pcpvt_base | 0.8268 | 0.9627 | 0.827 | - | 6.4 | 43.8 |
| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 | | pcpvt_large | 0.8306 | 0.9659 | 0.831 | - | 9.5 | 60.9 |
| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 | | alt_gvt_small | 0.8177 | 0.9557 | 0.817 | - | 2.8 | 24 |
| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 | | alt_gvt_base | 0.8315 | 0.9629 | 0.832 | - | 8.3 | 56 |
| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 | | alt_gvt_large | 0.8364 | 0.9651 | 0.837 | - | 14.8 | 99.2 |
**注**:与 Reference 的精度差异源于数据预处理不同。
<a name='3'></a> <a name='3'></a>
......
...@@ -21,27 +21,25 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020 ...@@ -21,27 +21,25 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ViT_small_patch16_224 | 0.7769 | 0.9342 | 0.7785 | 0.9342 | 9.41 | 48.60 | | ViT_small_patch16_224 | 0.7553 | 0.9211 | 0.7785 | 0.9342 | 9.41 | 48.60 |
| ViT_base_patch16_224 | 0.8195 | 0.9617 | 0.8178 | 0.9613 | 16.85 | 86.42 | | ViT_base_patch16_224 | 0.8187 | 0.9618 | 0.8178 | 0.9613 | 16.85 | 86.42 |
| ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | 49.35 | 86.42 | | ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | 49.35 | 86.42 |
| ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | 12.66 | 88.19 | | ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | 12.66 | 88.19 |
| ViT_large_patch16_224 | 0.8323 | 0.9650 | 0.8306 | 0.9644 | 59.65 | 304.12 | | ViT_large_patch16_224 | 0.8303 | 0.9655 | 0.8306 | 0.9644 | 59.65 | 304.12 |
| ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | 174.70 | 304.12 | | ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | 174.70 | 304.12 |
| ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | 44.24 | 306.48 | | ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | 44.24 | 306.48 |
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) | | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:| |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DeiT_tiny_patch16_224 | 0.718 | 0.910 | 0.722 | 0.911 | 1.07 | 5.68 | | DeiT_tiny_patch16_224 | 0.7208 | 0.9112 | 0.722 | 0.911 | 1.07 | 5.68 |
| DeiT_small_patch16_224 | 0.796 | 0.949 | 0.799 | 0.950 | 4.24 | 21.97 | | DeiT_small_patch16_224 | 0.7982 | 0.9495 | 0.799 | 0.950 | 4.24 | 21.97 |
| DeiT_base_patch16_224 | 0.817 | 0.957 | 0.818 | 0.956 | 16.85 | 86.42 | | DeiT_base_patch16_224 | 0.8180 | 0.9558 | 0.818 | 0.956 | 16.85 | 86.42 |
| DeiT_base_patch16_384 | 0.830 | 0.962 | 0.829 | 0.972 | 49.35 | 86.42 | | DeiT_base_patch16_384 | 0.8289 | 0.9624 | 0.829 | 0.972 | 49.35 | 86.42 |
| DeiT_tiny_distilled_patch16_224 | 0.741 | 0.918 | 0.745 | 0.919 | 1.08 | 5.87 | | DeiT_tiny_distilled_patch16_224 | 0.7449 | 0.9192 | 0.745 | 0.919 | 1.08 | 5.87 |
| DeiT_small_distilled_patch16_224 | 0.809 | 0.953 | 0.812 | 0.954 | 4.26 | 22.36 | | DeiT_small_distilled_patch16_224 | 0.8117 | 0.9538 | 0.812 | 0.954 | 4.26 | 22.36 |
| DeiT_base_distilled_patch16_224 | 0.831 | 0.964 | 0.834 | 0.965 | 16.93 | 87.18 | | DeiT_base_distilled_patch16_224 | 0.8330 | 0.9647 | 0.834 | 0.965 | 16.93 | 87.18 |
| DeiT_base_distilled_patch16_384 | 0.851 | 0.973 | 0.852 | 0.972 | 49.43 | 87.18 | | DeiT_base_distilled_patch16_384 | 0.8520 | 0.9720 | 0.852 | 0.972 | 49.43 | 87.18 |
关于 Params、FLOPs、Inference speed 等信息,敬请期待。
<a name='3'></a> <a name='3'></a>
...@@ -67,4 +65,3 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020 ...@@ -67,4 +65,3 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020
| DeiT_small_<br>distilled_patch16_224 | 256 | 224 | 3.70 | 6.20 | 10.53 | | DeiT_small_<br>distilled_patch16_224 | 256 | 224 | 3.70 | 6.20 | 10.53 |
| DeiT_base_<br>distilled_patch16_224 | 256 | 224 | 6.17 | 14.94 | 28.58 | | DeiT_base_<br>distilled_patch16_224 | 256 | 224 | 6.17 | 14.94 | 28.58 |
| DeiT_base_<br>distilled_patch16_384 | 384 | 384 | 14.12 | 48.76 | 97.09 | | DeiT_base_<br>distilled_patch16_384 | 384 | 384 | 14.12 | 48.76 | 97.09 |
# 图像识别 # 图像识别
---
在 PaddleClas 中,图像识别,是指给定一张查询图像,系统能够识别该查询图像类别。广义上,图像分类也是图像识别的一种。但是与普通图像识别不同的是,图像分类只能判别出模型已经学习的类别,如果需要添加新的类别,分类模型只能重新训练。PaddleClas 中的图像识别,**对于陌生类别,只需要更新相应的检索库**,就能够正确的识别出查询图像的类别,而无需重新训练模型,这大大增加了识别系统的可用性,同时降低了更新模型的需求,方便用户部署应用。 在 PaddleClas 中,**图像识别**是指给定一张查询图像,系统能够识别该查询图像类别。广义上,图像分类也是图像识别的一种。但图像分类只能判断模型学习过的类别,如果需要添加新的类别,分类模型只能重新训练,这显然会增加实际应用的成本,限制了应用场景。
因此 PaddleClas 通过主体检测+特征提取+特征检索的方式来实现图像识别,其好处是**对于陌生类别,只需要更新相应的检索库**,就能够正确的识别出查询图像的类别,而无需重新训练模型,这大大增加了识别系统的可用性,同时降低了更新模型的需求,方便用户部署应用。
对于一张待查询图片,PaddleClas 中的图像识别流程主要分为三部分: 对于一张待查询图片,PaddleClas 中的图像识别流程主要分为三部分:
1. 主体检测:对于给定一个查询图像,主体检测器首先检测出图像的物体,从而去掉无用背景信息,提高识别精度。 1. 主体检测:对于一张给定的查询图像,主体检测器检测出图像中的主体候选区域,过滤掉无用的背景信息,提高后续识别精度。
2. 特征提取:对主体检测的各个候选区域,通过特征模型,进行特征提取 2. 特征提取:将主体检测的各个候选区域裁剪出来,输入到特征提取模型中进行特征提取。
3. 特征检索:将提取的特征与特征库中的向量进行相似度比对,得到其标签信息 3. 特征检索:将提取的特征与特征库中的向量进行相似度比对,得到其相似度与标签信息。
完整的图像识别系统,如下图所示
<img src="../../images/structure.png"/>
其中特征库,需要利用已经标注好的图像数据集提前建立。完整的图像识别系统,如下图所示 在Android端或PC端体验整体图像识别系统,或查看特征库建立方法,可以参考 [图像识别快速开始文档](../quick_start/quick_start_recognition.md)
![](../../images/structure.jpg) 以下内容,主要对上述三个步骤的训练部分进行介绍。
体验整体图像识别系统,或查看特征库建立方法,详见[图像识别快速开始文档](../quick_start/quick_start_recognition.md)。其中,图像识别快速开始文档主要讲解整体流程的使用过程。以下内容,主要对上述三个步骤的训练部分进行介绍。
首先,请参考[安装指南](../installation/install_paddleclas.md)配置运行环境。 在训练开始之前,请参考 [安装指南](../installation/install_paddleclas.md) 配置运行环境。
## 目录 ## 目录
- [1. 主体检测](#1) - [1. 主体检测](#1-主体检测)
- [2. 特征模型训练](#2) - [2. 特征提取模型训练](#2-特征提取模型训练)
- [2.1. 特征模型数据准备与处理](#2.1) - [2.1 特征提取模型数据的准备与处理](#21-特征提取模型数据的准备与处理)
- [2. 2 特征模型基于单卡 GPU 上的训练与评估](#2.2) - [2.2 特征提取模型在 GPU 上的训练与评估](#22-特征提取模型在-gpu-上的训练与评估)
- [2.2.1 特征模型训练](#2.2.2) - [2.2.1 特征提取模型训练](#221-特征提取模型训练)
- [2.2.2 特征模型恢复训练](#2.2.2) - [2.2.2 特征提取模型恢复训练](#222-特征提取模型恢复训练)
- [2.2.3 特征模型评估](#2.2.3) - [2.2.3 特征提取模型评估](#223-特征提取模型评估)
- [2.3 特征模型导出 inference 模型](#2.3) - [2.3 特征提取模型导出 inference 模型](#23-特征提取模型导出-inference-模型)
- [3. 特征检索](#3) - [3. 特征检索](#3-特征检索)
- [4. 基础知识](#4) - [4. 基础知识](#4-基础知识)
<a name="1"></a> <a name="1"></a>
...@@ -38,142 +43,143 @@ ...@@ -38,142 +43,143 @@
[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}] [{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
``` ```
关于主体检测训练方法可以参考: [PaddleDetection 训练教程](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#4-%E8%AE%AD%E7%BB%83) 关于主体检测数据集构造与模型训练方法可以参考: [30分钟快速上手PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#30%E5%88%86%E9%92%9F%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8Bpaddledetection)
更多关于 PaddleClas 中提供的主体检测的模型介绍与下载请参考:[主体检测教程](../image_recognition_pipeline/mainbody_detection.md) 更多关于 PaddleClas 中提供的主体检测的模型介绍与下载请参考:[主体检测教程](../image_recognition_pipeline/mainbody_detection.md)
<a name="2"></a> <a name="2"></a>
## 2. 特征模型训练 ## 2. 特征提取模型训练
为了快速体验 PaddleClas 图像检索模块,以下使用经典的200类鸟类细粒度分类数据集 [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf) 为例,介绍特征提取模型训练过程。CUB_200_2011 下载方式请参考 [CUB_200_2011官网](https://www.vision.caltech.edu/datasets/cub_200_2011/)
<a name="2.1"></a> <a name="2.1"></a>
### 2.1 特征模型数据的准备与处理 ### 2.1 特征提取模型数据的准备与处理
* 进入 `PaddleClas` 目录 * 进入 `PaddleClas` 目录
```bash ```shell
## linux or mac, $path_to_PaddleClas 表示 PaddleClas 的根目录,用户需要根据自己的真实目录修改 cd PaddleClas
cd $path_to_PaddleClas ```
```
* 进入 `dataset` 目录,为了快速体验 PaddleClas 图像检索模块,此处使用的数据集为 [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf),其是一个包含 200 类鸟的细粒度鸟类数据集。首先,下载 CUB_200_2011 数据集,下载方式请参考[官网](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) * 进入 `dataset` 目录
```shell ```shell
# linux or mac # 进入dataset目录
cd dataset cd dataset
# 将下载后的数据拷贝到此目录 # 将下载后的数据拷贝到dataset目录下
cp {数据存放的路径}/CUB_200_2011.tgz . cp {数据存放的路径}/CUB_200_2011.tgz ./
# 解压 # 解压该数据集
tar -xzvf CUB_200_2011.tgz tar -xzvf CUB_200_2011.tgz
#进入 CUB_200_2011 目录 #进入 CUB_200_2011 目录
cd CUB_200_2011 cd CUB_200_2011
``` ```
该数据集在用作图像检索任务时,通常将前 100 类当做训练集,后 100 类当做测试集,所以此处需要将下载的数据集做一些后处理,来更好的适应 PaddleClas 的图像检索训练。 * 该数据集在用作图像检索任务时,通常将前 100 类当做训练集,后 100 类当做测试集,所以此处需要将下载的数据集做一些后处理,来更好的适应 PaddleClas 的图像检索训练。
```shell ```shell
#新建 train 和 test 目录 #新建 train 和 test 目录
mkdir train && mkdir test mkdir train
mkdir test
#将数据分成训练集和测试集,前 100 类作为训练集,后 100 类作为测试集 #将数据分成训练集和测试集,前 100 类作为训练集,后 100 类作为测试集
ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh
#生成 train_list 和 test_list #生成 train_list 和 test_list
tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
``` ```
至此,现在已经得到 `CUB_200_2011` 的训练集(`train` 目录)、测试集(`test` 目录)、`train_list.txt``test_list.txt` 至此,现在已经得到 `CUB_200_2011` 的训练集(`train` 目录)、测试集(`test` 目录)、`train_list.txt``test_list.txt`
数据处理完毕后,`CUB_200_2011` 中的 `train` 目录下应有如下结构: 数据处理完毕后,`CUB_200_2011` 中的 `train` 目录下应有如下结构:
``` ```
├── 1 CUB_200_2011/train/
│ ├── Black_Footed_Albatross_0001_796111.jpg ├── 1
│ ├── Black_Footed_Albatross_0002_55.jpg │ ├── Black_Footed_Albatross_0001_796111.jpg
│ ├── Black_Footed_Albatross_0002_55.jpg
... ...
├── 10 ├── 10
│ ├── Red_Winged_Blackbird_0001_3695.jpg │ ├── Red_Winged_Blackbird_0001_3695.jpg
│ ├── Red_Winged_Blackbird_0005_5636.jpg │ ├── Red_Winged_Blackbird_0005_5636.jpg
... ...
``` ```
`train_list.txt` 应为:
``` `train_list.txt` 应为:
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```
其中,分隔符为空格" ", 三列数据的含义分别是训练数据的路径、训练数据的 label 信息、训练数据的 unique id。
测试集格式与训练集格式相同。 ```
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```
其中,分隔符为空格`" "`, 三列数据的含义分别是`训练数据的相对路径``训练数据的 label 标签``训练数据的 unique id`。测试集格式与训练集格式相同。
**注意** * 构建完毕后返回 `PaddleClas` 根目录
* 当 gallery dataset 和 query dataset 相同时,为了去掉检索得到的第一个数据(检索图片本身无须评估),每个数据需要对应一个 unique id,用于后续评测 mAP、recall@1 等指标。关于 gallery dataset 与 query dataset 的解析请参考[图像检索数据集介绍](#图像检索数据集介绍), 关于 mAP、recall@1 等评测指标请参考[图像检索评价指标](#图像检索评价指标) ```shell
# linux or mac
cd ../../
```
返回 `PaddleClas` 根目录 **注意**
```shell * 当 gallery dataset 和 query dataset 相同时,为了去掉检索得到的第一个数据(检索图片本身不能出现在gallery中),每个数据需要对应一个 unique id(一般使用从1开始的自然数为unique id,如1,2,3,...),用于后续评测 `mAP``recall@1` 等指标。关于 gallery dataset 与 query dataset 的解析请参考[图像检索数据集介绍](#图像检索数据集介绍), 关于 `mAP``recall@1` 等评测指标请参考[图像检索评价指标](#图像检索评价指标)
# linux or mac
cd ../../
```
<a name="2.2"></a> <a name="2.2"></a>
### 2.2 特征模型 GPU 上的训练与评估 ### 2.2 特征提取模型在 GPU 上的训练与评估
在基于单卡 GPU 上训练与评估,推荐使用 `tools/train.py``tools/eval.py` 脚本。
PaddleClas 支持使用 VisualDL 可视化训练过程。VisualDL 是飞桨可视化分析工具,以丰富的图表呈现训练参数变化趋势、模型结构、数据样本、高维数据分布等。可帮助用户更清晰直观地理解深度学习模型训练过程及模型结构,进而实现高效的模型优化。更多细节请查看[VisualDL](../others/VisualDL.md) 下面以 MobileNetV1 模型为例,介绍特征提取模型在 GPU 上的训练与评估流程
<a name="2.2.1"></a> <a name="2.2.1"></a>
#### 2.2.1 特征模型训练 #### 2.2.1 特征提取模型训练
准备好配置文件之后,可以使用下面的方式启动图像检索任务的训练。PaddleClas 训练图像检索任务的方法是度量学习,关于度量学习的解析请参考[度量学习](#度量学习) 准备好配置文件之后,可以使用下面的方式启动图像检索任务的训练。PaddleClas 训练图像检索任务的方法是度量学习,关于度量学习的解析请参考[度量学习](#度量学习)
```shell ```shell
# 单卡 GPU # 单卡 GPU
python3 tools/train.py \ python3.7 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Arch.Backbone.pretrained=True \ -o Arch.Backbone.pretrained=True \
-o Global.device=gpu -o Global.device=gpu
# 多卡 GPU # 多卡 GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/train.py \ python3.7 -m paddle.distributed.launch tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Arch.Backbone.pretrained=True \ -o Arch.Backbone.pretrained=True \
-o Global.device=gpu -o Global.device=gpu
``` ```
其中,`-c` 用于指定配置文件的路径,`-o` 用于指定需要修改或者添加的参数,其中 `-o Arch.Backbone.pretrained=True` 表示 Backbone 部分使用预训练模型,此外,`Arch.Backbone.pretrained` 也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu` 表示使用 GPU 进行训练。如果希望使用 CPU 进行训练,则需要将 `Global.device` 设置为 `cpu` **注**:其中,`-c` 用于指定配置文件的路径,`-o` 用于指定需要修改或者添加的参数,其中 `-o Arch.Backbone.pretrained=True` 表示 Backbone 在训练开始前会加载预训练模型;`-o Arch.Backbone.pretrained` 也可以指定为模型权重文件的路径,使用时换成自己的预训练模型权重文件的路径即可;`-o Global.device=gpu` 表示使用 GPU 进行训练。如果希望使用 CPU 进行训练,则设置 `-o Global.device=cpu`即可
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md) 更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md)
运行上述命令,可以看到输出日志,示例如下: 运行上述训练命令,可以看到输出日志,示例如下:
``` ```log
... ...
[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154 [Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
... ...
[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238 [Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
... ...
``` ```
此处配置文件的 Backbone 是 MobileNetV1,如果想使用其他 Backbone,可以重写参数 `Arch.Backbone.name`,比如命令中增加 `-o Arch.Backbone.name={其他 Backbone}`。此外,由于不同模型 `Neck` 部分的输入维度不同,更换 Backbone 后可能需要改写此处的输入大小,改写方式类似替换 Backbone 的名字。
此处配置文件的 Backbone 是 MobileNetV1,如果想使用其他 Backbone,可以重写参数 `Arch.Backbone.name`,比如命令中增加 `-o Arch.Backbone.name={其他 Backbone 的名字}`。此外,由于不同模型 `Neck` 部分的输入维度不同,更换 Backbone 后可能需要改写 `Neck` 的输入大小,改写方式类似替换 Backbone 的名字。
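下面给出一个替换 Backbone 的示意命令(以 `ResNet50` 为例,仅为示意;由于其输出维度与 MobileNetV1 不同,配置中 `Neck` 的输入大小可能需要同步修改,具体数值以所选 Backbone 为准):

```shell
# 示意:将 Backbone 替换为 ResNet50 进行训练
python3.7 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
    -o Arch.Backbone.name="ResNet50" \
    -o Arch.Backbone.pretrained=True \
    -o Global.device=gpu
```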
在训练 Loss 部分,此处使用了 [CELoss](../../../ppcls/loss/celoss.py)[TripletLossV2](../../../ppcls/loss/triplet.py),配置文件如下: 在训练 Loss 部分,此处使用了 [CELoss](../../../ppcls/loss/celoss.py)[TripletLossV2](../../../ppcls/loss/triplet.py),配置文件如下:
``` ```yaml
Loss: Loss:
Train: Train:
- CELoss: - CELoss:
...@@ -183,43 +189,46 @@ Loss: ...@@ -183,43 +189,46 @@ Loss:
margin: 0.5 margin: 0.5
``` ```
最终的总 Loss 是所有 Loss 的加权和,其中 weight 定义了特定 Loss 在最终总 Loss 的权重。如果想替换其他 Loss,也可以在配置文件中更改 Loss 字段,目前支持的 Loss 请参考 [Loss](../../../ppcls/loss) 最终的总 Loss 是所有 Loss 的加权和,其中 weight 定义了特定 Loss 在最终总 Loss 的权重。如果想替换其他 Loss,也可以在配置文件中更改 Loss 字段,目前支持的 Loss 请参考 [Loss](../../../ppcls/loss/__init__.py)
<a name="2.2.2"></a> <a name="2.2.2"></a>
#### 2.2.2 特征模型恢复训练 #### 2.2.2 特征提取模型恢复训练
如果训练任务因为其他原因被终止,可以加载断点权重文件,继续训练: 如果训练任务因为其他原因被终止,且训练过程中有保存权重文件,可以加载断点权重文件,继续训练:
```shell ```shell
# 单卡 # 单卡恢复训练
python3 tools/train.py \ python3.7 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5" \ -o Global.checkpoints="./output/RecModel/epoch_5" \
-o Global.device=gpu -o Global.device=gpu
# 多卡
# 多卡恢复训练
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/train.py \ python3.7 -m paddle.distributed.launch tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5" \ -o Global.checkpoints="./output/RecModel/epoch_5" \
-o Global.device=gpu -o Global.device=gpu
``` ```
其中配置文件不需要做任何修改,只需要在继续训练时设置 `Global.checkpoints` 参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。 其中配置文件不需要做任何修改,只需要在继续训练时设置 `Global.checkpoints` 参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
**注意** **注意**
* `-o Global.checkpoints` 参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点 `5` 继续训练,则 `Global.checkpoints` 参数只需设置为 `"./output/RecModel/epoch_5"`,PaddleClas 会自动补充后缀名。 * `-o Global.checkpoints` 后的参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点 `epoch_5` 继续训练,则 `Global.checkpoints` 参数只需设置为 `"./output/RecModel/epoch_5"`,PaddleClas 会自动补充后缀名。
```shell `epoch_5.pdparams`所在目录如下所示:
```log
output/ output/
└── RecModel └── RecModel
├── best_model.pdopt ├── best_model.pdopt
├── best_model.pdparams ├── best_model.pdparams
├── best_model.pdstates ├── best_model.pdstates
├── epoch_1.pdopt ├── epoch_5.pdopt
├── epoch_1.pdparams ├── epoch_5.pdparams
├── epoch_1.pdstates ├── epoch_5.pdstates
. .
. .
. .
...@@ -227,51 +236,51 @@ python3 -m paddle.distributed.launch tools/train.py \ ...@@ -227,51 +236,51 @@ python3 -m paddle.distributed.launch tools/train.py \
<a name="2.2.3"></a> <a name="2.2.3"></a>
#### 2.2.3 特征模型评估 #### 2.2.3 特征提取模型评估
可以通过以下命令进行模型评估。 可以通过以下命令对指定模型进行评估。
```bash ```bash
# 单卡 # 单卡评估
python3 tools/eval.py \ python3.7 tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model -o Global.pretrained_model=./output/RecModel/best_model
# 多卡
# 多卡评估
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/eval.py \ python3.7 -m paddle.distributed.launch tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model -o Global.pretrained_model=./output/RecModel/best_model
``` ```
上述命令将使用 `./configs/quick_start/MobileNetV1_retrieval.yaml` 作为配置文件,对上述训练得到的模型 `./output/RecModel/best_model` 进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过 `-o` 参数更新配置,如上所示。 上述命令将使用 `./configs/quick_start/MobileNetV1_retrieval.yaml` 作为配置文件,对上述训练得到的模型 `./output/RecModel/best_model.pdparams` 进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过 `-o` 参数更新配置,如上所示。
可配置的部分评估参数说明如下: 可配置的部分评估参数说明如下:
* `Arch.name`:模型名称
* `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于 `Global.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而 `Global.Backbone.pretrained` 只是 Backbone 部分的权重。当需要做模型评估时,需要加载整个模型的权重。 * `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于 `Global.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而 `Global.Backbone.pretrained` 只是 Backbone 部分的权重。当需要做模型评估时,需要加载整个模型的权重。
* `Metric.Eval`:待评估的指标,默认评估 recall@1、recall@5、mAP。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考 [Metric](../../../ppcls/metric/metrics.py) 部分在配置文件 `Metric.Eval` 中添加相关的指标。 * `Metric.Eval`:待评估的指标,默认评估 `recall@1``recall@5``mAP`。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考 [Metric](../../../ppcls/metric/metrics.py) 部分在配置文件 `Metric.Eval` 中添加相关的指标。
**注意:** **注意:**
* 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas 会自动补齐 `.pdparams` 的后缀,如 [2.2.2 特征模型恢复训练](#2.2.2) * 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas 会自动补齐 `.pdparams` 的后缀,如 [2.2.2 特征提取模型恢复训练](#2.2.2)
* Metric learning 任务一般不评测 TopkAcc * Metric learning 任务一般不评测 `TopkAcc` 指标
<a name="2.3"></a> <a name="2.3"></a>
### 2.3 特征模型导出 inference 模型 ### 2.3 特征提取模型导出 inference 模型
通过导出 inference 模型,PaddlePaddle 支持使用预测引擎进行预测推理。对训练好的模型进行转换: 通过导出 inference 模型,PaddlePaddle 支持使用预测引擎进行预测推理。对训练好的模型进行转换:
```bash ```bash
python3 tools/export_model.py \ python3.7 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=output/RecModel/best_model \ -o Global.pretrained_model=output/RecModel/best_model \
-o Global.save_inference_dir=./inference -o Global.save_inference_dir=./inference
``` ```
其中,`Global.pretrained_model` 用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2.2 特征模型恢复训练](#2.2.2))。当执行后,会在当前目录下生成 `./inference` 目录,目录下包含 `inference.pdiparams``inference.pdiparams.info``inference.pdmodel` 文件。`Global.save_inference_dir` 可以指定导出 inference 模型的路径。此处保存的 inference 模型在 embedding 特征层做了截断,即模型最终的输出为 n 维 embedding 特征。 其中,`Global.pretrained_model` 用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2.2 特征提取模型恢复训练](#2.2.2))。当执行后,会在当前目录下生成 `./inference` 目录,目录下包含 `inference.pdiparams``inference.pdiparams.info``inference.pdmodel` 文件。`Global.save_inference_dir` 可以指定导出 inference 模型文件夹的路径。此处保存的 inference 模型在 embedding 特征层做了截断,即模型的推理输出为 n 维特征。
上述命令将生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),然后可以使用预测引擎进行推理。使用 inference 模型推理的流程可以参考[基于 Python 预测引擎预测推理](../inference_deployment/python_deploy.md) 有了上述命令将生成的模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),接下来就可以使用预测引擎进行推理。使用 inference 模型推理的流程可以参考[基于 Python 预测引擎预测推理](../inference_deployment/python_deploy.md)
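作为补充,下面给出一段使用 Paddle Inference Python API 加载上述 inference 模型并提取 embedding 特征的示意代码(输入使用随机数据代替真实图像的预处理结果,输入尺寸 224x224 也是假设值,仅用于说明调用方式,实际流程请以上文链接的推理文档为准):

```python
import numpy as np
import paddle.inference as paddle_infer

# 加载导出的 inference 模型(路径为上文导出的 ./inference 目录)
config = paddle_infer.Config("./inference/inference.pdmodel",
                             "./inference/inference.pdiparams")
predictor = paddle_infer.create_predictor(config)

# 构造一个假设的输入:1 张 224x224 的 RGB 图像(实际使用时需按配置文件做预处理)
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))

predictor.run()

output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
embedding = output_handle.copy_to_cpu()
print(embedding.shape)  # 形如 (1, n),n 为 embedding 特征维度
```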
<a name="3"></a> <a name="3"></a>
...@@ -279,14 +288,14 @@ python3 tools/export_model.py \ ...@@ -279,14 +288,14 @@ python3 tools/export_model.py \
PaddleClas 图像检索部分目前支持的环境如下: PaddleClas 图像检索部分目前支持的环境如下:
```shell | 操作系统 | 推理硬件 |
└── CPU/单卡 GPU | :------- | :------- |
├── Linux | Linux | CPU/GPU |
├── MacOS | Windows | CPU/GPU |
└── Windows | MacOS | CPU/GPU |
```
此部分使用了 [Faiss](https://github.com/facebookresearch/faiss) 作为检索库,其是一个高效的特征检索及聚类的库。此库中集成了多种相似度检索算法,以满足不同的检索场景。在 PaddleClas 中,支持三种检索算法:
此部分使用了第三方开源库 [Faiss](https://github.com/facebookresearch/faiss) 作为检索工具,它是一个高效的特征检索与聚类的库,集成了多种相似度检索算法,以满足不同的检索场景。PaddleClas 目前支持三种检索算法:
- **HNSW32**: 一种图索引方法。检索精度较高,速度较快。但是特征库只支持添加图像功能,不支持删除图像特征功能。(默认方法) - **HNSW32**: 一种图索引方法。检索精度较高,速度较快。但是特征库只支持添加图像功能,不支持删除图像特征功能。(默认方法)
- **IVF**:倒排索引检索方法。速度较快,但是精度略低。特征库支持增加、删除图像特功能。 - **IVF**:倒排索引检索方法。速度较快,但是精度略低。特征库支持增加、删除图像特功能。
...@@ -296,22 +305,27 @@ PaddleClas 图像检索部分目前支持的环境如下: ...@@ -296,22 +305,27 @@ PaddleClas 图像检索部分目前支持的环境如下:
具体安装方法如下:

```shell
python3.7 -m pip install faiss-cpu==1.7.1post2
```

若无法正常使用 faiss,可以按以下命令先将其卸载,然后重新安装(Windows 系统中该问题比较常见):

```shell
python3.7 -m pip uninstall faiss-cpu
python3.7 -m pip install faiss-cpu==1.7.1post2
```
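下面给出一段直接调用 faiss 的最小示意代码,帮助理解"构建索引 + 最近邻检索"这一过程(其中特征维度与向量均为随机生成的假设数据;在 PaddleClas 中这一步由 `build_gallery.py` 与检索配置自动完成,无需手写):

```python
import faiss
import numpy as np

dim = 512                                                    # 假设的特征维度
gallery_feats = np.random.rand(1000, dim).astype("float32")  # 假设的底库特征
query_feats = np.random.rand(2, dim).astype("float32")       # 假设的查询特征

# 构建 HNSW32 索引(与 PaddleClas 默认的检索算法对应)
index = faiss.IndexHNSWFlat(dim, 32)
index.add(gallery_feats)

# 检索每个查询向量最近的 5 个底库向量
distances, indices = index.search(query_feats, 5)
print(indices.shape)  # (2, 5)
```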
<a name="4"></a> <a name="4"></a>
## 4. 基础知识 ## 4. 基础知识
图像检索指的是给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,图像检索旨在从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先将图像中表示为一个合适的特征向量,其次,对这些图像的特征向量用欧式距离或余弦距离进行最近邻搜索以找到底库中相似的图像,最后,可以使用一些后处理技术对检索结果进行微调,确定被识别图像的类别等信息。所以,决定一个图像检索算法性能的关键在于图像对应的特征向量的好坏 图像检索指的是给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,图像检索旨在从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先将图像中表示为一个合适的特征向量,其次对这些图像的特征向量用合适的距离度量函数进行最近邻搜索以找到数据库图像中相似的图像,最后,可能会使用一些后处理对检索结果进行进一步优化,得到待识别图像的类别、相似度等信息。所以,图像检索算法性能的关键在于图像提取的特征向量的表示能力强弱
<a name="度量学习"></a> <a name="度量学习"></a>
- 度量学习(Metric Learning) - 度量学习(Metric Learning)
度量学习研究如何在一个特定的任务上学习一个距离函数,使得该距离函数能够帮助基于近邻的算法(kNN、k-means 等)取得较好的性能。深度度量学习(Deep Metric Learning)是度量学习的一种方法,它的目标是学习一个从原始特征到低维稠密的向量空间(嵌入空间,embedding space)的映射,使得同类对象在嵌入空间上使用常用的距离函数(欧氏距离、cosine 距离等)计算的距离比较近,而不同类的对象之间的距离则比较远。深度度量学习在计算机视觉领域取得了非常多的成功的应用,比如人脸识别、商品识别、图像检索、行人重识别等。更详细的介绍请参考[此文档](../algorithm_introduction/metric_learning.md) 度量学习研究如何在一个特定的任务上学习一个距离函数,使得该距离函数能够帮助基于近邻的算法(kNN、k-means 等)取得较好的性能。深度度量学习(Deep Metric Learning)是度量学习的一种方法,它的目标是学习一个从原始特征到低维稠密的向量空间(嵌入空间,embedding space)的映射,使得同类对象在嵌入空间上使用常用的距离函数(欧氏距离、cosine 距离等)计算的距离比较近,而不同类的对象之间的距离则比较远。深度度量学习在计算机视觉领域取得了非常多的成功的应用,比如人脸识别、商品识别、图像检索、行人重识别等。更详细的介绍请参考[此文档](../algorithm_introduction/metric_learning.md)
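下面用一小段示意代码说明"在嵌入空间中用距离衡量相似度"的含义(向量为随机构造的假设数据,仅用于说明欧氏距离与 cosine 相似度的计算方式):

```python
import numpy as np

# 假设两张图像经过模型映射后得到的 embedding(先做 L2 归一化)
feat_a = np.random.rand(512).astype("float32")
feat_b = np.random.rand(512).astype("float32")
feat_a /= np.linalg.norm(feat_a)
feat_b /= np.linalg.norm(feat_b)

euclidean_dist = float(np.linalg.norm(feat_a - feat_b))  # 欧氏距离,越小越相似
cosine_sim = float(np.dot(feat_a, feat_b))               # cosine 相似度,越大越相似
print(euclidean_dist, cosine_sim)
```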
<a name="图像检索数据集介绍"></a> <a name="图像检索数据集介绍"></a>
...@@ -319,19 +333,17 @@ pip install faiss-cpu==1.7.1post2 ...@@ -319,19 +333,17 @@ pip install faiss-cpu==1.7.1post2
- 训练集合(train dataset):用来训练模型,使模型能够学习该集合的图像特征。 - 训练集合(train dataset):用来训练模型,使模型能够学习该集合的图像特征。
- 底库数据集合(gallery dataset):用来提供图像检索任务中的底库数据,该集合可与训练集或测试集相同,也可以不同,当与训练集相同时,测试集的类别体系应与训练集的类别体系相同。 - 底库数据集合(gallery dataset):用来提供图像检索任务中的底库数据,该集合可与训练集或测试集相同,也可以不同,当与训练集相同时,测试集的类别体系应与训练集的类别体系相同。
- 测试集合(query dataset):用来测试模型的好坏,通常要对测试集的每一张测试图片进行特征提取,之后和底库数据的特征进行距离匹配,得到识别结果,后根据识别结果计算整个测试集的指标。 - 测试集合(query dataset):用来测试模型的检索性能,通常要对测试集的每一张测试图片进行特征提取,之后和底库数据的特征进行距离匹配,得到检索结果,后根据检索结果计算模型在整个测试集上的性能指标。
<a name="图像检索评价指标"></a> <a name="图像检索评价指标"></a>
- 图像检索评价指标 - 图像检索评价指标
<a name="召回率"></a> <a name="召回率"></a>
- 召回率(recall):表示预测为正例且标签为正例的个数 / 标签为正例的个数 - 召回率(recall):表示预测为正例且标签为正例的个数 / 标签为正例的个数
- `recall@k`:检索的 top-k 结果中预测为正例且标签为正例的个数 / 标签为正例的个数
- recall@1:检索的 top-1 中预测正例且标签为正例的个数 / 标签为正例的个数
- recall@5:检索的 top-5 中所有预测正例且标签为正例的个数 / 标签为正例的个数
<a name="平均检索精度"></a> <a name="平均检索精度"></a>
- 平均检索精度(mAP) - 平均检索精度(mAP)
- AP: AP 指的是不同召回率上的正确率的平均值 - `AP`: AP 指的是不同召回率上的正确率的平均值
- mAP: 测试集中所有图片对应的 AP 的平均值 - `mAP`: 测试集中所有图片对应的 AP 的平均值
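为便于理解,下面给出一段计算 recall@k 与 mAP 的示意代码(检索结果与标签均为随意构造的小例子,指标定义做了简化;实际评测请直接使用配置文件中的 `Metric.Eval` 指标):

```python
import numpy as np

# 每行是一张查询图像的 top-5 检索结果是否命中正例(1 表示命中)
retrieval_hits = np.array([
    [1, 0, 0, 1, 0],   # 查询 1
    [0, 0, 1, 0, 0],   # 查询 2
    [0, 0, 0, 0, 0],   # 查询 3:top-5 内没有命中
])

def recall_at_k(hits, k):
    # top-k 内只要命中一次,即认为该查询召回成功
    return float(np.mean(hits[:, :k].max(axis=1)))

def mean_ap(hits):
    aps = []
    for row in hits:
        pos = np.where(row == 1)[0]
        if len(pos) == 0:
            aps.append(0.0)
            continue
        precisions = [(i + 1) / (p + 1) for i, p in enumerate(pos)]
        aps.append(float(np.mean(precisions)))
    return float(np.mean(aps))

print(recall_at_k(retrieval_hits, 1), recall_at_k(retrieval_hits, 5), mean_ap(retrieval_hits))
```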
# 更新日志

- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)。
- 2021.11.1 发布[PP-ShiTu技术报告](https://arxiv.org/pdf/2111.00775.pdf),新增饮料识别demo。
- 2021.10.23 发布轻量级图像识别系统PP-ShiTu,CPU上0.2s即可完成在10w+库的图像识别。[点击这里](../quick_start/quick_start_recognition.md)立即体验。
- 2021.09.17 发布PP-LCNet系列超轻量骨干网络模型,在Intel CPU上,单张图像预测速度约5ms,ImageNet-1K数据集上Top1识别准确率达到80.82%,超越ResNet152的模型效果。PP-LCNet的介绍可以参考[论文](https://arxiv.org/pdf/2109.15099.pdf),或者[PP-LCNet模型介绍](../models/PP-LCNet.md),相关指标和预训练权重可以从[这里](../algorithm_introduction/ImageNet_models.md)下载。
......
# 图像识别快速体验

本文档包含 2 个部分:PP-ShiTu android 端 demo 快速体验与 PP-ShiTu PC 端 demo 快速体验。

如果图像类别已经存在于图像索引库中,那么可以直接参考 [2.2 图像识别体验](#22-图像识别体验) 章节,完成图像识别过程;如果希望识别未知类别的图像,即图像类别之前不存在于索引库中,那么可以参考 [2.3 未知类别的图像识别体验](#23-未知类别的图像识别体验) 章节,完成建立索引并识别的过程。
## 目录

- [1. PP-ShiTu android demo 快速体验](#1-pp-shitu-android-demo-快速体验)
  - [1.1 安装 PP-ShiTu android demo](#11-安装-pp-shitu-android-demo)
  - [1.2 功能体验](#12-功能体验)
- [2. PP-ShiTu PC端 demo 快速体验](#2-pp-shitu-pc端-demo-快速体验)
  - [2.1 环境配置](#21-环境配置)
  - [2.2 图像识别体验](#22-图像识别体验)
    - [2.2.1 下载、解压 inference 模型与 demo 数据](#221-下载解压-inference-模型与-demo-数据)
    - [2.2.2 瓶装饮料识别与检索](#222-瓶装饮料识别与检索)
      - [2.2.2.1 识别单张图像](#2221-识别单张图像)
      - [2.2.2.2 基于文件夹的批量识别](#2222-基于文件夹的批量识别)
  - [2.3 未知类别的图像识别体验](#23-未知类别的图像识别体验)
    - [2.3.1 准备新的数据与标签](#231-准备新的数据与标签)
    - [2.3.2 建立新的索引库](#232-建立新的索引库)
    - [2.3.3 基于新的索引库的图像识别](#233-基于新的索引库的图像识别)
  - [2.4 服务端识别模型列表](#24-服务端识别模型列表)
<a name="PP-ShiTu android 快速体验"></a>
## 1. PP-ShiTu android demo 快速体验
<a name="安装"></a>
### 1.1 安装 PP-ShiTu android demo
可以通过扫描二维码或者[点击链接](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk)下载并安装APP
<div align=center><img src="../../images/quick_start/android_demo/PPShiTu_qrcode.png" height="400" width="400"/></div>
<a name="功能体验"></a>
### 1.2 功能体验
目前 PP-ShiTu android demo 具有图像检索、图像加库、保存检索库、初始化检索库、查看检索库标签等基本功能,接下来介绍如何体验这几个功能。
#### (1)识别图像中的物体
点击下方的“拍照识别”按钮<img src="../../images/quick_start/android_demo/paizhaoshibie_100.png" width="25" height="25"/>或者“本地识别”按钮<img src="../../images/quick_start/android_demo/bendishibie_100.png" width="25" height="25"/>,即可拍摄一张图像或者选中一张图像,然后等待几秒钟,APP便会将图像中的主体框标注出来并且在图像下方给出预测的类别以及预测时间等信息。
在选择好要检索的图片之后,首先会通过检测模型进行主体检测,得到图像中的物体的区域,然后将这块区域裁剪出来输入到识别模型中,得到对应的特征向量并在检索库中检索,返回并显示最终的检索结果。
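这一"检测 → 裁剪 → 提特征 → 检索"的流程可以用下面一段示意性的 Python 代码概括(其中 detector、extractor、searcher 等对象名均为假设,仅用于说明数据流,并非 demo 中的真实接口):

```python
def recognize(image, detector, extractor, searcher, label_list):
    """示意:PP-ShiTu 识别一张图像的整体流程"""
    results = []
    for box in detector.detect(image):                    # 1. 主体检测,得到候选框
        crop = image[box.y1:box.y2, box.x1:box.x2]        # 2. 裁剪出主体区域
        feature = extractor.extract(crop)                 # 3. 识别模型提取特征向量
        scores, ids = searcher.search(feature, top_k=1)   # 4. 在检索库中做最近邻检索
        results.append((box, label_list[ids[0]], scores[0]))
    return results
```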
假设待检索的图像如下:
<img src="../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg" width="400" height="600"/>
得到的检索结果可视化如下:
<img src="../../images/quick_start/android_demo/android_nongfu_spring.JPG" width="400" height="800"/>
#### (2)向检索库中添加新的类别或物体
点击上方的“拍照上传”按钮<img src="../../images/quick_start/android_demo/paizhaoshangchuan_100.png" width="25" height="25"/>或者“本地上传”按钮<img src="../../images/quick_start/android_demo/bendishangchuan_100.png" width="25" height="25"/>,即可拍摄一张图像或从图库中选择一张图像,然后再输入这张图像的类别名字(比如`keyboard`),点击“确定”按钮,即可将图片对应的特征向量与标签加入检索库。
在选择好要入库的图片之后,首先会通过检测模型进行主体检测,得到图像中的物体的区域,然后将这块区域裁剪出来输入到识别模型中,得到对应的特征向量,再与用户输入的图像标签一起加入到检索库中。
**温馨提示:** 使用安卓demo管理类别主要用于功能体验,如果您有较为重要的数据要生成检索库,推荐使用[检索库管理工具](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/inference_deployment/shitu_gallery_manager.md)
#### (3) 保存检索库
点击上方的“保存修改”按钮<img src="../../images/quick_start/android_demo/baocunxiugai_100.png" width="25" height="25"/>,即可将当前库以 `latest` 的库名保存下来。
再次打开程序时,将会自动选择使用`latest`库。app仅存在一个自定义库,每次保存时会覆盖之前的库。
#### (4) 检索库恢复出厂设置
**警告:本操作无法撤销,初始化后自定义的标签和类别都会被删除,请谨慎操作**
点击上方的“初始化 ”按钮<img src="../../images/quick_start/android_demo/reset_100.png" width="25" height="25"/>,删除所有自定义的标签和类别,恢复出厂特征库。
初始化库时会删掉`latest`库(如果存在),自动将检索库和标签库切换成 `original.index``original.txt`。不管是否有保存过,自定义的标签和类别都会被清空。
#### (5) 查看当前检索库中的类别列表
点击“类别查询”按钮<img src="../../images/quick_start/android_demo/leibiechaxun_100.png" width="25" height="25"/>,即可在弹窗中查看。
当检索标签库过多(如本demo自带的196类检索标签库)时,可在弹窗中滑动查看。
## 2. PP-ShiTu PC端 demo 快速体验
<a name="环境配置"></a> <a name="环境配置"></a>
## 1. 环境配置 ### 2.1 环境配置
* 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 * 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。
* 进入 `deploy` 运行目录。本部分所有内容与命令均需要在 `deploy` 目录下运行,可以通过下面的命令进入 `deploy` 目录。 * 进入 `deploy` 运行目录。本部分所有内容与命令均需要在 `deploy` 目录下运行,可以通过下面的命令进入 `deploy` 目录。
``` ```shell
cd deploy cd deploy
``` ```
<a name="图像识别体验"></a> <a name="图像识别体验"></a>
## 2. 图像识别体验 ### 2.2 图像识别体验
轻量级通用主体检测模型与轻量级通用识别模型和配置文件下载方式如下表所示。

<a name="轻量级通用主体检测模型与轻量级通用识别模型"></a>

| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 |
| ---------------------- | -------- | ----------- | ------------ |
| 轻量级通用主体检测模型 | 通用场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) \| [zip 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | - |
| 轻量级通用识别模型 | 通用场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar) \| [zip 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.zip) | [inference_general.yaml](../../../deploy/configs/inference_general.yaml) |
| 轻量级通用识别二值模型 | 检索库很大, 存储受限场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_binary_v1.0_infer.tar) [zip 格式文件下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_binary_v1.0_infer.zip)| [inference_general_binary.yaml](../../../deploy/configs/inference_general_binary.yaml) |
注意:由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 格式文件并解压。`tar` 格式文件建议使用命令 `tar -xf xxx.tar` 解压。

本章节 demo 数据下载地址如下:[drink_dataset_v2.0.tar(瓶装饮料数据)](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar)。

下面以 **drink_dataset_v2.0.tar** 为例介绍 PC 端的 PP-ShiTu 快速体验流程。用户也可以自行下载并解压其它场景的数据进行体验:[22种场景数据下载](../introduction/ppshitu_application_scenarios.md#1-应用场景介绍)。

如果希望体验服务端主体检测和各垂类方向的识别模型,可以参考 [2.4 服务端识别模型列表](#24-服务端识别模型列表)。
**注意**

- windows 环境下如果没有安装 wget,可以按照下面的步骤安装 wget 与 tar 命令,也可以在下载模型时将链接复制到浏览器中下载,并解压放置在相应目录下;linux 或者 macOS 用户可以右键点击,然后复制下载链接,即可通过 `wget` 命令下载。
- 如果 macOS 环境下没有安装 `wget` 命令,可以运行下面的命令进行安装。

```shell
# 安装 homebrew
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
# 安装 wget
brew install wget
```

- 如果希望在 windows 环境下安装 wget,可以参考:[链接](https://www.cnblogs.com/jeshy/p/10518062.html);如果希望在 windows 环境中安装 tar 命令,可以参考:[链接](https://www.cnblogs.com/chooperman/p/14190107.html)。
* 可以按照下面的命令下载并解压数据与模型
```shell
mkdir models
cd models
# 下载识别 inference 模型并解压
wget {模型下载链接地址} && tar -xf {压缩包的名称}
cd ..
# 下载 demo 数据并解压
wget {数据下载链接地址} && tar -xf {压缩包的名称}
```
<a name="2.1"></a> <a name="2.2.1"></a>
### 2.1 下载、解压 inference 模型与 demo 数据 #### 2.2.1 下载、解压 inference 模型与 demo 数据
下载 demo 数据集以及轻量级主体检测、识别模型,命令如下。 下载 demo 数据集以及轻量级主体检测、识别模型,命令如下。
...@@ -91,30 +135,30 @@ cd models ...@@ -91,30 +135,30 @@ cd models
# 下载通用检测 inference 模型并解压 # 下载通用检测 inference 模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# 下载识别 inference 模型并解压 # 下载识别 inference 模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar && tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
cd ../ cd ../
# 下载 demo 数据并解压 # 下载 demo 数据并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
``` ```
解压完毕后,`drink_dataset_v2.0/` 文件夹下应有如下文件结构:

```log
├── drink_dataset_v2.0/
│   ├── gallery/
│   ├── index/
│   ├── index_all/
│   └── test_images/
├── ...
```

其中 `gallery` 文件夹中存放的是用于构建索引库的原始图像,`index` 表示基于原始图像构建得到的索引库信息,`test_images` 文件夹中存放的是用于测试识别效果的图像列表。

`models` 文件夹下应有如下文件结构:

```log
├── general_PPLCNetV2_base_pretrained_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer
│   ├── inference.pdiparams
│   ├── inference.pdiparams.info
│   └── inference.pdmodel
```

如果使用服务端通用识别模型,Demo 数据需要重新提取特征、构建索引,方式如下:

```shell
python3.7 python/build_gallery.py \
    -c configs/inference_general.yaml \
    -o Global.rec_inference_model_dir=./models/general_PPLCNetV2_base_pretrained_v1.0_infer
```
<a name="瓶装饮料识别与检索"></a> <a name="瓶装饮料识别与检索"></a>
### 2.2 瓶装饮料识别与检索 #### 2.2.2 瓶装饮料识别与检索
以瓶装饮料识别 demo 为例,展示识别与检索过程(如果希望尝试其他方向的识别与检索效果,在下载解压好对应的 demo 数据与模型之后,替换对应的配置文件即可完成预测)。 以瓶装饮料识别 demo 为例,展示识别与检索过程(如果希望尝试其他方向的识别与检索效果,在下载解压好对应的 demo 数据与模型之后,替换对应的配置文件即可完成预测)。
注意,此部分使用了 `faiss` 作为检索库,安装方法如下: 注意,此部分使用了 `faiss` 作为检索库,安装方法如下:
```python ```python
pip install faiss-cpu==1.7.1post2 python3.7 -m pip install faiss-cpu==1.7.1post2
``` ```
若使用时,不能正常引用,则 `uninstall` 之后,重新 `install`,尤其是 windows 下。 若使用时,不能正常引用,则 `uninstall` 之后,重新 `install`,尤其是 windows 下。
<a name="识别单张图像"></a> <a name="识别单张图像"></a>
#### 2.2.1 识别单张图像 ##### 2.2.2.1 识别单张图像
运行下面的命令,对图像 `./drink_dataset_v2.0/test_images/100.jpeg` 进行识别与检索
待检索图像如下所示
运行下面的命令,对图像 `./drink_dataset_v1.0/test_images/nongfu_spring.jpeg` 进行识别与检索 ![](../../images/recognition/drink_data_demo/test_images/100.jpeg)
```shell ```shell
# 使用下面的命令使用 GPU 进行预测 # 使用下面的命令使用 GPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml python3.7 python/predict_system.py -c configs/inference_general.yaml
# 使用下面的命令使用 CPU 进行预测 # 使用下面的命令使用 CPU 进行预测
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False
``` ```
待检索图像如下所示。
![](../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg)
最终输出结果如下。 最终输出结果如下。
``` ```log
[{'bbox': [244, 49, 509, 964], 'rec_docs': '农夫山泉-饮用天然水', 'rec_scores': 0.7585664}] [{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
``` ```
其中 `bbox` 表示检测出的主体所在位置,`rec_docs` 表示索引库中与检测框最为相似的类别,`rec_scores` 表示对应的置信度。 其中 `bbox` 表示检测出的主体所在位置,`rec_docs` 表示索引库中与检测框最为相似的类别,`rec_scores` 表示对应的置信度。
检测的可视化结果保存在 `output` 文件夹下,对于本张图像,识别结果可视化如下所示。 检测的可视化结果默认保存在 `output` 文件夹下,对于本张图像,识别结果可视化如下所示。
![](../../images/recognition/drink_data_demo/output/nongfu_spring.jpeg) ![](../../images/recognition/drink_data_demo/output/100.jpeg)
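如果希望在自己的脚本中对上述返回结果做进一步处理(例如按置信度过滤),可以参考下面的示意代码(其中 `results` 直接使用了上文输出的数据结构,阈值 0.7 为假设值):

```python
# 上文 predict_system.py 对单张图像的识别结果
results = [
    {'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249},
    {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992},
    {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153},
]

# 只保留置信度高于阈值的检测框,并打印类别与得分
threshold = 0.7
for item in results:
    if item['rec_scores'] >= threshold:
        print(item['rec_docs'], round(item['rec_scores'], 4), item['bbox'])
```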
<a name="基于文件夹的批量识别"></a> <a name="基于文件夹的批量识别"></a>
#### 2.2.2 基于文件夹的批量识别
##### 2.2.2.2 基于文件夹的批量识别
如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。
```shell ```shell
# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False # 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v1.0/test_images/" python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/"
``` ```
终端中会输出该文件夹内所有图像的识别结果,如下所示。 终端中会输出该文件夹内所有图像的识别结果,如下所示。
``` ```log
... ...
[{'bbox': [345, 95, 524, 586], 'rec_docs': '红牛-强化型', 'rec_scores': 0.80164653}] [{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}]
Inference: 23.43583106994629 ms per batch image Inference: 120.39852142333984 ms per batch image
[{'bbox': [233, 0, 372, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.72513914}] [{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}]
Inference: 117.95639991760254 ms per batch image Inference: 32.045602798461914 ms per batch image
[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.7855944}] [{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}]
Inference: 22.172927856445312 ms per batch image Inference: 113.41428756713867 ms per batch image
[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.5829516}] [{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}]
Inference: 118.08514595031738 ms per batch image Inference: 122.04337120056152 ms per batch image
[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.75581443}] [{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}]
Inference: 150.06470680236816 ms per batch image Inference: 37.95266151428223 ms per batch image
[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.8478892}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6790612}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6292581}] [{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
... ...
``` ```
所有图像的识别结果可视化图像也保存在 `output` 文件夹内。 所有图像的识别结果可视化图像也保存在 `output` 文件夹内。
更多地,可以通过修改 `Global.rec_inference_model_dir` 字段来更改识别 inference 模型的路径,通过修改 `IndexProcess.index_dir` 字段来更改索引库索引的路径。 更多地,可以通过修改 `Global.rec_inference_model_dir` 字段来更改识别 inference 模型的路径,通过修改 `IndexProcess.index_dir` 字段来更改索引库索引的路径。
<a name="未知类别的图像识别体验"></a> <a name="未知类别的图像识别体验"></a>
## 3. 未知类别的图像识别体验 ### 2.3 未知类别的图像识别体验
对图像 `./drink_dataset_v1.0/test_images/mosilian.jpeg` 进行识别,命令如下 对图像 `./drink_dataset_v2.0/test_images/mosilian.jpeg` 进行识别
```shell 待检索图像如下
# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v1.0/test_images/mosilian.jpeg"
```
待检索图像如下所示。
![](../../images/recognition/drink_data_demo/test_images/mosilian.jpeg) ![](../../images/recognition/drink_data_demo/test_images/mosilian.jpeg)
执行如下识别命令
输出结果为空。 ```shell
# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg"
```
可以发现输出结果为空
由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。 由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。
当索引库中的图像无法覆盖我们实际识别的场景时,即在预测未知类别的图像时,我们需要将对应类别的相似图像添加到索引库中,从而完成对未知类别的图像识别,这一过程是不需要重新训练的 当索引库中的图像无法覆盖我们实际识别的场景时,即识别未知类别的图像前,我们需要将该未知类别的相似图像(至少一张)添加到索引库中,从而完成对未知类别的图像识别。这一过程不需要重新训练模型,以识别 `mosilian.jpeg` 为例,只需按以下步骤重新构建新的索引库即可
<a name="准备新的数据与标签"></a> <a name="准备新的数据与标签"></a>
### 3.1 准备新的数据与标签 #### 2.3.1 准备新的数据与标签
首先需要将与待检索图像相似的图像列表拷贝到索引库原始图像的文件夹。这里 PaddleClas 已经将所有的图像数据都放在文件夹 `drink_dataset_v1.0/gallery/` 中。 首先需要将与待检索图像相似的图像列表拷贝到索引库原始图像的文件夹中。这里 PaddleClas 已经将所有的图像数据都放在文件夹 `drink_dataset_v2.0/gallery/` 中。
然后需要编辑记录了图像路径和标签信息的文本文件,这里 PaddleClas 将更正后的标签信息文件放在了 `drink_dataset_v1.0/gallery/drink_label_all.txt` 文件中。可以与默认的 `drink_dataset_v1.0/gallery/drink_label.txt` 标签文件进行对比,添加了光明和三元系列牛奶的索引图像。
然后需要编辑记录了图像路径和标签信息的文本文件,这里 PaddleClas 将更新后的标签信息文件放在了 `drink_dataset_v2.0/gallery/drink_label_all.txt` 文件中。与原始的 `drink_dataset_v2.0/gallery/drink_label.txt` 标签文件进行对比,可以发现新增了光明和三元系列牛奶的索引图像。
每一行的文本中,第一个字段表示图像的相对路径,第二个字段表示图像对应的标签信息,中间用 `\t` 键分隔开(注意:有些编辑器会将 `tab` 自动转换为 `空格`,这种情况下会导致文件解析报错)。 每一行的文本中,第一个字段表示图像的相对路径,第二个字段表示图像对应的标签信息,中间用 `\t` 键分隔开(注意:有些编辑器会将 `tab` 自动转换为 `空格`,这种情况下会导致文件解析报错)。
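下面给出一段生成并校验这类标签文件的示意代码(其中的图像相对路径与标签均为假设的示例,实际请以 gallery 目录中的真实文件为准):

```python
# 写入:每行 "相对路径\t标签",注意分隔符必须是制表符而不是空格
entries = [
    ("gallery/00001.jpg", "光明_莫斯利安"),  # 假设的图像路径与标签
    ("gallery/00002.jpg", "元气森林"),
]
with open("drink_label_demo.txt", "w", encoding="utf-8") as f:
    for path, label in entries:
        f.write(f"{path}\t{label}\n")

# 校验:确认每行都能按 \t 切分成两个字段
with open("drink_label_demo.txt", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        fields = line.rstrip("\n").split("\t")
        assert len(fields) == 2, f"第 {i} 行不是用 tab 分隔的两个字段: {line!r}"
```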
<a name="建立新的索引库"></a> <a name="建立新的索引库"></a>
### 3.2 建立新的索引库 #### 2.3.2 建立新的索引库
使用下面的命令构建 `index` 索引,加速识别后的检索过程 使用下面的命令构建新的索引库 `index_all`
```shell ```shell
python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v1.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v1.0/index_all" python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v2.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
``` ```
最终新的索引信息保存在文件夹 `./drink_dataset_v1.0/index_all`。具体 `yaml` 请参考[向量检索文档](../image_recognition_pipeline/vector_search.md) 最终构建完毕的新的索引库保存在文件夹 `./drink_dataset_v2.0/index_all`。具体 `yaml` 请参考[向量检索文档](../image_recognition_pipeline/vector_search.md)
<a name="基于新的索引库的图像识别"></a> <a name="基于新的索引库的图像识别"></a>
### 3.3 基于新的索引库的图像识别 #### 2.3.3 基于新的索引库的图像识别
使用新的索引库,对上述图像进行识别,运行命令如下。 使用新的索引库,重新对 `mosilian.jpeg` 图像进行识别,运行命令如下。
```shell ```shell
# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False # 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False
python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="././drink_dataset_v1.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v1.0/index_all" python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
``` ```
输出结果如下。 输出结果如下。
``` ```log
[{'bbox': [396, 553, 508, 621], 'rec_docs': '光明_莫斯利安', 'rec_scores': 0.5921005}] [{'bbox': [290, 297, 564, 919], 'rec_docs': '光明_莫斯利安', 'rec_scores': 0.59137374}]
``` ```
最终识别结果为`光明_莫斯利安`,识别正确,识别结果可视化如下所示。 最终识别结果为 `光明_莫斯利安` ,识别正确,识别结果可视化如下所示。
![](../../images/recognition/drink_data_demo/output/mosilian.jpeg) ![](../../images/recognition/drink_data_demo/output/mosilian.jpeg)
<a name="4"></a> <a name="5"></a>
## 4. 服务端识别模型列表
### 2.4 服务端识别模型列表
目前,我们更推荐您使用[轻量级通用主体检测模型与轻量级通用识别模型](#轻量级通用主体检测模型与轻量级通用识别模型),以获得更好的测试结果。但是如果您希望体验服务端识别模型,服务器端通用主体检测模型与各方向识别模型、测试数据下载地址以及对应的配置文件地址如下。 目前,我们更推荐您使用[轻量级通用主体检测模型与轻量级通用识别模型](#轻量级通用主体检测模型与轻量级通用识别模型),以获得更好的测试结果。但是如果您希望体验服务端识别模型,服务器端通用主体检测模型与各方向识别模型、测试数据下载地址以及对应的配置文件地址如下。
| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 | | 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 |
| ------------ | ------------- | -------- | ------- | | ---------------- | -------------- | ------------ | ----------- |
| 通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | | 通用主体检测模型 | 通用场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - |
| Logo 识别模型 | Logo 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | | Logo 识别模型 | Logo 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) |
| 动漫人物识别模型 | 动漫人物场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | | 动漫人物识别模型 | 动漫人物场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) |
| 车辆细分类模型 | 车辆场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | | 车辆细分类模型 | 车辆场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) |
| 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | | 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) |
| 车辆 ReID 模型 | 车辆 ReID 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | | 车辆 ReID 模型 | 车辆 ReID 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) |
可以按照如下命令下载上述模型到 `deploy/models` 文件夹中,以供识别任务使用
```shell ```shell
cd PaddleClas/deploy/ cd ./deploy
mkdir -p models mkdir -p models
```
```shell
cd ./models cd ./models
# 下载通用主体检测模型并解压 # 下载服务器端通用主体检测模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# 下载识别模型并解压 # 下载通用识别模型并解压
wget {识别模型下载链接地址} && tar -xf {压缩包的名称} wget {识别模型下载链接地址} && tar -xf {压缩包的名称}
``` ```
使用如下命令下载各方向识别模型的测试数据: 然后使用如下命令下载各个识别场景的测试数据:
```shell ```shell
# 回到 deploy 目录下 # 回到 deploy 目录下
...@@ -316,7 +362,7 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit ...@@ -316,7 +362,7 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit
解压完毕后,`recognition_demo_data_v1.1` 文件夹下应有如下文件结构: 解压完毕后,`recognition_demo_data_v1.1` 文件夹下应有如下文件结构:
``` ```log
├── recognition_demo_data_v1.1 ├── recognition_demo_data_v1.1
│ ├── gallery_cartoon │ ├── gallery_cartoon
│ ├── gallery_logo │ ├── gallery_logo
...@@ -329,6 +375,6 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit ...@@ -329,6 +375,6 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit
├── ... ├── ...
``` ```
按照上述步骤下载模型和测试数据后,您可以进行相关方向识别模型的测试。 按照上述步骤下载模型和测试数据后,您可以重新建立索引库,并进行相关方向识别模型的测试。
* 更多关于主体检测的介绍可以参考:[主体检测教程文档](../image_recognition_pipeline/mainbody_detection.md);关于特征提取的介绍可以参考:[特征提取教程文档](../image_recognition_pipeline/feature_extraction.md);关于向量检索的介绍可以参考:[向量检索教程文档](../image_recognition_pipeline/vector_search.md) * 更多关于主体检测的介绍可以参考:[主体检测教程文档](../image_recognition_pipeline/mainbody_detection.md);关于特征提取的介绍可以参考:[特征提取教程文档](../image_recognition_pipeline/feature_extraction.md);关于向量检索的介绍可以参考:[向量检索教程文档](../image_recognition_pipeline/vector_search.md)
## 生鲜品自主结算
在超市等无人零售场景中,目前主要的结算方式有以下几种:
- 条形码方式
- RFID等射频码
- 称重方法
但是以上几种方法存在如下缺点:1)条形码方式对于成品包装的商品较为成熟,但对于生鲜产品等商品并不能满足需求;2)RFID 等方式虽然能够支持生鲜等产品,但需要额外制作标签,增加了成本;3)称重方法对于重量相同的商品不能很好地区分,同时重量秤等精密仪器在长时间的负重和使用过程中精度会发生变化,需要工作人员定期校准,才能满足精度需求。
因此,如何选择一种既能大规模支持各种商品识别,又能方便管理,同时维护成本不高的识别系统,显得尤为重要。
深圳市银歌云技术有限公司基于飞桨的图像识别开发套件PaddleClas,提供了一套基于计算机视觉的完整生鲜品自主结算方案。该方案通过结算平台的摄像头拍摄图像,自动识别秤上的商品,整个流程在1秒内完成,无需售卖人员的操作及称重。整个流程实现了精度高、速度快、无需人工干预的自动结算效果,在减少人工成本的同时,大大提高了效率和用户体验。
本案例使用了飞桨图像分类开发套件中的通用图像识别系统[PP-ShiTuV2](../../PPShiTu/PPShiTuV2_introduction.md)
![result](./imgs/yingeo.png)
**注**: AI Studio在线运行代码请参考[生鲜品自主结算](https://aistudio.baidu.com/aistudio/projectdetail/4486158)
...@@ -32,6 +32,7 @@ from .ppcls.arch import backbone ...@@ -32,6 +32,7 @@ from .ppcls.arch import backbone
from .ppcls.utils import logger from .ppcls.utils import logger
from .deploy.python.predict_cls import ClsPredictor from .deploy.python.predict_cls import ClsPredictor
from .deploy.python.predict_system import SystemPredictor
from .deploy.utils.get_image_list import get_image_list from .deploy.utils.get_image_list import get_image_list
from .deploy.utils import config from .deploy.utils import config
...@@ -50,6 +51,11 @@ BASE_IMAGES_DIR = os.path.join(BASE_DIR, "images") ...@@ -50,6 +51,11 @@ BASE_IMAGES_DIR = os.path.join(BASE_DIR, "images")
IMN_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/{}_infer.tar" IMN_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/{}_infer.tar"
IMN_MODEL_SERIES = { IMN_MODEL_SERIES = {
"AlexNet": ["AlexNet"], "AlexNet": ["AlexNet"],
"CSWinTransformer": [
"CSWinTransformer_tiny_224", "CSWinTransformer_small_224",
"CSWinTransformer_base_224", "CSWinTransformer_base_384",
"CSWinTransformer_large_224", "CSWinTransformer_large_384"
],
"DarkNet": ["DarkNet53"], "DarkNet": ["DarkNet53"],
"DeiT": [ "DeiT": [
"DeiT_base_distilled_patch16_224", "DeiT_base_distilled_patch16_384", "DeiT_base_distilled_patch16_224", "DeiT_base_distilled_patch16_384",
...@@ -81,6 +87,8 @@ IMN_MODEL_SERIES = { ...@@ -81,6 +87,8 @@ IMN_MODEL_SERIES = {
"HRNet_W48_C_ssld" "HRNet_W48_C_ssld"
], ],
"Inception": ["GoogLeNet", "InceptionV3", "InceptionV4"], "Inception": ["GoogLeNet", "InceptionV3", "InceptionV4"],
"LeViT":
["LeViT_128S", "LeViT_128", "LeViT_192", "LeViT_256", "LeViT_384"],
"MixNet": ["MixNet_S", "MixNet_M", "MixNet_L"], "MixNet": ["MixNet_S", "MixNet_M", "MixNet_L"],
"MobileNetV1": [ "MobileNetV1": [
"MobileNetV1_x0_25", "MobileNetV1_x0_5", "MobileNetV1_x0_75", "MobileNetV1_x0_25", "MobileNetV1_x0_5", "MobileNetV1_x0_75",
...@@ -99,6 +107,7 @@ IMN_MODEL_SERIES = { ...@@ -99,6 +107,7 @@ IMN_MODEL_SERIES = {
"MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25", "MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25",
"MobileNetV3_small_x1_0_ssld", "MobileNetV3_large_x1_0_ssld" "MobileNetV3_small_x1_0_ssld", "MobileNetV3_large_x1_0_ssld"
], ],
"MobileViT": ["MobileViT_XXS", "MobileViT_XS", "MobileViT_S"],
"PPHGNet": [ "PPHGNet": [
"PPHGNet_tiny", "PPHGNet_tiny",
"PPHGNet_small", "PPHGNet_small",
...@@ -110,6 +119,10 @@ IMN_MODEL_SERIES = { ...@@ -110,6 +119,10 @@ IMN_MODEL_SERIES = {
"PPLCNet_x1_0", "PPLCNet_x1_5", "PPLCNet_x2_0", "PPLCNet_x2_5" "PPLCNet_x1_0", "PPLCNet_x1_5", "PPLCNet_x2_0", "PPLCNet_x2_5"
], ],
"PPLCNetV2": ["PPLCNetV2_base"], "PPLCNetV2": ["PPLCNetV2_base"],
"PVTV2": [
"PVT_V2_B0", "PVT_V2_B1", "PVT_V2_B2", "PVT_V2_B2_Linear", "PVT_V2_B3",
"PVT_V2_B4", "PVT_V2_B5"
],
"RedNet": ["RedNet26", "RedNet38", "RedNet50", "RedNet101", "RedNet152"], "RedNet": ["RedNet26", "RedNet38", "RedNet50", "RedNet101", "RedNet152"],
"RegNet": ["RegNetX_4GF"], "RegNet": ["RegNetX_4GF"],
"Res2Net": [ "Res2Net": [
...@@ -162,6 +175,7 @@ IMN_MODEL_SERIES = { ...@@ -162,6 +175,7 @@ IMN_MODEL_SERIES = {
"pcpvt_small", "pcpvt_base", "pcpvt_large", "alt_gvt_small", "pcpvt_small", "pcpvt_base", "pcpvt_large", "alt_gvt_small",
"alt_gvt_base", "alt_gvt_large" "alt_gvt_base", "alt_gvt_large"
], ],
"TNT": ["TNT_small"],
"VGG": ["VGG11", "VGG13", "VGG16", "VGG19"], "VGG": ["VGG11", "VGG13", "VGG16", "VGG19"],
"VisionTransformer": [ "VisionTransformer": [
"ViT_base_patch16_224", "ViT_base_patch16_384", "ViT_base_patch32_384", "ViT_base_patch16_224", "ViT_base_patch16_384", "ViT_base_patch32_384",
...@@ -178,7 +192,16 @@ PULC_MODEL_BASE_DOWNLOAD_URL = "https://paddleclas.bj.bcebos.com/models/PULC/inf ...@@ -178,7 +192,16 @@ PULC_MODEL_BASE_DOWNLOAD_URL = "https://paddleclas.bj.bcebos.com/models/PULC/inf
PULC_MODELS = [ PULC_MODELS = [
"car_exists", "language_classification", "person_attribute", "car_exists", "language_classification", "person_attribute",
"person_exists", "safety_helmet", "text_image_orientation", "person_exists", "safety_helmet", "text_image_orientation",
"textline_orientation", "traffic_sign", "vehicle_attribute" "textline_orientation", "traffic_sign", "vehicle_attribute",
"table_attribute"
]
SHITU_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/{}_infer.tar"
SHITU_MODELS = [
# "picodet_PPLCNet_x2_5_mainbody_lite_v1.0", # ShiTuV1(V2)_mainbody_det
# "general_PPLCNet_x2_5_lite_v1.0" # ShiTuV1_general_rec
# "PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0", # ShiTuV2_general_rec TODO(hesensen): add lite model
"PP-ShiTuV2"
] ]
...@@ -200,12 +223,24 @@ class InputModelError(Exception): ...@@ -200,12 +223,24 @@ class InputModelError(Exception):
def init_config(model_type, model_name, inference_model_dir, **kwargs): def init_config(model_type, model_name, inference_model_dir, **kwargs):
cfg_path = f"deploy/configs/PULC/{model_name}/inference_{model_name}.yaml" if model_type == "pulc" else "deploy/configs/inference_cls.yaml" if model_type == "pulc":
cfg_path = f"deploy/configs/PULC/{model_name}/inference_{model_name}.yaml"
elif model_type == "shitu":
cfg_path = "deploy/configs/inference_general.yaml"
else:
cfg_path = "deploy/configs/inference_cls.yaml"
__dir__ = os.path.dirname(__file__) __dir__ = os.path.dirname(__file__)
cfg_path = os.path.join(__dir__, cfg_path) cfg_path = os.path.join(__dir__, cfg_path)
cfg = config.get_config(cfg_path, show=False) cfg = config.get_config(cfg_path, show=False)
if cfg.Global.get("inference_model_dir"):
cfg.Global.inference_model_dir = inference_model_dir cfg.Global.inference_model_dir = inference_model_dir
else:
cfg.Global.rec_inference_model_dir = os.path.join(
inference_model_dir,
"PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0")
cfg.Global.det_inference_model_dir = os.path.join(
inference_model_dir, "picodet_PPLCNet_x2_5_mainbody_lite_v1.0")
if "batch_size" in kwargs and kwargs["batch_size"]: if "batch_size" in kwargs and kwargs["batch_size"]:
cfg.Global.batch_size = kwargs["batch_size"] cfg.Global.batch_size = kwargs["batch_size"]
...@@ -219,6 +254,10 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs): ...@@ -219,6 +254,10 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs):
if "infer_imgs" in kwargs and kwargs["infer_imgs"]: if "infer_imgs" in kwargs and kwargs["infer_imgs"]:
cfg.Global.infer_imgs = kwargs["infer_imgs"] cfg.Global.infer_imgs = kwargs["infer_imgs"]
if "index_dir" in kwargs and kwargs["index_dir"]:
cfg.IndexProcess.index_dir = kwargs["index_dir"]
if "data_file" in kwargs and kwargs["data_file"]:
cfg.IndexProcess.data_file = kwargs["data_file"]
if "enable_mkldnn" in kwargs and kwargs["enable_mkldnn"]: if "enable_mkldnn" in kwargs and kwargs["enable_mkldnn"]:
cfg.Global.enable_mkldnn = kwargs["enable_mkldnn"] cfg.Global.enable_mkldnn = kwargs["enable_mkldnn"]
if "cpu_num_threads" in kwargs and kwargs["cpu_num_threads"]: if "cpu_num_threads" in kwargs and kwargs["cpu_num_threads"]:
...@@ -240,6 +279,8 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs): ...@@ -240,6 +279,8 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs):
if "thresh" in kwargs and kwargs[ if "thresh" in kwargs and kwargs[
"thresh"] and "ThreshOutput" in cfg.PostProcess: "thresh"] and "ThreshOutput" in cfg.PostProcess:
cfg.PostProcess.ThreshOutput.thresh = kwargs["thresh"] cfg.PostProcess.ThreshOutput.thresh = kwargs["thresh"]
if cfg.get("PostProcess"):
if "Topk" in cfg.PostProcess: if "Topk" in cfg.PostProcess:
if "topk" in kwargs and kwargs["topk"]: if "topk" in kwargs and kwargs["topk"]:
cfg.PostProcess.Topk.topk = kwargs["topk"] cfg.PostProcess.Topk.topk = kwargs["topk"]
...@@ -258,7 +299,25 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs): ...@@ -258,7 +299,25 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs):
if "type_threshold" in kwargs and kwargs["type_threshold"]: if "type_threshold" in kwargs and kwargs["type_threshold"]:
cfg.PostProcess.VehicleAttribute.type_threshold = kwargs[ cfg.PostProcess.VehicleAttribute.type_threshold = kwargs[
"type_threshold"] "type_threshold"]
if "TableAttribute" in cfg.PostProcess:
if "source_threshold" in kwargs and kwargs["source_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"source_threshold"]
if "number_threshold" in kwargs and kwargs["number_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"number_threshold"]
if "color_threshold" in kwargs and kwargs["color_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"color_threshold"]
if "clarity_threshold" in kwargs and kwargs["clarity_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"clarity_threshold"]
if "obstruction_threshold" in kwargs and kwargs["obstruction_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"obstruction_threshold"]
if "angle_threshold" in kwargs and kwargs["angle_threshold"]:
cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[
"angle_threshold"]
if "save_dir" in kwargs and kwargs["save_dir"]: if "save_dir" in kwargs and kwargs["save_dir"]:
cfg.PostProcess.SavePreLabel.save_dir = kwargs["save_dir"] cfg.PostProcess.SavePreLabel.save_dir = kwargs["save_dir"]
...@@ -282,6 +341,13 @@ def args_cfg(): ...@@ -282,6 +341,13 @@ def args_cfg():
type=str, type=str,
help="The directory of model files. Valid when model_name not specifed." help="The directory of model files. Valid when model_name not specifed."
) )
parser.add_argument(
"--index_dir",
type=str,
required=False,
help="The index directory path.")
parser.add_argument(
"--data_file", type=str, required=False, help="The label file path.")
parser.add_argument("--use_gpu", type=str2bool, help="Whether use GPU.") parser.add_argument("--use_gpu", type=str2bool, help="Whether use GPU.")
parser.add_argument( parser.add_argument(
"--gpu_mem", "--gpu_mem",
...@@ -334,6 +400,7 @@ def print_info(): ...@@ -334,6 +400,7 @@ def print_info():
""" """
imn_table = PrettyTable(["IMN Model Series", "Model Name"]) imn_table = PrettyTable(["IMN Model Series", "Model Name"])
pulc_table = PrettyTable(["PULC Models"]) pulc_table = PrettyTable(["PULC Models"])
shitu_table = PrettyTable(["PP-ShiTu Models"])
try: try:
sz = os.get_terminal_size() sz = os.get_terminal_size()
total_width = sz.columns total_width = sz.columns
...@@ -352,11 +419,16 @@ def print_info(): ...@@ -352,11 +419,16 @@ def print_info():
textwrap.fill( textwrap.fill(
" ".join(PULC_MODELS), width=total_width).center(table_width - 4) " ".join(PULC_MODELS), width=total_width).center(table_width - 4)
]) ])
shitu_table.add_row([
textwrap.fill(
" ".join(SHITU_MODELS), width=total_width).center(table_width - 4)
])
print("{}".format("-" * table_width)) print("{}".format("-" * table_width))
print("Models supported by PaddleClas".center(table_width)) print("Models supported by PaddleClas".center(table_width))
print(imn_table) print(imn_table)
print(pulc_table) print(pulc_table)
print(shitu_table)
print("Powered by PaddlePaddle!".rjust(table_width)) print("Powered by PaddlePaddle!".rjust(table_width))
print("{}".format("-" * table_width)) print("{}".format("-" * table_width))
...@@ -412,6 +484,10 @@ def check_model_file(model_type, model_name): ...@@ -412,6 +484,10 @@ def check_model_file(model_type, model_name):
storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR,
"PULC", model_name) "PULC", model_name)
url = PULC_MODEL_BASE_DOWNLOAD_URL.format(model_name) url = PULC_MODEL_BASE_DOWNLOAD_URL.format(model_name)
elif model_type == "shitu":
storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR,
"PP-ShiTu", model_name)
url = SHITU_MODEL_BASE_DOWNLOAD_URL.format(model_name)
else: else:
storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR,
"IMN", model_name) "IMN", model_name)
...@@ -472,8 +548,10 @@ class PaddleClas(object): ...@@ -472,8 +548,10 @@ class PaddleClas(object):
model_name, inference_model_dir) model_name, inference_model_dir)
self._config = init_config(self.model_type, model_name, self._config = init_config(self.model_type, model_name,
inference_model_dir, **kwargs) inference_model_dir, **kwargs)
if self.model_type == "shitu":
self.cls_predictor = ClsPredictor(self._config) self.predictor = SystemPredictor(self._config)
else:
self.predictor = ClsPredictor(self._config)
def get_config(self): def get_config(self):
"""Get the config. """Get the config.
...@@ -485,6 +563,7 @@ class PaddleClas(object): ...@@ -485,6 +563,7 @@ class PaddleClas(object):
""" """
all_imn_model_names = get_imn_model_names() all_imn_model_names = get_imn_model_names()
all_pulc_model_names = PULC_MODELS all_pulc_model_names = PULC_MODELS
all_shitu_model_names = SHITU_MODELS
if model_name: if model_name:
if model_name in all_imn_model_names: if model_name in all_imn_model_names:
...@@ -493,6 +572,15 @@ class PaddleClas(object): ...@@ -493,6 +572,15 @@ class PaddleClas(object):
elif model_name in all_pulc_model_names: elif model_name in all_pulc_model_names:
inference_model_dir = check_model_file("pulc", model_name) inference_model_dir = check_model_file("pulc", model_name)
return "pulc", inference_model_dir return "pulc", inference_model_dir
elif model_name in all_shitu_model_names:
inference_model_dir = check_model_file(
"shitu",
"PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0")
inference_model_dir = check_model_file(
"shitu", "picodet_PPLCNet_x2_5_mainbody_lite_v1.0")
inference_model_dir = os.path.abspath(
os.path.dirname(inference_model_dir))
return "shitu", inference_model_dir
else: else:
similar_imn_names = similar_model_names(model_name, similar_imn_names = similar_model_names(model_name,
all_imn_model_names) all_imn_model_names)
...@@ -513,11 +601,12 @@ class PaddleClas(object): ...@@ -513,11 +601,12 @@ class PaddleClas(object):
raise InputModelError(err) raise InputModelError(err)
return "custom", inference_model_dir return "custom", inference_model_dir
else: else:
err = f"Please specify the model name supported by PaddleClas or directory contained model files(inference.pdmodel, inference.pdiparams)." err = "Please specify the model name supported by PaddleClas or directory contained model files(inference.pdmodel, inference.pdiparams)."
raise InputModelError(err) raise InputModelError(err)
return None return None
def predict(self, input_data: Union[str, np.array], def predict_cls(self,
input_data: Union[str, np.array],
print_pred: bool=False) -> Generator[list, None, None]: print_pred: bool=False) -> Generator[list, None, None]:
"""Predict input_data. """Predict input_data.
...@@ -538,7 +627,7 @@ class PaddleClas(object): ...@@ -538,7 +627,7 @@ class PaddleClas(object):
""" """
if isinstance(input_data, np.ndarray): if isinstance(input_data, np.ndarray):
yield self.cls_predictor.predict(input_data) yield self.predictor.predict(input_data)
elif isinstance(input_data, str): elif isinstance(input_data, str):
if input_data.startswith("http") or input_data.startswith("https"): if input_data.startswith("http") or input_data.startswith("https"):
image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR) image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR)
...@@ -570,7 +659,7 @@ class PaddleClas(object): ...@@ -570,7 +659,7 @@ class PaddleClas(object):
cnt += 1 cnt += 1
if cnt % batch_size == 0 or (idx_img + 1) == len(image_list): if cnt % batch_size == 0 or (idx_img + 1) == len(image_list):
preds = self.cls_predictor.predict(img_list) preds = self.predictor.predict(img_list)
if preds: if preds:
for idx_pred, pred in enumerate(preds): for idx_pred, pred in enumerate(preds):
...@@ -587,6 +676,77 @@ class PaddleClas(object): ...@@ -587,6 +676,77 @@ class PaddleClas(object):
raise ImageTypeError(err) raise ImageTypeError(err)
return return
def predict_shitu(self,
input_data: Union[str, np.array],
print_pred: bool=False) -> Generator[list, None, None]:
"""Predict input_data.
Args:
input_data (Union[str, np.array]):
When the type is str, it is the path of image, or the directory containing images, or the URL of image from Internet.
When the type is np.array, it is the image data whose channel order is RGB.
print_pred (bool, optional): Whether print the prediction result. Defaults to False.
Raises:
ImageTypeError: Illegal input_data.
Yields:
Generator[list, None, None]:
The prediction result(s) of input_data by batch_size. For every one image,
prediction result(s) is zipped as a dict, that includs topk "class_ids", "scores" and "label_names".
The format of batch prediction result(s) is as follow: [{"class_ids": [...], "scores": [...], "label_names": [...]}, ...]
"""
if isinstance(input_data, np.ndarray):
yield self.predictor.predict(input_data)
elif isinstance(input_data, str):
if input_data.startswith("http") or input_data.startswith("https"):
image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR)
if not os.path.exists(image_storage_dir()):
os.makedirs(image_storage_dir())
image_save_path = image_storage_dir("tmp.jpg")
download_with_progressbar(input_data, image_save_path)
logger.info(
f"Image to be predicted from Internet: {input_data}, has been saved to: {image_save_path}"
)
input_data = image_save_path
image_list = get_image_list(input_data)
cnt = 0
for idx_img, img_path in enumerate(image_list):
img = cv2.imread(img_path)
if img is None:
logger.warning(
f"Image file failed to read and has been skipped. The path: {img_path}"
)
continue
img = img[:, :, ::-1]
cnt += 1
preds = self.predictor.predict(
img) # [dict1, dict2, ..., dictn]
if preds:
if print_pred:
logger.info(f"{preds}, filename: {img_path}")
yield preds
else:
err = "Please input legal image! The type of image supported by PaddleClas are: NumPy.ndarray and string of local path or Ineternet URL"
raise ImageTypeError(err)
return
def predict(self,
input_data: Union[str, np.array],
print_pred: bool=False,
predict_type="cls"):
if predict_type == "cls":
return self.predict_cls(input_data, print_pred)
elif predict_type == "shitu":
assert not isinstance(input_data, (
list, tuple
)), "PP-ShiTu predictor only support single image as input now."
return self.predict_shitu(input_data, print_pred)
else:
raise ModuleNotFoundError
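    # Illustrative usage only (model name, index path and image path below are
    # assumed examples; predict() returns a generator, so iterate over it):
    #   clas = PaddleClas(model_name="PP-ShiTuV2", index_dir="./drink_dataset_v2.0/index")
    #   for res in clas.predict("./drink_dataset_v2.0/test_images/100.jpeg",
    #                           print_pred=True, predict_type="shitu"):
    #       print(res)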
# for CLI # for CLI
def main(): def main():
...@@ -595,7 +755,10 @@ def main(): ...@@ -595,7 +755,10 @@ def main():
print_info() print_info()
cfg = args_cfg() cfg = args_cfg()
clas_engine = PaddleClas(**cfg) clas_engine = PaddleClas(**cfg)
res = clas_engine.predict(cfg["infer_imgs"], print_pred=True) res = clas_engine.predict(
cfg["infer_imgs"],
print_pred=True,
predict_type="cls" if "PP-ShiTu" not in cfg["model_name"] else "shitu")
for _ in res: for _ in res:
pass pass
logger.info("Predict complete!") logger.info("Predict complete!")
......
...@@ -69,10 +69,12 @@ from .model_zoo.repvgg import RepVGG_A0, RepVGG_A1, RepVGG_A2, RepVGG_B0, RepVGG ...@@ -69,10 +69,12 @@ from .model_zoo.repvgg import RepVGG_A0, RepVGG_A1, RepVGG_A2, RepVGG_B0, RepVGG
from .model_zoo.van import VAN_tiny from .model_zoo.van import VAN_tiny
from .model_zoo.peleenet import PeleeNet from .model_zoo.peleenet import PeleeNet
from .model_zoo.convnext import ConvNeXt_tiny from .model_zoo.convnext import ConvNeXt_tiny
from .model_zoo.cae import cae_base_patch16_224, cae_large_patch16_224
from .variant_models.resnet_variant import ResNet50_last_stage_stride1 from .variant_models.resnet_variant import ResNet50_last_stage_stride1
from .variant_models.vgg_variant import VGG19Sigmoid from .variant_models.vgg_variant import VGG19Sigmoid
from .variant_models.pp_lcnet_variant import PPLCNet_x2_5_Tanh from .variant_models.pp_lcnet_variant import PPLCNet_x2_5_Tanh
from .variant_models.pp_lcnetv2_variant import PPLCNetV2_base_ShiTu
from .model_zoo.adaface_ir_net import AdaFace_IR_18, AdaFace_IR_34, AdaFace_IR_50, AdaFace_IR_101, AdaFace_IR_152, AdaFace_IR_SE_50, AdaFace_IR_SE_101, AdaFace_IR_SE_152, AdaFace_IR_SE_200 from .model_zoo.adaface_ir_net import AdaFace_IR_18, AdaFace_IR_34, AdaFace_IR_50, AdaFace_IR_101, AdaFace_IR_152, AdaFace_IR_SE_50, AdaFace_IR_SE_101, AdaFace_IR_SE_152, AdaFace_IR_SE_200
......
...@@ -103,7 +103,7 @@ class TheseusLayer(nn.Layer): ...@@ -103,7 +103,7 @@ class TheseusLayer(nn.Layer):
return new_layer return new_layer
net = paddleclas.MobileNetV1() net = paddleclas.MobileNetV1()
res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
print(res) print(res)
# {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer} # {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer}
""" """
...@@ -117,18 +117,26 @@ class TheseusLayer(nn.Layer): ...@@ -117,18 +117,26 @@ class TheseusLayer(nn.Layer):
layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
if not layer_list: if not layer_list:
continue continue
sub_layer_parent = layer_list[-2]["layer"] if len( sub_layer_parent = layer_list[-2]["layer"] if len(
layer_list) > 1 else self layer_list) > 1 else self
sub_layer = layer_list[-1]["layer"] sub_layer = layer_list[-1]["layer"]
sub_layer_name = layer_list[-1]["name"] sub_layer_name = layer_list[-1]["name"]
sub_layer_index = layer_list[-1]["index"] sub_layer_index_list = layer_list[-1]["index_list"]
new_sub_layer = handle_func(sub_layer, pattern) new_sub_layer = handle_func(sub_layer, pattern)
if sub_layer_index: if sub_layer_index_list:
getattr(sub_layer_parent, if len(sub_layer_index_list) > 1:
sub_layer_name)[sub_layer_index] = new_sub_layer sub_layer_parent = getattr(
sub_layer_parent,
sub_layer_name)[sub_layer_index_list[0]]
for sub_layer_index in sub_layer_index_list[1:-1]:
sub_layer_parent = sub_layer_parent[sub_layer_index]
sub_layer_parent[sub_layer_index_list[-1]] = new_sub_layer
else:
getattr(sub_layer_parent, sub_layer_name)[
sub_layer_index_list[0]] = new_sub_layer
else: else:
setattr(sub_layer_parent, sub_layer_name, new_sub_layer) setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
@@ -151,8 +159,8 @@ class TheseusLayer(nn.Layer):
parent_layer = self
for layer_dict in layer_list:
-name, index = layer_dict["name"], layer_dict["index"]
-if not set_identity(parent_layer, name, index):
+name, index_list = layer_dict["name"], layer_dict["index_list"]
+if not set_identity(parent_layer, name, index_list):
msg = f"Failed to set the layers after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
logger.warning(msg)
return False
@@ -208,13 +216,13 @@ def save_sub_res_hook(layer, input, output):
def set_identity(parent_layer: nn.Layer,
layer_name: str,
-layer_index: str=None) -> bool:
-"""set the layer specified by layer_name and layer_index to Identity.
+layer_index_list: str=None) -> bool:
+"""set the layer specified by layer_name and layer_index_list to Identity.
Args:
-parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
+parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index_list.
layer_name (str): The name of target layer to be set to Identity.
-layer_index (str, optional): The index of target layer to be set to Identity in parent_layer. Defaults to None.
+layer_index_list (list, optional): The indices of the target layer to be set to Identity in parent_layer. Defaults to None.
Returns:
bool: True if successful, False otherwise.
@@ -228,10 +236,13 @@ def set_identity(parent_layer: nn.Layer,
if sub_layer_name == layer_name:
stop_after = True
-if layer_index and stop_after:
-stop_after = False
-for sub_layer_index in parent_layer._sub_layers[
-layer_name]._sub_layers:
+if layer_index_list and stop_after:
+layer_container = parent_layer._sub_layers[layer_name]
+for num, layer_index in enumerate(layer_index_list):
+stop_after = False
+for i in range(num):
+layer_container = layer_container[layer_index_list[i]]
+for sub_layer_index in layer_container._sub_layers:
if stop_after:
parent_layer._sub_layers[layer_name][
sub_layer_index] = Identity()
@@ -269,10 +280,12 @@ def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[
while len(pattern_list) > 0:
if '[' in pattern_list[0]:
target_layer_name = pattern_list[0].split('[')[0]
-target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+target_layer_index_list = list(
+index.split(']')[0]
+for index in pattern_list[0].split('[')[1:])
else:
target_layer_name = pattern_list[0]
-target_layer_index = None
+target_layer_index_list = None
target_layer = getattr(parent_layer, target_layer_name, None)
@@ -281,21 +294,22 @@ def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[
logger.warning(msg)
return None
-if target_layer_index and target_layer:
-if int(target_layer_index) < 0 or int(target_layer_index) >= len(
-target_layer):
+if target_layer_index_list:
+for target_layer_index in target_layer_index_list:
+if int(target_layer_index) < 0 or int(
+target_layer_index) >= len(target_layer):
msg = f"Not found layer by index('{target_layer_index}') specified in pattern('{pattern}'). The index should be >= 0 and < {len(target_layer)}."
logger.warning(msg)
return None
target_layer = target_layer[target_layer_index]
layer_list.append({
"layer": target_layer,
"name": target_layer_name,
-"index": target_layer_index
+"index_list": target_layer_index_list
})
pattern_list = pattern_list[1:]
parent_layer = target_layer
return layer_list
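For readers skimming the diff: the new index_list parsing above lets a single sub-pattern carry several bracket indices (for example "stages[3][0]", used by the PP-ShiTuV2 backbone variant later in this PR). A standalone sketch of that parsing, for illustration only:

# mirrors the list(...) construction added to parse_pattern_str above
def parse_sub_pattern(sub_pattern):
    if '[' in sub_pattern:
        name = sub_pattern.split('[')[0]
        index_list = [seg.split(']')[0] for seg in sub_pattern.split('[')[1:]]
    else:
        name, index_list = sub_pattern, None
    return name, index_list

print(parse_sub_pattern("stages[3][0]"))  # ('stages', ['3', '0'])
print(parse_sub_pattern("last_conv"))     # ('last_conv', None)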
@@ -126,6 +126,8 @@ class RepDepthwiseSeparable(TheseusLayer):
use_se=False,
use_shortcut=False):
super().__init__()
+self.in_channels = in_channels
+self.out_channels = out_channels
self.is_repped = False
self.dw_size = dw_size
@@ -306,8 +308,8 @@ class PPLCNetV2(TheseusLayer):
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
-in_features = self.class_expand if self.use_last_conv else NET_CONFIG[
-"stage4"][0] * 2 * scale
+in_features = self.class_expand if self.use_last_conv else make_divisible(
+NET_CONFIG["stage4"][0] * 2 * scale)
self.fc = Linear(in_features, class_num)
def forward(self, x):
......
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Code was heavily based on https://github.com/PaddlePaddle/VIMER/blob/main/CAE/models/modeling_finetune.py
# reference: https://arxiv.org/abs/2202.03026
import collections
from itertools import repeat
import math
import numpy as np
from functools import partial
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from scipy import interpolate  # used when interpolating relative position bias tables in _load_pretrained
from ....utils.download import get_weights_path_from_url
MODEL_URLS = {
"cae_base_patch16_224":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/cae_base_patch16_224_pretrained.pdparams",
"cae_large_patch16_224":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/cae_large_patch16_224_pretrained.pdparams"
}
__all__ = list(MODEL_URLS.keys())
def _ntuple(n):
def parse(x):
if isinstance(x, collections.abc.Iterable):
return x
return tuple(repeat(x, n))
return parse
def trunc_normal_(tensor, mean=0., std=1.):
nn.initializer.TruncatedNormal(mean=mean, std=std)(tensor)
def drop_path(x, drop_prob: float=0., training: bool=False):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
'survival rate' as the argument.
"""
if drop_prob == 0. or not training:
return x
keep_prob = 1 - drop_prob
shape = (x.shape[0], ) + (1, ) * (
x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets
random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
random_tensor.floor_() # binarize
output = x / keep_prob * random_tensor
return output
class DropPath(nn.Layer):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
"""
def __init__(self, drop_prob=None):
super(DropPath, self).__init__()
self.drop_prob = drop_prob
def forward(self, x):
return drop_path(x, self.drop_prob, self.training)
def extra_repr(self) -> str:
return 'p={}'.format(self.drop_prob)
class Mlp(nn.Layer):
def __init__(self,
in_features,
hidden_features=None,
out_features=None,
act_layer=nn.GELU,
drop=0.):
super().__init__()
out_features = out_features or in_features
hidden_features = hidden_features or in_features
self.fc1 = nn.Linear(in_features, hidden_features, bias_attr=True)
self.act = act_layer()
self.fc2 = nn.Linear(hidden_features, out_features, bias_attr=True)
self.drop = nn.Dropout(drop)
def forward(self, x):
x = self.fc1(x)
x = self.act(x)
# x = self.drop(x)
# kept commented out, following the original BERT implementation
x = self.fc2(x)
x = self.drop(x)
return x
class Attention(nn.Layer):
def __init__(self,
dim,
num_heads=8,
qkv_bias=False,
qk_scale=None,
attn_drop=0.,
proj_drop=0.,
window_size=None,
attn_head_dim=None):
super().__init__()
self.num_heads = num_heads
head_dim = dim // num_heads
if attn_head_dim is not None:
head_dim = attn_head_dim
all_head_dim = head_dim * self.num_heads
self.scale = qk_scale or head_dim**-0.5
self.zeros_ = nn.initializer.Constant(value=0.)
self.qkv = nn.Linear(dim, all_head_dim * 3, bias_attr=False)
if qkv_bias:
self.q_bias = self.create_parameter(
[all_head_dim], default_initializer=self.zeros_)
self.v_bias = self.create_parameter(
[all_head_dim], default_initializer=self.zeros_)
else:
self.q_bias = None
self.v_bias = None
if window_size:
self.window_size = window_size
self.num_relative_distance = (2 * window_size[0] - 1) * (
2 * window_size[1] - 1) + 3
self.relative_position_bias_table = self.create_parameter(
[self.num_relative_distance, num_heads],
default_initializer=self.zeros_) # 2*Wh-1 * 2*Ww-1, nH
# cls to token & token 2 cls & cls to cls
# get pair-wise relative position index for each token inside the window
coords_h = paddle.arange(window_size[0])
coords_w = paddle.arange(window_size[1])
coords = paddle.stack(paddle.meshgrid(
[coords_h, coords_w])) # 2, Wh, Ww
coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww
relative_coords = coords_flatten[:, :,
None] - coords_flatten[:,
None, :] # 2, Wh*Ww, Wh*Ww
relative_coords = relative_coords.transpose(
[1, 2, 0]) # Wh*Ww, Wh*Ww, 2
relative_coords[:, :, 0] += window_size[
0] - 1 # shift to start from 0
relative_coords[:, :, 1] += window_size[1] - 1
relative_coords[:, :, 0] *= 2 * window_size[1] - 1
relative_position_index = \
paddle.zeros((window_size[0] * window_size[1] + 1, ) * 2, dtype=relative_coords.dtype)
relative_position_index[1:, 1:] = relative_coords.sum(
-1) # Wh*Ww, Wh*Ww
relative_position_index[0, 0:] = self.num_relative_distance - 3
relative_position_index[0:, 0] = self.num_relative_distance - 2
relative_position_index[0, 0] = self.num_relative_distance - 1
self.register_buffer("relative_position_index",
relative_position_index)
else:
self.window_size = None
self.relative_position_bias_table = None
self.relative_position_index = None
self.attn_drop = nn.Dropout(attn_drop)
self.proj = nn.Linear(all_head_dim, dim, bias_attr=True)
self.proj_drop = nn.Dropout(proj_drop)
def forward(self, x, rel_pos_bias=None):
B, N, C = x.shape
qkv_bias = None
if self.q_bias is not None:
k_bias = paddle.zeros_like(self.v_bias)
k_bias.stop_gradient = True
qkv_bias = paddle.concat((self.q_bias, k_bias, self.v_bias))
# qkv = self.qkv(x).reshape([B, N, 3, self.num_heads, C // self.num_heads]).transpose([2, 0, 3, 1, 4])
qkv = F.linear(x=x, weight=self.qkv.weight, bias=qkv_bias)
qkv = qkv.reshape([B, N, 3, self.num_heads, -1]).transpose(
[2, 0, 3, 1, 4])
q, k, v = qkv[0], qkv[1], qkv[
2] # make torchscript happy (cannot use tensor as tuple)
q = q * self.scale
attn = (q @k.transpose([0, 1, 3, 2]))
if self.relative_position_bias_table is not None:
relative_position_bias = \
self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([
self.window_size[0] * self.window_size[1] + 1,
self.window_size[0] * self.window_size[1] + 1, -1]) # Wh*Ww,Wh*Ww,nH
relative_position_bias = relative_position_bias.transpose(
[2, 0, 1]) # nH, Wh*Ww, Wh*Ww
attn = attn + relative_position_bias.unsqueeze(0)
if rel_pos_bias is not None:
attn = attn + rel_pos_bias
attn = F.softmax(attn, axis=-1)
attn = self.attn_drop(attn)
x = (attn @v).transpose([0, 2, 1, 3]).reshape([B, N, -1])
x = self.proj(x)
x = self.proj_drop(x)
return x
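# Note on the relative position bias above: for a window of Wh x Ww patches the
# bias table has (2 * Wh - 1) * (2 * Ww - 1) + 3 rows, the "+ 3" covering the
# cls-to-token, token-to-cls and cls-to-cls cases. With the default 224 / 16 =
# 14 x 14 patch grid this is 27 * 27 + 3 = 732 learnable entries per head.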
class Block(nn.Layer):
def __init__(self,
dim,
num_heads,
mlp_ratio=4.,
qkv_bias=False,
qk_scale=None,
drop=0.,
attn_drop=0.,
drop_path=0.,
init_values=None,
act_layer=nn.GELU,
norm_layer=nn.LayerNorm,
window_size=None,
attn_head_dim=None):
super().__init__()
self.norm1 = norm_layer(dim)
self.attn = Attention(
dim,
num_heads=num_heads,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
attn_drop=attn_drop,
proj_drop=drop,
window_size=window_size,
attn_head_dim=attn_head_dim)
# NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
self.drop_path = DropPath(
drop_path) if drop_path > 0. else nn.Identity()
self.norm2 = norm_layer(dim)
mlp_hidden_dim = int(dim * mlp_ratio)
self.mlp = Mlp(in_features=dim,
hidden_features=mlp_hidden_dim,
act_layer=act_layer,
drop=drop)
if init_values is not None and init_values > 0:
self.gamma_1 = self.create_parameter(
[dim],
default_initializer=nn.initializer.Constant(value=init_values))
self.gamma_2 = self.create_parameter(
[dim],
default_initializer=nn.initializer.Constant(value=init_values))
else:
self.gamma_1, self.gamma_2 = None, None
def forward(self, x, rel_pos_bias=None):
if self.gamma_1 is None:
x = x + self.drop_path(
self.attn(
self.norm1(x), rel_pos_bias=rel_pos_bias))
x = x + self.drop_path(self.mlp(self.norm2(x)))
else:
x = x + self.drop_path(self.gamma_1 * self.attn(
self.norm1(x), rel_pos_bias=rel_pos_bias))
x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
return x
class PatchEmbed(nn.Layer):
""" Image to Patch Embedding
"""
def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
super().__init__()
to_2tuple = _ntuple(2)
img_size = to_2tuple(img_size)
patch_size = to_2tuple(patch_size)
num_patches = (img_size[1] // patch_size[1]) * (img_size[0] //
patch_size[0])
self.patch_shape = (img_size[0] // patch_size[0],
img_size[1] // patch_size[1])
self.img_size = img_size
self.patch_size = patch_size
self.num_patches = num_patches
self.in_chans = in_chans
self.out_chans = embed_dim
self.proj = nn.Conv2D(
in_chans,
embed_dim,
kernel_size=patch_size,
stride=patch_size,
bias_attr=True)
def forward(self, x, **kwargs):
B, C, H, W = x.shape
# FIXME look at relaxing size constraints
assert H == self.img_size[0] and W == self.img_size[1], \
f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
x = self.proj(x).flatten(2).transpose([0, 2, 1])
return x
def _init_weights(self):
fan_out = self.out_chans
fan_in = self.patch_size[0] * self.patch_size[1] * self.in_chans
weight_attr = paddle.ParamAttr(
initializer=nn.initializer.XavierUniform(fan_in, fan_out)) # MAE
bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
class RelativePositionBias(nn.Layer):
def __init__(self, window_size, num_heads):
super().__init__()
self.window_size = window_size
self.num_relative_distance = (2 * window_size[0] - 1) * (
2 * window_size[1] - 1) + 3
self.zeros_ = nn.initializer.Constant(value=0.)
self.relative_position_bias_table = self.create_parameter(
[self.num_relative_distance, num_heads],
default_initializer=self.zeros_) # 2*Wh-1 * 2*Ww-1, nH
# cls to token & token 2 cls & cls to cls
# get pair-wise relative position index for each token inside the window
coords_h = paddle.arange(window_size[0])
coords_w = paddle.arange(window_size[1])
coords = paddle.stack(paddle.meshgrid(
[coords_h, coords_w])) # 2, Wh, Ww
coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww
relative_coords = coords_flatten[:, :,
None] - coords_flatten[:,
None, :] # 2, Wh*Ww, Wh*Ww
relative_coords = relative_coords.transpose(
[1, 2, 0]) # Wh*Ww, Wh*Ww, 2
relative_coords[:, :, 0] += window_size[0] - 1 # shift to start from 0
relative_coords[:, :, 1] += window_size[1] - 1
relative_coords[:, :, 0] *= 2 * window_size[1] - 1
relative_position_index = \
paddle.zeros((window_size[0] * window_size[1] + 1,) * 2, dtype=relative_coords.dtype)
relative_position_index[1:, 1:] = relative_coords.sum(
-1) # Wh*Ww, Wh*Ww
relative_position_index[0, 0:] = self.num_relative_distance - 3
relative_position_index[0:, 0] = self.num_relative_distance - 2
relative_position_index[0, 0] = self.num_relative_distance - 1
self.register_buffer("relative_position_index",
relative_position_index)
def forward(self):
relative_position_bias = \
self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([
self.window_size[0] * self.window_size[1] + 1,
self.window_size[0] * self.window_size[1] + 1, -1]) # Wh*Ww,Wh*Ww,nH
return relative_position_bias.transpose([2, 0, 1]) # nH, Wh*Ww, Wh*Ww
def get_sinusoid_encoding_table(n_position, d_hid, token=False):
''' Sinusoid position encoding table '''
def get_position_angle_vec(position):
return [
position / np.power(10000, 2 * (hid_j // 2) / d_hid)
for hid_j in range(d_hid)
]
sinusoid_table = np.array(
[get_position_angle_vec(pos_i) for pos_i in range(n_position)])
sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
if token:
sinusoid_table = np.concatenate(
[sinusoid_table, np.zeros([1, d_hid])], axis=0)
return paddle.to_tensor(sinusoid_table).unsqueeze(0)
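# Shape note (illustration only): get_sinusoid_encoding_table(4, 6) returns a
# [1, 4, 6] tensor whose even feature columns hold sin terms and whose odd
# columns hold cos terms, matching the interleaved assignment above.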
class VisionTransformer(nn.Layer):
""" Vision Transformer with support for patch or hybrid CNN input stage
"""
def __init__(self,
img_size=224,
patch_size=16,
in_chans=3,
class_num=1000,
embed_dim=768,
depth=12,
num_heads=12,
mlp_ratio=4.,
qkv_bias=False,
qk_scale=None,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.,
norm_layer=nn.LayerNorm,
init_values=None,
use_abs_pos_emb=True,
use_rel_pos_bias=False,
use_shared_rel_pos_bias=False,
use_mean_pooling=True,
init_scale=0.001,
lin_probe=False,
sin_pos_emb=True,
args=None):
super().__init__()
self.class_num = class_num
self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models
self.use_mean_pooling = use_mean_pooling
self.patch_embed = PatchEmbed(
img_size=img_size,
patch_size=patch_size,
in_chans=in_chans,
embed_dim=embed_dim)
num_patches = self.patch_embed.num_patches
self.zeros_ = nn.initializer.Constant(value=0.)
self.ones_ = nn.initializer.Constant(value=1.)
self.cls_token = self.create_parameter(
[1, 1, embed_dim], default_initializer=self.zeros_)
self.use_abs_pos_emb = use_abs_pos_emb
if use_abs_pos_emb:
self.pos_embed = self.create_parameter(
[1, num_patches + 1, embed_dim],
default_initializer=self.zeros_)
elif sin_pos_emb:
# use fixed sine-cosine positional embeddings instead of learned ones
self.pos_embed = self.create_parameter(
[1, num_patches + 1, embed_dim],
default_initializer=self.zeros_)
self.pos_embed.set_value(
self.build_2d_sincos_position_embedding(embed_dim))
self.pos_embed.stop_gradient = True # fixed sin-cos embedding
else:
self.pos_embed = None
self.pos_drop = nn.Dropout(p=drop_rate)
if use_shared_rel_pos_bias:
self.rel_pos_bias = RelativePositionBias(
window_size=self.patch_embed.patch_shape, num_heads=num_heads)
else:
self.rel_pos_bias = None
dpr = [x.item() for x in paddle.linspace(0, drop_path_rate, depth)
] # stochastic depth decay rule
self.use_rel_pos_bias = use_rel_pos_bias
self.blocks = nn.LayerList([
Block(
dim=embed_dim,
num_heads=num_heads,
mlp_ratio=mlp_ratio,
qkv_bias=qkv_bias,
qk_scale=qk_scale,
drop=drop_rate,
attn_drop=attn_drop_rate,
drop_path=dpr[i],
norm_layer=norm_layer,
init_values=init_values,
window_size=self.patch_embed.patch_shape
if use_rel_pos_bias else None) for i in range(depth)
])
self.norm = nn.Identity() if use_mean_pooling else norm_layer(
embed_dim)
self.lin_probe = lin_probe
# NOTE: batch norm
if lin_probe:
# TODO
from models.lincls_bn import LP_BatchNorm
self.fc_norm = LP_BatchNorm(embed_dim, affine=False)
else:
if use_mean_pooling:
self.fc_norm = norm_layer(embed_dim)
else:
self.fc_norm = None
self.head = nn.Linear(embed_dim,
class_num) if class_num > 0 else nn.Identity()
if self.pos_embed is not None and use_abs_pos_emb:
trunc_normal_(self.pos_embed, std=.02)
trunc_normal_(self.cls_token, std=.02)
# trunc_normal_(self.mask_token, std=.02)
trunc_normal_(self.head.weight, std=.02)
self.apply(self._init_weights)
self.fix_init_weight()
self.head.weight.set_value(self.head.weight * init_scale)
self.head.bias.set_value(self.head.bias * init_scale)
def build_2d_sincos_position_embedding(self,
embed_dim=768,
temperature=10000.):
h, w = self.patch_embed.patch_shape
grid_w = paddle.arange(w, dtype=paddle.float32)
grid_h = paddle.arange(h, dtype=paddle.float32)
grid_w, grid_h = paddle.meshgrid(grid_w, grid_h)
assert embed_dim % 4 == 0, 'Embed dimension must be divisible by 4 for 2D sin-cos position embedding'
pos_dim = embed_dim // 4
omega = paddle.arange(pos_dim, dtype=paddle.float32) / pos_dim
omega = 1. / (temperature**omega)
out_w = paddle.einsum('m,d->md', grid_w.flatten(), omega)
out_h = paddle.einsum('m,d->md', grid_h.flatten(), omega)
pos_emb = paddle.concat(
[
paddle.sin(out_w), paddle.cos(out_w), paddle.sin(out_h),
paddle.cos(out_h)
],
axis=1)[None, :, :]
# if not self.use_mean_pooling:
pe_token = paddle.zeros([1, 1, embed_dim], dtype=paddle.float32)
pos_emb = paddle.concat([pe_token, pos_emb], axis=1)
return pos_emb
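# The table built above has shape [1, num_patches + 1, embed_dim]: one all-zero
# row is prepended for the class token, and every patch position gets
# [sin(w), cos(w), sin(h), cos(h)] blocks of width embed_dim // 4. For the
# default 14 x 14 grid with embed_dim=768 this is a fixed [1, 197, 768] tensor,
# excluded from training because stop_gradient is set on pos_embed above.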
def fix_init_weight(self):
def rescale(param, layer_id):
param.set_value(param / math.sqrt(2.0 * layer_id))
for layer_id, layer in enumerate(self.blocks):
rescale(layer.attn.proj.weight, layer_id + 1)
rescale(layer.mlp.fc2.weight, layer_id + 1)
def _init_weights(self, m):
if isinstance(m, nn.Linear):
trunc_normal_(m.weight, std=.02)
if isinstance(m, nn.Linear) and m.bias is not None:
self.zeros_(m.bias)
elif isinstance(m, nn.LayerNorm):
self.zeros_(m.bias)
self.ones_(m.weight)
def get_num_layers(self):
return len(self.blocks)
def no_weight_decay(self):
return {'pos_embed', 'cls_token'}
def get_classifier(self):
return self.head
def reset_classifier(self, class_num, global_pool=''):
self.class_num = class_num
self.head = nn.Linear(self.embed_dim,
class_num) if class_num > 0 else nn.Identity()
def forward_features(self, x, is_train=True):
x = self.patch_embed(x)
batch_size, seq_len, _ = x.shape
cls_tokens = self.cls_token.expand(
[batch_size, -1,
-1]) # stole cls_tokens impl from Phil Wang, thanks
x = paddle.concat((cls_tokens, x), axis=1)
if self.pos_embed is not None:
if self.use_abs_pos_emb:
x = x + self.pos_embed.expand(
[batch_size, -1, -1]).astype(x.dtype).clone().detach()
else:
x = x + self.pos_embed.expand(
[batch_size, -1, -1]).astype(x.dtype).clone().detach()
x = self.pos_drop(x)
rel_pos_bias = self.rel_pos_bias(
) if self.rel_pos_bias is not None else None
for blk in self.blocks:
x = blk(x, rel_pos_bias=rel_pos_bias)
x = self.norm(x)
if self.fc_norm is not None:
t = x[:, 1:, :]
if self.lin_probe:
if self.use_mean_pooling:
return self.fc_norm(t.mean(1), is_train=is_train)
else:
return self.fc_norm(x[:, 0], is_train=is_train)
else:
return self.fc_norm(t.mean(1))
else:
return x[:, 0]
def forward(self, x, is_train=True):
x = self.forward_features(x, is_train)
x = self.head(x)
return x
def _enable_linear_eval(model):
zeros_ = nn.initializer.Constant(value=0.)
normal_ = nn.initializer.Normal(mean=0.0, std=0.01)
linear_keyword = 'head'
head_norm = 'fc_norm'
requires_grad = []
for name, param in model.named_parameters():
if name not in [
'%s.weight' % linear_keyword, '%s.bias' % linear_keyword
] and head_norm not in name:
param.stop_gradient = True
else:
requires_grad.append(name)
# init the fc layer
normal_(getattr(model, linear_keyword).weight)
zeros_(getattr(model, linear_keyword).bias)
return
def _load_pretrained(pretrained,
pretrained_url,
model,
model_keys,
model_ema_configs,
abs_pos_emb,
rel_pos_bias,
use_ssld=False):
if pretrained is False:
pass
elif pretrained is True:
local_weight_path = get_weights_path_from_url(pretrained_url).replace(
".pdparams", "")
checkpoint = paddle.load(local_weight_path + ".pdparams")
elif isinstance(pretrained, str):
local_weight_path = pretrained.replace(".pdparams", "")
checkpoint = paddle.load(local_weight_path + ".pdparams")
checkpoint_model = None
for model_key in model_keys.split('|'):
if model_key in checkpoint:
checkpoint_model = checkpoint[model_key]
break
if checkpoint_model is None:
checkpoint_model = checkpoint
state_dict = model.state_dict()
all_keys = list(checkpoint_model.keys())
# NOTE: remove all decoder keys
all_keys = [key for key in all_keys if key.startswith('encoder.')]
for key in all_keys:
new_key = key.replace('encoder.', '')
checkpoint_model[new_key] = checkpoint_model[key]
checkpoint_model.pop(key)
for key in list(checkpoint_model.keys()):
if key.startswith('regressor_and_decoder.'):
checkpoint_model.pop(key)
if key.startswith('teacher_network.'):
checkpoint_model.pop(key)
# NOTE: replace norm with fc_norm
for key in list(checkpoint_model.keys()):
if key.startswith('norm.'):
new_key = key.replace('norm.', 'fc_norm.')
checkpoint_model[new_key] = checkpoint_model[key]
checkpoint_model.pop(key)
for k in ['head.weight', 'head.bias']:
if k in checkpoint_model and checkpoint_model[k].shape != state_dict[
k].shape:
del checkpoint_model[k]
if model.use_rel_pos_bias and "rel_pos_bias.relative_position_bias_table" in checkpoint_model:
num_layers = model.get_num_layers()
rel_pos_bias = checkpoint_model[
"rel_pos_bias.relative_position_bias_table"]
for i in range(num_layers):
checkpoint_model["blocks.%d.attn.relative_position_bias_table" %
i] = rel_pos_bias.clone()
checkpoint_model.pop("rel_pos_bias.relative_position_bias_table")
all_keys = list(checkpoint_model.keys())
for key in all_keys:
if "relative_position_index" in key:
checkpoint_model.pop(key)
if "relative_position_bias_table" in key and rel_pos_bias:
rel_pos_bias = checkpoint_model[key]
src_num_pos, num_attn_heads = rel_pos_bias.size()
dst_num_pos, _ = model.state_dict()[key].size()
dst_patch_shape = model.patch_embed.patch_shape
if dst_patch_shape[0] != dst_patch_shape[1]:
raise NotImplementedError()
num_extra_tokens = dst_num_pos - (dst_patch_shape[0] * 2 - 1) * (
dst_patch_shape[1] * 2 - 1)
src_size = int((src_num_pos - num_extra_tokens)**0.5)
dst_size = int((dst_num_pos - num_extra_tokens)**0.5)
if src_size != dst_size:
extra_tokens = rel_pos_bias[-num_extra_tokens:, :]
rel_pos_bias = rel_pos_bias[:-num_extra_tokens, :]
def geometric_progression(a, r, n):
return a * (1.0 - r**n) / (1.0 - r)
left, right = 1.01, 1.5
while right - left > 1e-6:
q = (left + right) / 2.0
gp = geometric_progression(1, q, src_size // 2)
if gp > dst_size // 2:
right = q
else:
left = q
dis = []
cur = 1
for i in range(src_size // 2):
dis.append(cur)
cur += q**(i + 1)
r_ids = [-_ for _ in reversed(dis)]
x = r_ids + [0] + dis
y = r_ids + [0] + dis
t = dst_size // 2.0
dx = np.arange(-t, t + 0.1, 1.0)
dy = np.arange(-t, t + 0.1, 1.0)
all_rel_pos_bias = []
for i in range(num_attn_heads):
z = rel_pos_bias[:, i].view(src_size,
src_size).float().numpy()
f = interpolate.interp2d(x, y, z, kind='cubic')
all_rel_pos_bias.append(
paddle.Tensor(f(dx, dy)).contiguous().view(-1, 1).to(
rel_pos_bias.device))
rel_pos_bias = paddle.concat(all_rel_pos_bias, axis=-1)
new_rel_pos_bias = paddle.concat(
(rel_pos_bias, extra_tokens), axis=0)
checkpoint_model[key] = new_rel_pos_bias
# interpolate position embedding
if 'pos_embed' in checkpoint_model and abs_pos_emb:
pos_embed_checkpoint = checkpoint_model['pos_embed']
embedding_size = pos_embed_checkpoint.shape[-1]
num_patches = model.patch_embed.num_patches
num_extra_tokens = model.pos_embed.shape[-2] - num_patches
# height (== width) for the checkpoint position embedding
orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens)**
0.5)
# height (== width) for the new position embedding
new_size = int(num_patches**0.5)
# class_token and dist_token are kept unchanged
if orig_size != new_size:
extra_tokens = pos_embed_checkpoint[:, :num_extra_tokens]
# only the position tokens are interpolated
pos_tokens = pos_embed_checkpoint[:, num_extra_tokens:]
pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size,
embedding_size).permute(0, 3, 1, 2)
pos_tokens = paddle.nn.functional.interpolate(
pos_tokens,
size=(new_size, new_size),
mode='bicubic',
align_corners=False)
pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2)
new_pos_embed = paddle.concat((extra_tokens, pos_tokens), axis=1)
checkpoint_model['pos_embed'] = new_pos_embed
msg = model.set_state_dict(checkpoint_model)
model_without_ddp = model
n_parameters = sum(p.numel() for p in model.parameters()
if not p.stop_gradient).item()
return
def cae_base_patch16_224(pretrained=True, use_ssld=False, **kwargs):
config = kwargs.copy()
enable_linear_eval = config.pop('enable_linear_eval')
model_keys = config.pop('model_key')
model_ema_configs = config.pop('model_ema')
abs_pos_emb = config.pop('abs_pos_emb')
rel_pos_bias = config.pop('rel_pos_bias')
if "pretrained" in config:
pretrained = config.pop('pretrained')
model = VisionTransformer(
patch_size=16,
embed_dim=768,
depth=12,
num_heads=12,
mlp_ratio=4,
qkv_bias=True,
norm_layer=partial(
nn.LayerNorm, epsilon=1e-6),
**config)
if enable_linear_eval:
_enable_linear_eval(model)
_load_pretrained(
pretrained,
MODEL_URLS["cae_base_patch16_224"],
model,
model_keys,
model_ema_configs,
abs_pos_emb,
rel_pos_bias,
use_ssld=False)
return model
def cae_large_patch16_224(pretrained=True, use_ssld=False, **kwargs):
config = kwargs.copy()
enable_linear_eval = config.pop('enable_linear_eval')
model_keys = config.pop('model_key')
model_ema_configs = config.pop('model_ema')
abs_pos_emb = config.pop('abs_pos_emb')
rel_pos_bias = config.pop('rel_pos_bias')
if "pretrained" in config:
pretrained = config.pop('pretrained')
model = VisionTransformer(
patch_size=16,
embed_dim=1024,
depth=24,
num_heads=16,
mlp_ratio=4,
qkv_bias=True,
norm_layer=partial(
nn.LayerNorm, epsilon=1e-6),
**config)
if enable_linear_eval:
_enable_linear_eval(model)
_load_pretrained(
pretrained,
MODEL_URLS["cae_large_patch16_224"],
model,
model_keys,
model_ema_configs,
abs_pos_emb,
rel_pos_bias,
use_ssld=False)
return model
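As a rough usage sketch (not the official entry point; training and evaluation go through the PaddleClas config pipeline shown in the YAML files below), the factory above can be called directly. The keyword arguments mirror the keys it pops from the Arch config section:

import paddle

# illustrative call only; pretrained=True downloads the released weights
model = cae_base_patch16_224(
    pretrained=True,
    class_num=102,
    enable_linear_eval=False,
    model_key="model|module|state_dict",
    model_ema={"enable_model_ema": False},
    abs_pos_emb=False,
    rel_pos_bias=True,
    use_rel_pos_bias=True,
    use_abs_pos_emb=False,
    init_values=0.1,
    lin_probe=False,
    sin_pos_emb=True)
logits = model(paddle.randn([1, 3, 224, 224]))  # expected shape: [1, 102]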
from .resnet_variant import ResNet50_last_stage_stride1
from .vgg_variant import VGG19Sigmoid
from .pp_lcnet_variant import PPLCNet_x2_5_Tanh
+from .pp_lcnetv2_variant import PPLCNetV2_base_ShiTu
from paddle.nn import Conv2D, Identity
from ..legendary_models.pp_lcnet_v2 import MODEL_URLS, PPLCNetV2_base, RepDepthwiseSeparable, _load_pretrained
__all__ = ["PPLCNetV2_base_ShiTu"]
def PPLCNetV2_base_ShiTu(pretrained=False, use_ssld=False, **kwargs):
"""
A variant of PPLCNetV2_base.
1. remove the ReLU layer after last_conv
2. add a bias to last_conv
3. change the stride to 1 in the last two RepDepthwiseSeparable blocks
"""
model = PPLCNetV2_base(pretrained=False, use_ssld=use_ssld, **kwargs)
def remove_ReLU_function(conv, pattern):
new_conv = Identity()
return new_conv
def add_bias_last_conv(conv, pattern):
new_conv = Conv2D(
in_channels=conv._in_channels,
out_channels=conv._out_channels,
kernel_size=conv._kernel_size,
stride=conv._stride,
padding=conv._padding,
groups=conv._groups,
bias_attr=True)
return new_conv
def last_stride_function(rep_block, pattern):
new_conv = RepDepthwiseSeparable(
in_channels=rep_block.in_channels,
out_channels=rep_block.out_channels,
stride=1,
dw_size=rep_block.dw_size,
split_pw=rep_block.split_pw,
use_rep=rep_block.use_rep,
use_se=rep_block.use_se,
use_shortcut=rep_block.use_shortcut)
return new_conv
pattern_act = ["act"]
pattern_lastconv = ["last_conv"]
pattern_last_stride = [
"stages[3][0]",
"stages[3][1]",
]
model.upgrade_sublayer(pattern_act, remove_ReLU_function)
model.upgrade_sublayer(pattern_lastconv, add_bias_last_conv)
model.upgrade_sublayer(pattern_last_stride, last_stride_function)
# load params again after upgrade some layers
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNetV2_base"], use_ssld)
return model
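A quick smoke test of the variant above (a sketch that assumes the backbone is used standalone; in the recognition configs below it is wrapped by RecModel and truncated at the flatten layer):

import paddle

model = PPLCNetV2_base_ShiTu(pretrained=False, use_ssld=False, class_expand=512)
out = model(paddle.randn([1, 3, 224, 224]))
# after the three upgrade_sublayer calls, last_conv carries a bias and is no
# longer followed by ReLU, and stages[3][0] / stages[3][1] run with stride 1,
# so the last stage keeps a 2x larger spatial resolution before pooling.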
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 20
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: cae_base_patch16_224
class_num: 102
drop_rate: 0.0
drop_path_rate: 0.1
attn_drop_rate: 0.0
use_mean_pooling: True
init_scale: 0.001
use_rel_pos_bias: True
use_abs_pos_emb: False
init_values: 0.1
lin_probe: False
sin_pos_emb: True
abs_pos_emb: False
enable_linear_eval: False
model_key: model|module|state_dict
rel_pos_bias: True
model_ema:
enable_model_ema: False
model_ema_decay: 0.9999
model_ema_force_cpu: False
pretrained: True
# loss function config for training/eval process
Loss:
Train:
- SoftTargetCrossEntropy:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: AdamWDL
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
layerwise_decay: 0.65
lr:
name: Cosine
learning_rate: 0.001
eta_min: 1e-6
warmup_epoch: 10
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/flowers102/
cls_label_path: ./dataset/flowers102/train_list.txt
batch_transform_ops:
- MixupCutmixHybrid:
mixup_alpha: 0.8
cutmix_alpha: 1.0
switch_prob: 0.5
num_classes: 102
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: bilinear
- RandFlipImage:
flip_code: 1
- RandAugment:
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.3
r1: 0.3
sampler:
name: DistributedBatchSampler
batch_size: 16
drop_last: True
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/flowers102/
cls_label_path: ./dataset/flowers102/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 16
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
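One detail of the optimizer section above that is easy to miss: AdamWDL applies a layer-wise learning-rate decay (layerwise_decay: 0.65), so shallow blocks of the ViT backbone are updated far more gently than the head. A rough sketch of the implied scaling, assuming the common convention that the multiplier shrinks geometrically from the deepest layer down to the patch embedding (the exact parameter grouping is handled inside the optimizer):

def layerwise_lr(base_lr, layer_id, num_layers=12, decay=0.65):
    # layer_id 0 = patch embedding, num_layers + 1 = classifier head
    return base_lr * decay ** (num_layers + 1 - layer_id)

for layer_id in (0, 6, 12, 13):
    print(layer_id, layerwise_lr(0.001, layer_id))
# the patch embedding ends up at roughly 0.001 * 0.65 ** 13 ~= 3.7e-6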
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 20
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: cae_large_patch16_224
class_num: 102
drop_rate: 0.0
drop_path_rate: 0.2
attn_drop_rate: 0.0
use_mean_pooling: True
init_scale: 0.001
use_rel_pos_bias: True
use_abs_pos_emb: False
init_values: 0.1
lin_probe: False
sin_pos_emb: True
abs_pos_emb: False
enable_linear_eval: False
model_key: model|module|state_dict
rel_pos_bias: True
model_ema:
enable_model_ema: False
model_ema_decay: 0.9999
model_ema_force_cpu: False
pretrained: True
# loss function config for training/eval process
Loss:
Train:
- SoftTargetCrossEntropy:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: AdamWDL
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
layerwise_decay: 0.75
lr:
name: Cosine
learning_rate: 0.001
eta_min: 1e-6
warmup_epoch: 10
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/flowers102/
cls_label_path: ./dataset/flowers102/train_list.txt
batch_transform_ops:
- MixupCutmixHybrid:
mixup_alpha: 0.8
cutmix_alpha: 1.0
switch_prob: 0.5
num_classes: 102
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: bilinear
- RandFlipImage:
flip_code: 1
- RandAugment:
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.3
r1: 0.3
sampler:
name: DistributedBatchSampler
batch_size: 16
drop_last: True
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/flowers102/
cls_label_path: ./dataset/flowers102/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 16
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.5, 0.5, 0.5]
std: [0.5, 0.5, 0.5]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
@@ -57,10 +57,9 @@ Optimizer:
learning_rate: 0.04
warmup_epoch: 5
regularizer:
-name: 'L2'
+name: "L2"
coeff: 0.00001
# data loader for train and eval
DataLoader:
Train:
@@ -80,7 +79,7 @@ DataLoader:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
sampler:
name: DistributedBatchSampler
@@ -107,7 +106,7 @@ DataLoader:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
sampler:
name: DistributedBatchSampler
batch_size: 64
@@ -132,7 +131,7 @@ DataLoader:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
sampler:
name: DistributedBatchSampler
batch_size: 64
@@ -146,3 +145,4 @@ Metric:
Eval:
- Recallk:
topk: [1, 5]
+- mAP: {}
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 20
use_visualdl: False
eval_mode: retrieval
retrieval_feature_from: features # 'backbone' or 'features'
re_ranking: False
use_dali: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
AMP:
scale_loss: 65536
use_dynamic_loss_scaling: True
# O1: mixed fp16
level: O1
# model architecture
Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False
Backbone:
name: PPLCNetV2_base_ShiTu
pretrained: True
use_ssld: True
class_expand: &feat_dim 512
BackboneStopLayer:
name: flatten
Neck:
name: BNNeck
num_features: *feat_dim
weight_attr:
initializer:
name: Constant
value: 1.0
bias_attr:
initializer:
name: Constant
value: 0.0
learning_rate: 1.0e-20 # NOTE: Temporarily set lr small enough to freeze the bias to zero
Head:
name: FC
embedding_size: *feat_dim
class_num: 192612
weight_attr:
initializer:
name: Normal
std: 0.001
bias_attr: False
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
- TripletAngularMarginLoss:
weight: 1.0
feature_from: features
margin: 0.5
reduction: mean
add_absolute: True
absolute_loss_weight: 0.1
normalize_feature: True
ap_value: 0.8
an_value: 0.4
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.06 # for 8gpu x 256bs
warmup_epoch: 5
regularizer:
name: L2
coeff: 0.00001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/train_reg_all_data_v2.txt
relabel: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- RandFlipImage:
flip_code: 1
- Pad:
padding: 10
backend: cv2
- RandCropImageV2:
size: [224, 224]
- RandomRotation:
prob: 0.5
degrees: 90
interpolation: bilinear
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: hwc
sampler:
name: PKSampler
batch_size: 256
sample_per_id: 4
drop_last: False
shuffle: True
sample_method: "id_avg_prob"
id_list: [50030, 80700, 92019, 96015] # be careful when set relabel=True
ratio: [4, 4]
loader:
num_workers: 4
use_shared_memory: True
Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: hwc
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
return_numpy: False
interpolation: bilinear
backend: cv2
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: hwc
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]
- mAP: {}
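Two settings above can be sanity-checked by hand. The PKSampler draws batch_size 256 with sample_per_id 4, i.e. 64 identities per batch; and TripletAngularMarginLoss combines a relative margin of 0.5 with absolute targets on the cosine similarities (ap_value 0.8, an_value 0.4). The following is only a sketch of that loss shape, not the library implementation:

import paddle
import paddle.nn.functional as F

def triplet_angular_margin_sketch(anchor, pos, neg, margin=0.5,
                                  ap_value=0.8, an_value=0.4,
                                  absolute_loss_weight=0.1):
    # cosine similarities on L2-normalized features
    anchor, pos, neg = (F.normalize(t, axis=-1) for t in (anchor, pos, neg))
    ap = (anchor * pos).sum(-1)
    an = (anchor * neg).sum(-1)
    relative = F.relu(an - ap + margin).mean()  # push ap above an by the margin
    absolute = (F.relu(ap_value - ap) + F.relu(an - an_value)).mean()
    return relative + absolute_loss_weight * absolute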
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/r34_r18_wsl
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
# model architecture
Arch:
name: "DistillationModel"
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models
freeze_params_list:
- True
- False
models:
- Teacher:
name: ResNet34
pretrained: True
- Student:
name: ResNet18
pretrained: False
infer_model_name: "Student"
# loss function config for training/eval process
Loss:
Train:
- DistillationGTCELoss:
weight: 1.0
model_names: ["Student"]
- DistillationWSLLoss:
weight: 2.5
model_name_pairs: [["Student", "Teacher"]]
temperature: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
weight_decay: 1e-4
lr:
name: MultiStepDecay
learning_rate: 0.1
milestones: [30, 60, 90]
step_each_epoch: 1
gamma: 0.1
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: "./dataset/ILSVRC2012/"
cls_label_path: "./dataset/ILSVRC2012/train_list.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: "./dataset/ILSVRC2012/"
cls_label_path: "./dataset/ILSVRC2012/val_list.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: "docs/images/inference_deployment/whl_demo.jpg"
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt"
Metric:
Train:
- DistillationTopkAcc:
model_key: "Student"
topk: [1, 5]
Eval:
- DistillationTopkAcc:
model_key: "Student"
topk: [1, 5]
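The distillation setup above freezes the ResNet34 teacher and trains the ResNet18 student with a ground-truth CE term (weight 1.0) plus DistillationWSLLoss (weight 2.5, temperature 2). WSL additionally reweights the soft-label term per sample based on how well teacher and student fit the ground truth; that weighting lives in the loss library, so the sketch below only shows the generic temperature-scaled objective it builds on:

import paddle
import paddle.nn.functional as F

def kd_objective_sketch(student_logits, teacher_logits, labels,
                        gt_weight=1.0, kd_weight=2.5, temperature=2.0):
    # hard-label term on the student
    gt_loss = F.cross_entropy(student_logits, labels)
    # temperature-scaled soft-label term against the frozen teacher
    s = F.log_softmax(student_logits / temperature, axis=-1)
    t = F.softmax(teacher_logits / temperature, axis=-1)
    kd_loss = F.kl_div(s, t, reduction="batchmean") * temperature ** 2
    return gt_weight * gt_loss + kd_weight * kd_loss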
@@ -142,6 +142,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 236
+interpolation: bicubic
+backend: pil
- CropImage:
size: 224
- NormalizeImage:
......
@@ -142,6 +142,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 236
+interpolation: bicubic
+backend: pil
- CropImage:
size: 224
- NormalizeImage:
......
@@ -142,6 +142,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 232
+interpolation: bicubic
+backend: pil
- CropImage:
size: 224
- NormalizeImage:
......
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 20
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
use_multilabel: True
# model architecture
Arch:
name: "PPLCNet_x1_0"
pretrained: True
use_ssld: True
class_num: 6
# loss function config for training/eval process
Loss:
Train:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Eval:
- MultiLabelLoss:
weight: 1.0
weight_ratio: True
size_sum: True
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: "dataset/table_attribute/"
cls_label_path: "dataset/table_attribute/train_list.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: True
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: MultiLabelDataset
image_root: "dataset/table_attribute/"
cls_label_path: "dataset/table_attribute/val_list.txt"
label_ratio: True
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: deploy/images/PULC/table_attribute/val_3610.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: [224, 224]
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: TableAttribute
source_threshold: 0.5
number_threshold: 0.5
color_threshold: 0.5
clarity_threshold : 0.5
obstruction_threshold: 0.5
angle_threshold: 0.5
Metric:
Eval:
- ATTRMetric:
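The TableAttribute post-process above converts the 6 sigmoid outputs of the multi-label head into per-attribute decisions, one threshold per attribute. A minimal sketch of that step; the output ordering below is an assumption taken from the threshold keys, not from the post-process code itself:

import paddle
import paddle.nn.functional as F

ATTRS = ["source", "number", "color", "clarity", "obstruction", "angle"]  # assumed order
THRESHOLDS = {name: 0.5 for name in ATTRS}

def table_attribute_sketch(logits):  # logits: [6] for a single image
    probs = F.sigmoid(logits)
    return {name: bool(probs[i] > THRESHOLDS[name])
            for i, name in enumerate(ATTRS)}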
@@ -60,7 +60,7 @@ Optimizer:
verbose: False
last_epoch: -1
regularizer:
-name: 'L2'
+name: "L2"
coeff: 0.0005
# data loader for train and eval
@@ -82,7 +82,7 @@ DataLoader:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
- RandomErasing:
EPSILON: 0.5
sl: 0.02
@@ -115,7 +115,7 @@ DataLoader:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
sampler:
name: DistributedBatchSampler
batch_size: 64
@@ -140,7 +140,7 @@ DataLoader:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
-order: ''
+order: ""
sampler:
name: DistributedBatchSampler
batch_size: 64
@@ -155,4 +155,3 @@ Metric:
- Recallk:
topk: [1, 5]
- mAP: {}
@@ -72,7 +72,12 @@ def build_dataloader(config, mode, device, use_dali=False, seed=None):
# build dataset
if use_dali:
from ppcls.data.dataloader.dali import dali_dataloader
-return dali_dataloader(config, mode, paddle.device.get_device(), seed)
+return dali_dataloader(
+config,
+mode,
+paddle.device.get_device(),
+num_threads=config[mode]['loader']["num_workers"],
+seed=seed)
class_num = config.get("class_num", None)
config_dataset = config[mode]['dataset']
......
@@ -143,7 +143,7 @@ class HybridValPipe(Pipeline):
return self.epoch_size("Reader")
-def dali_dataloader(config, mode, device, seed=None):
+def dali_dataloader(config, mode, device, num_threads=4, seed=None):
assert "gpu" in device, "gpu training is required for DALI"
device_id = int(device.split(':')[1])
config_dataloader = config[mode]
@@ -248,6 +248,7 @@ def dali_dataloader(config, mode, device, seed=None):
device_id,
shard_id,
num_shards,
+num_threads=num_threads,
seed=seed + shard_id,
pad_output=pad_output,
output_dtype=output_dtype)
@@ -270,6 +271,7 @@ def dali_dataloader(config, mode, device, seed=None):
device_id=device_id,
shard_id=0,
num_shards=1,
+num_threads=num_threads,
seed=seed,
pad_output=pad_output,
output_dtype=output_dtype)
@@ -298,6 +300,7 @@ def dali_dataloader(config, mode, device, seed=None):
device_id=device_id,
shard_id=shard_id,
num_shards=num_shards,
+num_threads=num_threads,
pad_output=pad_output,
output_dtype=output_dtype)
else:
@@ -311,6 +314,7 @@ def dali_dataloader(config, mode, device, seed=None):
mean,
std,
device_id=device_id,
+num_threads=num_threads,
pad_output=pad_output,
output_dtype=output_dtype)
pipe.build()
......
@@ -21,27 +21,54 @@ from .common_dataset import CommonDataset
class ImageNetDataset(CommonDataset):
-def __init__(
-self,
+"""ImageNetDataset
+Args:
+image_root (str): image root, path to `ILSVRC2012`
+cls_label_path (str): path to annotation file `train_list.txt` or `val_list.txt`
+transform_ops (list, optional): list of transform op(s). Defaults to None.
+delimiter (str, optional): delimiter. Defaults to None.
+relabel (bool, optional): whether to relabel when the original labels do not start from 0 or are discontinuous. Defaults to False.
+"""
+def __init__(self,
image_root,
cls_label_path,
transform_ops=None,
-delimiter=None):
+delimiter=None,
+relabel=False):
self.delimiter = delimiter if delimiter is not None else " "
-super(ImageNetDataset, self).__init__(image_root, cls_label_path, transform_ops)
+self.relabel = relabel
+super(ImageNetDataset, self).__init__(image_root, cls_label_path,
+transform_ops)
def _load_anno(self, seed=None):
-assert os.path.exists(self._cls_path)
-assert os.path.exists(self._img_root)
+assert os.path.exists(
+self._cls_path), f"path {self._cls_path} does not exist."
+assert os.path.exists(
+self._img_root), f"path {self._img_root} does not exist."
self.images = []
self.labels = []
with open(self._cls_path) as fd:
lines = fd.readlines()
+if self.relabel:
+label_set = set()
+for line in lines:
+line = line.strip().split(self.delimiter)
+label_set.add(np.int64(line[1]))
+label_map = {
+oldlabel: newlabel
+for newlabel, oldlabel in enumerate(label_set)
+}
if seed is not None:
np.random.RandomState(seed).shuffle(lines)
-for l in lines:
-l = l.strip().split(self.delimiter)
-self.images.append(os.path.join(self._img_root, l[0]))
-self.labels.append(np.int64(l[1]))
-assert os.path.exists(self.images[-1])
+for line in lines:
+line = line.strip().split(self.delimiter)
+self.images.append(os.path.join(self._img_root, line[0]))
+if self.relabel:
+self.labels.append(label_map[np.int64(line[1])])
+else:
+self.labels.append(np.int64(line[1]))
+assert os.path.exists(self.images[
+-1]), f"path {self.images[-1]} does not exist."
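The relabel branch added above maps arbitrary annotation ids onto a contiguous range before training (the PP-ShiTuV2 recognition config earlier sets relabel: True). A standalone illustration of the same mapping:

import numpy as np

raw_labels = [np.int64(7), np.int64(42), np.int64(7)]  # toy annotation labels
label_map = {old: new for new, old in enumerate(set(raw_labels))}
relabeled = [label_map[l] for l in raw_labels]
# yields contiguous ids such as [0, 1, 0]; as in the code above, the exact
# assignment depends on the iteration order of the set.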
@@ -32,17 +32,23 @@ class PKSampler(DistributedBatchSampler):
batch_size (int): batch size
sample_per_id (int): number of instance(s) within a class
shuffle (bool, optional): whether to shuffle the dataset. Defaults to True.
+id_list (list): list of (start_id, end_id, start_id, end_id, ...) ranges of ids to be over-sampled.
+ratio (list): list of (ratio1, ratio2, ...) duplication factors for the id ranges in id_list.
drop_last (bool, optional): whether to discard the data at the end. Defaults to True.
sample_method (str, optional): sample method when generating prob_list. Defaults to "sample_avg_prob".
"""
def __init__(self,
dataset,
batch_size,
sample_per_id,
shuffle=True,
drop_last=True,
+id_list=None,
+ratio=None,
sample_method="sample_avg_prob"):
-super().__init__(dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
+super().__init__(
+dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
assert batch_size % sample_per_id == 0, \
f"PKSampler configs error, sample_per_id({sample_per_id}) must be a divisor of batch_size({batch_size})."
assert hasattr(self.dataset,
@@ -67,6 +73,16 @@ class PKSampler(DistributedBatchSampler):
logger.error(
"PKSampler only support id_avg_prob and sample_avg_prob sample method, "
"but receive {}.".format(self.sample_method))
+if id_list and ratio:
+assert len(id_list) % 2 == 0 and len(id_list) == len(ratio) * 2
+for i in range(len(self.prob_list)):
+for j in range(len(ratio)):
+if i >= id_list[j * 2] and i <= id_list[j * 2 + 1]:
+self.prob_list[i] = self.prob_list[i] * ratio[j]
+break
+self.prob_list = self.prob_list / sum(self.prob_list)
diff = np.abs(sum(self.prob_list) - 1)
if diff > 0.00000001:
self.prob_list[-1] = 1 - sum(self.prob_list[:-1])
@@ -74,8 +90,8 @@ class PKSampler(DistributedBatchSampler):
logger.error("PKSampler prob list error")
else:
logger.info(
-"PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob".format(diff)
-)
+"PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob".
+format(diff))
def __iter__(self):
label_per_batch = self.batch_size // self.sample_per_label
......
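The `id_list`/`ratio` options added to `PKSampler` above scale the per-id sampling probability for the chosen id ranges and then re-normalize `prob_list`. A minimal NumPy sketch of that re-weighting step (the values and variable names below are illustrative only, not part of the sampler's API):

```python
# Standalone sketch of the id_list/ratio re-weighting added above (NumPy only).
import numpy as np

prob_list = np.full(10, 0.1)   # uniform sampling probability over 10 class ids
id_list = [2, 4, 7, 9]         # two inclusive id ranges: [2, 4] and [7, 9]
ratio = [3.0, 0.5]             # up-weight the first range, down-weight the second

for i in range(len(prob_list)):
    for j in range(len(ratio)):
        if id_list[j * 2] <= i <= id_list[j * 2 + 1]:
            prob_list[i] *= ratio[j]
            break
prob_list = prob_list / prob_list.sum()   # re-normalize so the probs sum to 1
print(prob_list)
```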
@@ -89,11 +89,7 @@ class CompCars(Dataset):

class VeriWild(Dataset):
    def __init__(self, image_root, cls_label_path, transform_ops=None):
        self._img_root = image_root
        self._cls_path = cls_label_path
        if transform_ops:
@@ -102,19 +98,23 @@ class VeriWild(Dataset):
        self._load_anno()

    def _load_anno(self):
        assert os.path.exists(
            self._cls_path), f"path {self._cls_path} does not exist."
        assert os.path.exists(
            self._img_root), f"path {self._img_root} does not exist."
        self.images = []
        self.labels = []
        self.cameras = []
        with open(self._cls_path) as fd:
            lines = fd.readlines()
            for line in lines:
                line = line.strip().split()
                self.images.append(os.path.join(self._img_root, line[0]))
                self.labels.append(np.int64(line[1]))
                if len(line) >= 3:
                    self.cameras.append(np.int64(line[2]))
                assert os.path.exists(self.images[-1])
        self.has_camera = len(self.cameras) > 0

    def __getitem__(self, idx):
        try:
@@ -123,7 +123,10 @@ class VeriWild(Dataset):
            if self._transform_ops:
                img = transform(img, self._transform_ops)
            img = img.transpose((2, 0, 1))
            if self.has_camera:
                return (img, self.labels[idx], self.cameras[idx])
            else:
                return (img, self.labels[idx])
        except Exception as ex:
            logger.error("Exception occured when parse line: {} with msg: {}".
                         format(self.images[idx], ex))
......
@@ -21,6 +21,7 @@ from .threshoutput import ThreshOutput, MultiLabelThreshOutput
from .attr_rec import VehicleAttribute, PersonAttribute


def build_postprocess(config):
    config = copy.deepcopy(config)
    model_name = config.pop("name")
......
@@ -71,7 +71,6 @@ class VehicleAttribute(object):
        return batch_res


class PersonAttribute(object):
    def __init__(self,
                 threshold=0.5,
@@ -171,3 +170,58 @@ class PersonAttribute(object):
            batch_res.append({"attributes": label_res, "output": pred_res})
        return batch_res
class TableAttribute(object):
def __init__(
self,
source_threshold=0.5,
number_threshold=0.5,
color_threshold=0.5,
clarity_threshold=0.5,
obstruction_threshold=0.5,
angle_threshold=0.5, ):
self.source_threshold = source_threshold
self.number_threshold = number_threshold
self.color_threshold = color_threshold
self.clarity_threshold = clarity_threshold
self.obstruction_threshold = obstruction_threshold
self.angle_threshold = angle_threshold
def __call__(self, x, file_names=None):
if isinstance(x, dict):
x = x['logits']
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.sigmoid(x).numpy()
# postprocess output of predictor
batch_res = []
for idx, res in enumerate(x):
res = res.tolist()
label_res = []
source = 'Scanned' if res[0] > self.source_threshold else 'Photo'
number = 'Little' if res[1] > self.number_threshold else 'Numerous'
color = 'Black-and-White' if res[
2] > self.color_threshold else 'Multicolor'
clarity = 'Clear' if res[3] > self.clarity_threshold else 'Blurry'
obstruction = 'Without-Obstacles' if res[
4] > self.obstruction_threshold else 'With-Obstacles'
angle = 'Horizontal' if res[
5] > self.angle_threshold else 'Tilted'
label_res = [source, number, color, clarity, obstruction, angle]
threshold_list = [
self.source_threshold, self.number_threshold,
self.color_threshold, self.clarity_threshold,
self.obstruction_threshold, self.angle_threshold
]
pred_res = (np.array(res) > np.array(threshold_list)
).astype(np.int8).tolist()
batch_res.append({
"attributes": label_res,
"output": pred_res,
"file_name": file_names[idx]
})
return batch_res
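The new `TableAttribute` post-processor turns six sigmoid outputs into readable table attributes plus a binarized prediction vector. A small usage sketch, assuming the class above is importable; the random logits and file names are made-up stand-ins:

```python
# Hedged usage sketch for TableAttribute with random stand-in logits.
import paddle

postprocessor = TableAttribute(source_threshold=0.5, color_threshold=0.5)
logits = paddle.rand([2, 6])     # batch of 2 samples, 6 attribute logits each
results = postprocessor(logits, file_names=["table_0.jpg", "table_1.jpg"])
print(results[0]["attributes"])  # e.g. ['Scanned', 'Little', 'Multicolor', ...]
print(results[0]["output"])      # the thresholded 0/1 vector
```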
@@ -38,9 +38,11 @@ from ppcls.data.preprocess.ops.operators import CropWithPadding
from ppcls.data.preprocess.ops.operators import RandomInterpolationAugment
from ppcls.data.preprocess.ops.operators import ColorJitter
from ppcls.data.preprocess.ops.operators import RandomCropImage
from ppcls.data.preprocess.ops.operators import RandomRotation
from ppcls.data.preprocess.ops.operators import Padv2
from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator
from ppcls.data.preprocess.batch_ops.batch_operators import MixupCutmixHybrid

import numpy as np
from PIL import Image
......
@@ -23,6 +23,9 @@ import numpy as np
from ppcls.utils import logger
from ppcls.data.preprocess.ops.fmix import sample_mask

import paddle
import paddle.nn.functional as F


class BatchOperator(object):
    """ BatchOperator """
@@ -229,3 +232,270 @@ class OpSampler(object):
            list(self.ops.keys()), weights=list(self.ops.values()), k=1)[0]
        # return batch directly when None Op
        return op(batch) if op else batch
class MixupCutmixHybrid(object):
""" Mixup/Cutmix that applies different params to each element or whole batch
Args:
mixup_alpha (float): mixup alpha value, mixup is active if > 0.
cutmix_alpha (float): cutmix alpha value, cutmix is active if > 0.
cutmix_minmax (List[float]): cutmix min/max image ratio, cutmix is active and uses this vs alpha if not None.
prob (float): probability of applying mixup or cutmix per batch or element
switch_prob (float): probability of switching to cutmix instead of mixup when both are active
mode (str): how to apply mixup/cutmix params (per 'batch', 'pair' (pair of elements), 'elem' (element)
correct_lam (bool): apply lambda correction when cutmix bbox clipped by image borders
label_smoothing (float): apply label smoothing to the mixed target tensor
num_classes (int): number of classes for target
"""
def __init__(self,
mixup_alpha=1.,
cutmix_alpha=0.,
cutmix_minmax=None,
prob=1.0,
switch_prob=0.5,
mode='batch',
correct_lam=True,
label_smoothing=0.1,
num_classes=4):
self.mixup_alpha = mixup_alpha
self.cutmix_alpha = cutmix_alpha
self.cutmix_minmax = cutmix_minmax
if self.cutmix_minmax is not None:
assert len(self.cutmix_minmax) == 2
# force cutmix alpha == 1.0 when minmax active to keep logic simple & safe
self.cutmix_alpha = 1.0
self.mix_prob = prob
self.switch_prob = switch_prob
self.label_smoothing = label_smoothing
self.num_classes = num_classes
self.mode = mode
self.correct_lam = correct_lam # correct lambda based on clipped area for cutmix
self.mixup_enabled = True # set to false to disable mixing (intended tp be set by train loop)
def _one_hot(self, x, num_classes, on_value=1., off_value=0.):
x = paddle.cast(x, dtype='int64')
on_value = paddle.full([x.shape[0], num_classes], on_value)
off_value = paddle.full([x.shape[0], num_classes], off_value)
return paddle.where(
F.one_hot(x, num_classes) == 1, on_value, off_value)
def _mixup_target(self, target, num_classes, lam=1., smoothing=0.0):
off_value = smoothing / num_classes
on_value = 1. - smoothing + off_value
y1 = self._one_hot(
target,
num_classes,
on_value=on_value,
off_value=off_value, )
y2 = self._one_hot(
target.flip(0),
num_classes,
on_value=on_value,
off_value=off_value)
return y1 * lam + y2 * (1. - lam)
def _rand_bbox(self, img_shape, lam, margin=0., count=None):
""" Standard CutMix bounding-box
Generates a random square bbox based on lambda value. This impl includes
support for enforcing a border margin as percent of bbox dimensions.
Args:
img_shape (tuple): Image shape as tuple
lam (float): Cutmix lambda value
margin (float): Percentage of bbox dimension to enforce as margin (reduce amount of box outside image)
count (int): Number of bbox to generate
"""
ratio = np.sqrt(1 - lam)
img_h, img_w = img_shape[-2:]
cut_h, cut_w = int(img_h * ratio), int(img_w * ratio)
margin_y, margin_x = int(margin * cut_h), int(margin * cut_w)
cy = np.random.randint(0 + margin_y, img_h - margin_y, size=count)
cx = np.random.randint(0 + margin_x, img_w - margin_x, size=count)
yl = np.clip(cy - cut_h // 2, 0, img_h)
yh = np.clip(cy + cut_h // 2, 0, img_h)
xl = np.clip(cx - cut_w // 2, 0, img_w)
xh = np.clip(cx + cut_w // 2, 0, img_w)
return yl, yh, xl, xh
def _rand_bbox_minmax(self, img_shape, minmax, count=None):
""" Min-Max CutMix bounding-box
Inspired by Darknet cutmix impl, generates a random rectangular bbox
based on min/max percent values applied to each dimension of the input image.
Typical defaults for minmax are usually in the .2-.3 for min and .8-.9 range for max.
Args:
img_shape (tuple): Image shape as tuple
minmax (tuple or list): Min and max bbox ratios (as percent of image size)
count (int): Number of bbox to generate
"""
assert len(minmax) == 2
img_h, img_w = img_shape[-2:]
cut_h = np.random.randint(
int(img_h * minmax[0]), int(img_h * minmax[1]), size=count)
cut_w = np.random.randint(
int(img_w * minmax[0]), int(img_w * minmax[1]), size=count)
yl = np.random.randint(0, img_h - cut_h, size=count)
xl = np.random.randint(0, img_w - cut_w, size=count)
yu = yl + cut_h
xu = xl + cut_w
return yl, yu, xl, xu
def _cutmix_bbox_and_lam(self,
img_shape,
lam,
ratio_minmax=None,
correct_lam=True,
count=None):
""" Generate bbox and apply lambda correction.
"""
if ratio_minmax is not None:
yl, yu, xl, xu = self._rand_bbox_minmax(
img_shape, ratio_minmax, count=count)
else:
yl, yu, xl, xu = self._rand_bbox(img_shape, lam, count=count)
if correct_lam or ratio_minmax is not None:
bbox_area = (yu - yl) * (xu - xl)
lam = 1. - bbox_area / float(img_shape[-2] * img_shape[-1])
return (yl, yu, xl, xu), lam
def _params_per_elem(self, batch_size):
lam = np.ones(batch_size, dtype=np.float32)
use_cutmix = np.zeros(batch_size, dtype=np.bool)
if self.mixup_enabled:
if self.mixup_alpha > 0. and self.cutmix_alpha > 0.:
use_cutmix = np.random.rand(batch_size) < self.switch_prob
lam_mix = np.where(
use_cutmix,
np.random.beta(
self.cutmix_alpha, self.cutmix_alpha, size=batch_size),
np.random.beta(
self.mixup_alpha, self.mixup_alpha, size=batch_size))
elif self.mixup_alpha > 0.:
lam_mix = np.random.beta(
self.mixup_alpha, self.mixup_alpha, size=batch_size)
elif self.cutmix_alpha > 0.:
use_cutmix = np.ones(batch_size, dtype=np.bool)
lam_mix = np.random.beta(
self.cutmix_alpha, self.cutmix_alpha, size=batch_size)
else:
assert False, "One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true."
lam = np.where(
np.random.rand(batch_size) < self.mix_prob,
lam_mix.astype(np.float32), lam)
return lam, use_cutmix
def _params_per_batch(self):
lam = 1.
use_cutmix = False
if self.mixup_enabled and np.random.rand() < self.mix_prob:
if self.mixup_alpha > 0. and self.cutmix_alpha > 0.:
use_cutmix = np.random.rand() < self.switch_prob
lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha) if use_cutmix else \
np.random.beta(self.mixup_alpha, self.mixup_alpha)
elif self.mixup_alpha > 0.:
lam_mix = np.random.beta(self.mixup_alpha, self.mixup_alpha)
elif self.cutmix_alpha > 0.:
use_cutmix = True
lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha)
else:
assert False, "One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true."
lam = float(lam_mix)
return lam, use_cutmix
def _mix_elem(self, x):
batch_size = len(x)
lam_batch, use_cutmix = self._params_per_elem(batch_size)
x_orig = x.clone(
) # need to keep an unmodified original for mixing source
for i in range(batch_size):
j = batch_size - i - 1
lam = lam_batch[i]
if lam != 1.:
if use_cutmix[i]:
(yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam(
x[i].shape,
lam,
ratio_minmax=self.cutmix_minmax,
correct_lam=self.correct_lam)
if yl < yh and xl < xh:
x[i][:, yl:yh, xl:xh] = x_orig[j][:, yl:yh, xl:xh]
lam_batch[i] = lam
else:
x[i] = x[i] * lam + x_orig[j] * (1 - lam)
return paddle.to_tensor(lam_batch, dtype=x.dtype).unsqueeze(1)
def _mix_pair(self, x):
batch_size = len(x)
lam_batch, use_cutmix = self._params_per_elem(batch_size // 2)
x_orig = x.clone(
) # need to keep an unmodified original for mixing source
for i in range(batch_size // 2):
j = batch_size - i - 1
lam = lam_batch[i]
if lam != 1.:
if use_cutmix[i]:
(yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam(
x[i].shape,
lam,
ratio_minmax=self.cutmix_minmax,
correct_lam=self.correct_lam)
if yl < yh and xl < xh:
x[i][:, yl:yh, xl:xh] = x_orig[j][:, yl:yh, xl:xh]
x[j][:, yl:yh, xl:xh] = x_orig[i][:, yl:yh, xl:xh]
lam_batch[i] = lam
else:
x[i] = x[i] * lam + x_orig[j] * (1 - lam)
x[j] = x[j] * lam + x_orig[i] * (1 - lam)
lam_batch = np.concatenate((lam_batch, lam_batch[::-1]))
return paddle.to_tensor(lam_batch, dtype=x.dtype).unsqueeze(1)
def _mix_batch(self, x):
lam, use_cutmix = self._params_per_batch()
if lam == 1.:
return 1.
if use_cutmix:
(yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam(
x.shape,
lam,
ratio_minmax=self.cutmix_minmax,
correct_lam=self.correct_lam)
if yl < yh and xl < xh:
x[:, :, yl:yh, xl:xh] = x.flip(0)[:, :, yl:yh, xl:xh]
else:
x_flipped = x.flip(0) * (1. - lam)
x[:] = x * lam + x_flipped
return lam
def _unpack(self, batch):
""" _unpack """
assert isinstance(batch, list), \
'batch should be a list filled with tuples (img, label)'
bs = len(batch)
assert bs > 0, 'size of the batch data should > 0'
#imgs, labels = list(zip(*batch))
imgs = []
labels = []
for item in batch:
imgs.append(item[0])
labels.append(item[1])
return np.array(imgs), np.array(labels), bs
def __call__(self, batch):
x, target, bs = self._unpack(batch)
x = paddle.to_tensor(x)
target = paddle.to_tensor(target)
assert len(x) % 2 == 0, 'Batch size should be even when using this'
if self.mode == 'elem':
lam = self._mix_elem(x)
elif self.mode == 'pair':
lam = self._mix_pair(x)
else:
lam = self._mix_batch(x)
target = self._mixup_target(target, self.num_classes, lam,
self.label_smoothing)
return list(zip(x.numpy(), target.numpy()))
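`MixupCutmixHybrid` is a batch operator: it takes a list of `(img, label)` pairs, mixes the images, and returns soft targets over `num_classes`. A rough usage sketch on random data (shapes, alpha values and the even batch size are illustrative; in training it is normally wired in through the data-loader config rather than called directly):

```python
# Illustrative call of the MixupCutmixHybrid batch operator on fake data.
import numpy as np

op = MixupCutmixHybrid(mixup_alpha=0.8, cutmix_alpha=1.0,
                       label_smoothing=0.1, num_classes=10)
batch = [(np.random.rand(3, 224, 224).astype("float32"), np.int64(i % 10))
         for i in range(8)]          # batch size must be even for this operator
mixed = op(batch)                    # list of (mixed_img, soft_label) pairs
print(mixed[0][1].shape)             # soft label over 10 classes -> (10,)
```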
@@ -26,6 +26,7 @@ import cv2
import numpy as np
from PIL import Image, ImageOps, __version__ as PILLOW_VERSION
from paddle.vision.transforms import ColorJitter as RawColorJitter
from paddle.vision.transforms import RandomRotation as RawRandomRotation
from paddle.vision.transforms import ToTensor, Normalize, RandomHorizontalFlip, RandomResizedCrop
from paddle.vision.transforms import functional as F
from .autoaugment import ImageNetPolicy
@@ -181,7 +182,8 @@ class DecodeImage(object):
            img = np.asarray(img)[:, :, ::-1]  # BRG
        if self.to_rgb:
            assert img.shape[
                2] == 3, f"invalid shape of image[{img.shape}]"
            img = img[:, :, ::-1]

        if self.channel_first:
@@ -495,7 +497,13 @@ class RandFlipImage(object):
            if isinstance(img, np.ndarray):
                return cv2.flip(img, self.flip_code)
            else:
                if self.flip_code == 1:
                    return img.transpose(Image.FLIP_LEFT_RIGHT)
                elif self.flip_code == 0:
                    return img.transpose(Image.FLIP_TOP_BOTTOM)
                else:
                    return img.transpose(Image.FLIP_LEFT_RIGHT).transpose(
                        Image.FLIP_LEFT_RIGHT)
        else:
            return img
@@ -653,17 +661,38 @@ class ColorJitter(RawColorJitter):
        return img
class RandomRotation(RawRandomRotation):
"""RandomRotation.
"""
def __init__(self, prob=0.5, *args, **kwargs):
super().__init__(*args, **kwargs)
self.prob = prob
def __call__(self, img):
if np.random.random() < self.prob:
img = super()._apply_image(img)
return img
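The `RandomRotation` wrapper above only adds an application probability on top of Paddle's transform. An illustrative use (a PIL input image is assumed; `degrees` is forwarded to the underlying paddle.vision transform):

```python
# Illustrative use of the probabilistic RandomRotation wrapper defined above.
import numpy as np
from PIL import Image

rot = RandomRotation(prob=0.5, degrees=15)   # rotate by up to ±15°, half of the time
img = Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8))
out = rot(img)                               # either rotated or returned unchanged
```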
class Pad(object):
    """
    Pads the given PIL.Image on all sides with specified padding mode and fill value.
    adapted from: https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#Pad
    """

    def __init__(self,
                 padding: int,
                 fill: int=0,
                 padding_mode: str="constant",
                 backend: str="pil"):
        self.padding = padding
        self.fill = fill
        self.padding_mode = padding_mode
        self.backend = backend
        assert backend in [
            "pil", "cv2"
        ], f"backend must in ['pil', 'cv2'], but got {backend}"

    def _parse_fill(self, fill, img, min_pil_version, name="fillcolor"):
        # Process fill color for affine transforms
@@ -698,11 +727,21 @@ class Pad(object):
        return {name: fill}

    def __call__(self, img):
        if self.backend == "pil":
            opts = self._parse_fill(self.fill, img, "2.3.0", name="fill")
            if img.mode == "P":
                palette = img.getpalette()
                img = ImageOps.expand(img, border=self.padding, **opts)
                img.putpalette(palette)
                return img
            return ImageOps.expand(img, border=self.padding, **opts)
        else:
            img = cv2.copyMakeBorder(
                img,
                self.padding,
                self.padding,
                self.padding,
                self.padding,
                cv2.BORDER_CONSTANT,
                value=(self.fill, self.fill, self.fill))
            return img
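With the new `backend` switch, `Pad` can pad either PIL images (`"pil"`) or raw numpy/OpenCV images (`"cv2"`). A short sketch of the cv2 path; the array size is arbitrary:

```python
# Hedged sketch of the cv2 branch added above.
import numpy as np

pad = Pad(padding=4, fill=0, backend="cv2")
img = np.zeros((32, 32, 3), dtype=np.uint8)   # HWC uint8 image, as cv2 expects
padded = pad(img)
print(padded.shape)                           # (40, 40, 3): 4 px added on every side
```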
@@ -114,6 +114,7 @@ class Engine(object):
            #TODO(gaotingquan): support rec
            class_num = config["Arch"].get("class_num", None)
            self.config["DataLoader"].update({"class_num": class_num})

        # build dataloader
        if self.mode == 'train':
            self.train_dataloader = build_dataloader(
......
@@ -25,32 +25,35 @@ from ppcls.utils import logger

def retrieval_eval(engine, epoch_id=0):
    engine.model.eval()
    # step1. build query & gallery
    if engine.gallery_query_dataloader is not None:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
            engine, name='gallery_query')
        query_feas, query_img_id, query_unique_id = gallery_feas, gallery_img_id, gallery_unique_id
    else:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
            engine, name='gallery')
        query_feas, query_img_id, query_unique_id = cal_feature(
            engine, name='query')

    # step2. split data into blocks so as to save memory
    sim_block_size = engine.config["Global"].get("sim_block_size", 64)
    sections = [sim_block_size] * (len(query_feas) // sim_block_size)
    if len(query_feas) % sim_block_size:
        sections.append(len(query_feas) % sim_block_size)
    fea_blocks = paddle.split(query_feas, num_or_sections=sections)
    if query_unique_id is not None:
        query_unique_id_blocks = paddle.split(
            query_unique_id, num_or_sections=sections)
    query_img_id_blocks = paddle.split(query_img_id, num_or_sections=sections)
    metric_key = None

    # step3. do evaluation
    if engine.eval_loss_func is None:
        metric_dict = {metric_key: 0.}
    else:
        # do evaluation with re-ranking(k-reciprocal)
        reranking_flag = engine.config['Global'].get('re_ranking', False)
        logger.info(f"re_ranking={reranking_flag}")
        metric_dict = dict()
@@ -70,9 +73,9 @@ def retrieval_eval(engine, epoch_id=0):
                query_feas, gallery_feas, k1=20, k2=6, lambda_value=0.3)

            # compute keep mask
            unique_id_mask = (query_unique_id != gallery_unique_id.t())
            image_id_mask = (query_img_id != gallery_img_id.t())
            keep_mask = paddle.logical_or(image_id_mask, unique_id_mask)

            # set inf(1e9) distance to those exist in gallery
            distmat = distmat * keep_mask.astype("float32")
@@ -85,24 +88,27 @@ def retrieval_eval(engine, epoch_id=0):
            for key in metric_tmp:
                metric_dict[key] = metric_tmp[key]
        else:
            # do evaluation without re-ranking
            for block_idx, block_fea in enumerate(fea_blocks):
                similarity_matrix = paddle.matmul(
                    block_fea, gallery_feas, transpose_y=True)  # [n,m]
                if query_unique_id is not None:
                    query_unique_id_block = query_unique_id_blocks[block_idx]
                    unique_id_mask = (
                        query_unique_id_block != gallery_unique_id.t())

                    query_img_id_block = query_img_id_blocks[block_idx]
                    image_id_mask = (query_img_id_block != gallery_img_id.t())

                    keep_mask = paddle.logical_or(image_id_mask,
                                                  unique_id_mask)
                    similarity_matrix = similarity_matrix * keep_mask.astype(
                        "float32")
                else:
                    keep_mask = None

                metric_tmp = engine.eval_metric_func(
                    similarity_matrix, query_img_id_blocks[block_idx],
                    gallery_img_id, keep_mask)

                for key in metric_tmp:
......
@@ -12,10 +12,12 @@ from .msmloss import MSMLoss
from .npairsloss import NpairsLoss
from .trihardloss import TriHardLoss
from .triplet import TripletLoss, TripletLossV2
from .tripletangularmarginloss import TripletAngularMarginLoss
from .supconloss import SupConLoss
from .pairwisecosface import PairwiseCosface
from .dmlloss import DMLLoss
from .distanceloss import DistanceLoss
from .softtargetceloss import SoftTargetCrossEntropy

from .distillationloss import DistillationCELoss
from .distillationloss import DistillationGTCELoss
@@ -24,6 +26,7 @@ from .distillationloss import DistillationDistanceLoss
from .distillationloss import DistillationRKDLoss
from .distillationloss import DistillationKLDivLoss
from .distillationloss import DistillationDKDLoss
from .distillationloss import DistillationWSLLoss
from .distillationloss import DistillationMultiLabelLoss
from .distillationloss import DistillationDISTLoss
from .distillationloss import DistillationPairLoss
......
@@ -22,6 +22,7 @@ from .distanceloss import DistanceLoss
from .rkdloss import RKdAngle, RkdDistance
from .kldivloss import KLDivLoss
from .dkdloss import DKDLoss
from .wslloss import WSLLoss
from .dist_loss import DISTLoss
from .multilabelloss import MultiLabelLoss
from .mgd_loss import MGDLoss
@@ -262,6 +263,34 @@ class DistillationDKDLoss(DKDLoss):
        return loss_dict
class DistillationWSLLoss(WSLLoss):
"""
DistillationWSLLoss
"""
def __init__(self,
model_name_pairs=[],
key=None,
temperature=2.0,
name="wsl_loss"):
super().__init__(temperature)
self.model_name_pairs = model_name_pairs
self.key = key
self.name = name
def forward(self, predicts, batch):
loss_dict = dict()
for idx, pair in enumerate(self.model_name_pairs):
out1 = predicts[pair[0]]
out2 = predicts[pair[1]]
if self.key is not None:
out1 = out1[self.key]
out2 = out2[self.key]
loss = super().forward(out1, out2, batch)
loss_dict[f"{self.name}_{pair[0]}_{pair[1]}"] = loss
return loss_dict
class DistillationMultiLabelLoss(MultiLabelLoss):
    """
    DistillationMultiLabelLoss
......
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class SoftTargetCrossEntropy(nn.Layer):
def __init__(self):
super().__init__()
def forward(self, x, target):
loss = paddle.sum(-target * F.log_softmax(x, axis=-1), axis=-1)
loss = loss.mean()
return {"SoftTargetCELoss": loss}
def __str__(self, ):
return type(self).__name__
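`SoftTargetCrossEntropy` expects dense (soft) targets, e.g. the mixed labels produced by `MixupCutmixHybrid` above. A quick sketch with made-up logits and soft targets:

```python
# Illustrative call of SoftTargetCrossEntropy with soft targets.
import paddle
import paddle.nn.functional as F

loss_fn = SoftTargetCrossEntropy()
logits = paddle.rand([4, 10])                        # batch of 4, 10 classes
targets = F.softmax(paddle.rand([4, 10]), axis=-1)   # soft labels that sum to 1
print(loss_fn(logits, targets)["SoftTargetCELoss"])
```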
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.nn as nn
class TripletAngularMarginLoss(nn.Layer):
"""A more robust triplet loss with hard positive/negative mining on angular margin instead of relative distance between d(a,p) and d(a,n).
Args:
margin (float, optional): angular margin. Defaults to 0.5.
normalize_feature (bool, optional): whether to apply L2-norm in feature before computing distance(cos-similarity). Defaults to True.
reduction (str, optional): reducing option within an batch . Defaults to "mean".
add_absolute (bool, optional): whether add absolute loss within d(a,p) or d(a,n). Defaults to False.
absolute_loss_weight (float, optional): weight for absolute loss. Defaults to 1.0.
ap_value (float, optional): weight for d(a, p). Defaults to 0.9.
an_value (float, optional): weight for d(a, n). Defaults to 0.5.
feature_from (str, optional): which key feature from. Defaults to "features".
"""
def __init__(self,
margin=0.5,
normalize_feature=True,
reduction="mean",
add_absolute=False,
absolute_loss_weight=1.0,
ap_value=0.9,
an_value=0.5,
feature_from="features"):
super(TripletAngularMarginLoss, self).__init__()
self.margin = margin
self.feature_from = feature_from
self.ranking_loss = paddle.nn.loss.MarginRankingLoss(
margin=margin, reduction=reduction)
self.normalize_feature = normalize_feature
self.add_absolute = add_absolute
self.ap_value = ap_value
self.an_value = an_value
self.absolute_loss_weight = absolute_loss_weight
def forward(self, input, target):
"""
Args:
inputs: feature matrix with shape (batch_size, feat_dim)
target: ground truth labels with shape (num_classes)
"""
inputs = input[self.feature_from]
if self.normalize_feature:
inputs = paddle.divide(
inputs, paddle.norm(
inputs, p=2, axis=-1, keepdim=True))
bs = inputs.shape[0]
# compute distance(cos-similarity)
dist = paddle.matmul(inputs, inputs.t())
# hard negative mining
is_pos = paddle.expand(target, (
bs, bs)).equal(paddle.expand(target, (bs, bs)).t())
is_neg = paddle.expand(target, (
bs, bs)).not_equal(paddle.expand(target, (bs, bs)).t())
# `dist_ap` means distance(anchor, positive)
# both `dist_ap` and `relative_p_inds` with shape [N, 1]
dist_ap = paddle.min(paddle.reshape(
paddle.masked_select(dist, is_pos), (bs, -1)),
axis=1,
keepdim=True)
# `dist_an` means distance(anchor, negative)
# both `dist_an` and `relative_n_inds` with shape [N, 1]
dist_an = paddle.max(paddle.reshape(
paddle.masked_select(dist, is_neg), (bs, -1)),
axis=1,
keepdim=True)
# shape [N]
dist_ap = paddle.squeeze(dist_ap, axis=1)
dist_an = paddle.squeeze(dist_an, axis=1)
# Compute ranking hinge loss
y = paddle.ones_like(dist_an)
loss = self.ranking_loss(dist_ap, dist_an, y)
if self.add_absolute:
absolut_loss_ap = self.ap_value - dist_ap
absolut_loss_ap = paddle.where(absolut_loss_ap > 0,
absolut_loss_ap,
paddle.zeros_like(absolut_loss_ap))
absolut_loss_an = dist_an - self.an_value
absolut_loss_an = paddle.where(absolut_loss_an > 0,
absolut_loss_an,
paddle.ones_like(absolut_loss_an))
loss = (absolut_loss_an.mean() + absolut_loss_ap.mean()
) * self.absolute_loss_weight + loss.mean()
return {"TripletAngularMarginLoss": loss}
# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
class WSLLoss(nn.Layer):
"""
Weighted Soft Labels Loss
paper: https://arxiv.org/pdf/2102.00650.pdf
code reference: https://github.com/bellymonster/Weighted-Soft-Label-Distillation
"""
def __init__(self, temperature=2.0, use_target_as_gt=False):
super().__init__()
self.temperature = temperature
self.use_target_as_gt = use_target_as_gt
def forward(self, logits_student, logits_teacher, target=None):
"""Compute weighted soft labels loss.
Args:
logits_student: student's logits with shape (batch_size, num_classes)
logits_teacher: teacher's logits with shape (batch_size, num_classes)
target: ground truth labels with shape (batch_size)
"""
if target is None or self.use_target_as_gt:
target = logits_teacher.argmax(axis=-1)
target = F.one_hot(
target.reshape([-1]), num_classes=logits_student[0].shape[0])
s_input_for_softmax = logits_student / self.temperature
t_input_for_softmax = logits_teacher / self.temperature
ce_loss_s = -paddle.sum(target *
F.log_softmax(logits_student.detach()),
axis=1)
ce_loss_t = -paddle.sum(target *
F.log_softmax(logits_teacher.detach()),
axis=1)
ratio = ce_loss_s / (ce_loss_t + 1e-7)
ratio = paddle.maximum(ratio, paddle.zeros_like(ratio))
kd_loss = -paddle.sum(F.softmax(t_input_for_softmax) *
F.log_softmax(s_input_for_softmax),
axis=1)
weight = 1 - paddle.exp(-ratio)
weighted_kd_loss = (self.temperature**2) * paddle.mean(kd_loss *
weight)
return weighted_kd_loss
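`WSLLoss` re-weights the per-sample distillation term by how much higher the student's cross-entropy is than the teacher's. A rough sketch with random logits; the `DistillationWSLLoss` wrapper above feeds it a pair of model outputs in the same way:

```python
# Hedged sketch of a direct WSLLoss call on random logits.
import paddle

loss_fn = WSLLoss(temperature=2.0)
student_logits = paddle.rand([4, 100])
teacher_logits = paddle.rand([4, 100])
labels = paddle.randint(0, 100, [4])
print(loss_fn(student_logits, teacher_logits, labels))
```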
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from cmath import nan
import numpy as np
import paddle
import paddle.nn as nn
@@ -97,6 +98,11 @@ class mAP(nn.Layer):
        num_rel = paddle.greater_than(num_rel, paddle.to_tensor(0.))
        num_rel_index = paddle.nonzero(num_rel.astype("int"))
        num_rel_index = paddle.reshape(num_rel_index, [num_rel_index.shape[0]])

        if paddle.numel(num_rel_index).item() == 0:
            metric_dict["mAP"] = np.nan
            return metric_dict

        equal_flag = paddle.index_select(equal_flag, num_rel_index, axis=0)
        acc_sum = paddle.cumsum(equal_flag, axis=1)
......
@@ -272,3 +272,145 @@ class AdamW(object):
    def _apply_decay_param_fun(self, name):
        return name not in self.no_weight_decay_param_name_list
class AdamWDL(object):
"""
The AdamWDL optimizer is implemented based on the AdamW Optimization with dynamic lr setting.
Generally it's used for transformer model.
"""
def __init__(self,
learning_rate=0.001,
beta1=0.9,
beta2=0.999,
epsilon=1e-8,
weight_decay=None,
multi_precision=False,
grad_clip=None,
layerwise_decay=None,
filter_bias_and_bn=True,
**args):
self.learning_rate = learning_rate
self.beta1 = beta1
self.beta2 = beta2
self.epsilon = epsilon
self.grad_clip = grad_clip
self.weight_decay = weight_decay
self.multi_precision = multi_precision
self.layerwise_decay = layerwise_decay
self.filter_bias_and_bn = filter_bias_and_bn
class AdamWDLImpl(optim.AdamW):
def __init__(self,
learning_rate=0.001,
beta1=0.9,
beta2=0.999,
epsilon=1e-8,
parameters=None,
weight_decay=0.01,
apply_decay_param_fun=None,
grad_clip=None,
lazy_mode=False,
multi_precision=False,
layerwise_decay=1.0,
n_layers=12,
name_dict=None,
name=None):
if not isinstance(layerwise_decay, float) and \
not isinstance(layerwise_decay, fluid.framework.Variable):
raise TypeError("coeff should be float or Tensor.")
self.layerwise_decay = layerwise_decay
self.name_dict = name_dict
self.n_layers = n_layers
self.set_param_lr_fun = self._layerwise_lr_decay
super().__init__(
learning_rate=learning_rate,
parameters=parameters,
beta1=beta1,
beta2=beta2,
epsilon=epsilon,
grad_clip=grad_clip,
name=name,
apply_decay_param_fun=apply_decay_param_fun,
weight_decay=weight_decay,
lazy_mode=lazy_mode,
multi_precision=multi_precision)
def _append_optimize_op(self, block, param_and_grad):
if self.set_param_lr_fun is None:
return super()._append_optimize_op(block, param_and_grad)
self._append_decoupled_weight_decay(block, param_and_grad)
prev_lr = param_and_grad[0].optimize_attr["learning_rate"]
self.set_param_lr_fun(self.layerwise_decay, self.name_dict,
self.n_layers, param_and_grad[0])
# excute Adam op
res = super(optim.AdamW, self)._append_optimize_op(block,
param_and_grad)
param_and_grad[0].optimize_attr["learning_rate"] = prev_lr
return res
# Layerwise decay
def _layerwise_lr_decay(self, decay_rate, name_dict, n_layers, param):
"""
Args:
decay_rate (float):
The layer-wise decay ratio.
name_dict (dict):
The keys of name_dict is dynamic name of model while the value
of name_dict is static name.
Use model.named_parameters() to get name_dict.
n_layers (int):
Total number of layers in the transformer encoder.
"""
ratio = 1.0
static_name = name_dict[param.name]
if "blocks" in static_name:
idx = static_name.find("blocks.")
layer = int(static_name[idx:].split(".")[1])
ratio = decay_rate**(n_layers - layer)
elif "embed" in static_name:
ratio = decay_rate**(n_layers + 1)
param.optimize_attr["learning_rate"] *= ratio
def __call__(self, model_list):
model = model_list[0]
if self.weight_decay and self.filter_bias_and_bn:
skip = {}
if hasattr(model, 'no_weight_decay'):
skip = model.no_weight_decay()
decay_dict = {
param.name: not (len(param.shape) == 1 or
name.endswith(".bias") or name in skip)
for name, param in model.named_parameters()
if not 'teacher' in name
}
parameters = [
param for param in model.parameters()
if 'teacher' not in param.name
]
weight_decay = 0.
else:
parameters = model.parameters()
opt_args = dict(
learning_rate=self.learning_rate, weight_decay=self.weight_decay)
opt_args['parameters'] = parameters
if decay_dict is not None:
opt_args['apply_decay_param_fun'] = lambda n: decay_dict[n]
opt_args['epsilon'] = self.epsilon
opt_args['beta1'] = self.beta1
opt_args['beta2'] = self.beta2
if self.layerwise_decay and self.layerwise_decay < 1.0:
opt_args['layerwise_decay'] = self.layerwise_decay
name_dict = dict()
for n, p in model.named_parameters():
name_dict[p.name] = n
opt_args['name_dict'] = name_dict
opt_args['n_layers'] = model.get_num_layers()
optimizer = self.AdamWDLImpl(**opt_args)
return optimizer
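`AdamWDL` is a config-facing wrapper: calling it with `[model]` builds the underlying `AdamWDLImpl` with layer-wise learning-rate decay, and it expects the model to expose `get_num_layers()` (and optionally `no_weight_decay()`). A hedged sketch of direct use; `ToyEncoder` below is made up purely to satisfy those hooks, and `weight_decay` is set because the current `__call__` only builds `decay_dict` on that branch:

```python
# Hedged sketch of building the layer-wise-decay optimizer defined above.
import paddle
import paddle.nn as nn


class ToyEncoder(nn.Layer):
    """Made-up model exposing the hooks AdamWDL.__call__ relies on."""

    def __init__(self, depth=4):
        super().__init__()
        self.embed = nn.Linear(16, 32)
        self.blocks = nn.LayerList([nn.Linear(32, 32) for _ in range(depth)])
        self.depth = depth

    def get_num_layers(self):
        return self.depth

    def forward(self, x):
        x = self.embed(x)
        for block in self.blocks:
            x = block(x)
        return x


opt_builder = AdamWDL(learning_rate=1e-3, weight_decay=0.05, layerwise_decay=0.8)
optimizer = opt_builder([ToyEncoder()])   # earlier blocks get smaller lr ratios
```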
@@ -110,7 +110,6 @@ bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/MobileNetV3/Mo
- [test_train_pact_inference_python usage](docs/test_train_pact_inference_python.md): tests basic functions such as Python-based PACT quantization-aware training.
- [test_train_ptq_inference_python usage](docs/test_train_ptq_inference_python.md): tests basic functions such as Python-based KL post-training quantization.
- [test_inference_cpp usage](docs/test_inference_cpp.md): tests C++-based model inference.
- [test_lite_arm_cpu_cpp usage](docs/test_lite_arm_cpu_cpp.md): tests Paddle-Lite based C++ inference deployment on ARM CPU.
- [test_paddle2onnx usage](docs/test_paddle2onnx.md): tests Paddle2ONNX model conversion and verifies its correctness.
- [test_serving_infer_python usage](docs/test_serving_infer_python.md): tests the Python serving functionality.
......
@@ -179,6 +179,11 @@ for batch_size in ${batch_size_list[*]}; do
            func_sed_params "$FILENAME" "${line_epoch}" "$epoch"
            gpu_id=$(set_gpu_id $device_num)

            # It is needed that using dali, NHWC and 4 channels when training ResNet50 with AMPO2
            if [[ $model_name == "ResNet50" && $precision == "fp16" ]]; then
                sed -i "s/ResNet50.yaml/ResNet50_amp_O2_ultra.yaml/g" $FILENAME
            fi

            # if bs is big, then copy train_list.txt to generate more train log
            # At least 25 log number would be good to calculate ips for benchmark system.
            # So the copy number for train_list is as follows:
......
@@ -77,9 +77,10 @@ function status_check(){
    run_command=$2
    run_log=$3
    model_name=$4
    log_path=$5

    if [ $last_status -eq 0 ]; then
        echo -e "\033[33m Run successfully with command - ${model_name} - ${run_command} - ${log_path} ! \033[0m" | tee -a ${run_log}
    else
        echo -e "\033[33m Run failed with command - ${model_name} - ${run_command} - ${log_path} ! \033[0m" | tee -a ${run_log}
    fi
}
@@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer"
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
......
@@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer"
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
......
@@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer"
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:6
......
@@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer"
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
......
@@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r
infer_model:./general_PPLCNet_x2_5_lite_v1.0_infer/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer"
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
......
===========================paddle2onnx_params===========================
model_name:GeneralRecognitionV2_PPLCNetV2_base
python:python3.7
2onnx: paddle2onnx
--model_dir:./deploy/models/general_PPLCNetV2_base_pretrained_v1.0_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--save_file:./deploy/models/general_PPLCNetV2_base_pretrained_v1.0_infer/inference.onnx
--opset_version:10
--enable_onnx_checker:True
inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
inference:./python/predict_rec.py
Global.use_onnx:True
Global.rec_inference_model_dir:./models/general_PPLCNetV2_base_pretrained_v1.0_infer
Global.use_gpu:False
-c:configs/inference_rec.yaml
\ No newline at end of file
===========================train_params===========================
model_name:GeneralRecognitionV2_PPLCNetV2_base
python:python3.7
gpu_list:0|0,1
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.batch_size:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:norm_train
norm_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Global.eval_during_train=False -o Global.save_interval=2 -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
quant_export:null
fpgm_export:null
distill_export:null
kl_quant:null
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
-o Global.batch_size:1
-o Global.use_tensorrt:False
-o Global.use_fp16:False
-o Global.rec_inference_model_dir:../inference
-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
-o Global.save_log_path:null
-o Global.benchmark:False
null:null
null:null
===========================train_benchmark_params==========================
batch_size:256
fp_items:fp32|fp16
epoch:1
--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
flags:FLAGS_eager_delete_tensor_gb=0.0;FLAGS_fraction_of_gpu_memory_to_use=0.98;FLAGS_conv_workspace_size_limit=4096
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,224,224]}]
\ No newline at end of file
===========================train_params===========================
model_name:GeneralRecognition_PPLCNet_x2_5
python:python3.7
gpu_list:192.168.0.1,192.168.0.2;0,1
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.batch_size:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:norm_train
norm_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Global.eval_during_train=False -o Global.save_interval=2 -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
quant_export:null
fpgm_export:null
distill_export:null
kl_quant:null
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
-o Global.batch_size:1
-o Global.use_tensorrt:False
-o Global.use_fp16:False
-o Global.rec_inference_model_dir:../inference
-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
-o Global.save_log_path:null
-o Global.benchmark:False
null:null
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,224,224]}]
\ No newline at end of file
===========================train_params===========================
model_name:GeneralRecognitionV2_PPLCNetV2_base
python:python3.7
gpu_list:0|0,1
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.batch_size:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:amp_train
amp_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o AMP.scale_loss=65536 -o AMP.use_dynamic_loss_scaling=True -o AMP.level=O2 -o Optimizer.multi_precision=True -o Global.eval_during_train=False -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8
pact_train:null
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
quant_export:null
fpgm_export:null
distill_export:null
kl_quant:null
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams
infer_model:../inference/
infer_export:True
infer_quant:Fasle
inference:python/predict_rec.py -c configs/inference_rec.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:6
-o Global.batch_size:1
-o Global.use_tensorrt:False
-o Global.use_fp16:False
-o Global.rec_inference_model_dir:../inference
-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
-o Global.save_log_path:null
-o Global.benchmark:False
null:null
null:null
===========================train_params===========================
model_name:GeneralRecognitionV2_PPLCNetV2_base
python:python3.7
gpu_list:0
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.batch_size:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:pact_train
norm_train:null
pact_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Slim.quant.name=pact -o Optimizer.lr.learning_rate=0.006 -o Global.pretrained_model="pretrained_model/general_PPLCNetV2_base_pretrained_v1.0" -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o AMP=None -o DataLoader.Train.sampler.batch_size=8
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Slim.quant.name=pact
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:null
quant_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Slim.quant.name=pact
fpgm_export:null
distill_export:null
kl_quant:null
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams
infer_model:../inference/
infer_export:True
infer_quant:False
inference:python/predict_rec.py -c configs/inference_rec.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
-o Global.batch_size:1
-o Global.use_tensorrt:False
-o Global.use_fp16:False
-o Global.rec_inference_model_dir:../inference
-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
-o Global.save_log_path:null
-o Global.benchmark:True
null:null
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,224,224]}]
\ No newline at end of file
===========================train_params===========================
model_name:GeneralRecognitionV2_PPLCNetV2_base
python:python3.7
gpu_list:0
-o Global.device:gpu
-o Global.auto_cast:null
-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100
-o Global.output_dir:./output/
-o DataLoader.Train.sampler.batch_size:8
-o Global.pretrained_model:null
train_model_name:latest
train_infer_img_dir:./dataset/ILSVRC2012/val
null:null
##
trainer:pact_train
norm_train:null
pact_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False
fpgm_train:null
distill_train:null
null:null
null:null
##
===========================eval_params===========================
eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
null:null
##
===========================infer_params==========================
-o Global.save_inference_dir:./inference
-o Global.pretrained_model:
norm_export:null
quant_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
fpgm_export:null
distill_export:null
kl_quant:deploy/slim/quant_post_static.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.save_inference_dir=./general_PPLCNetV2_base_pretrained_v1.0_infer
export2:null
pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
infer_model:./general_PPLCNetV2_base_pretrained_v1.0_infer
infer_export:True
infer_quant:False
inference:python/predict_rec.py -c configs/inference_rec.yaml
-o Global.use_gpu:True|False
-o Global.enable_mkldnn:False
-o Global.cpu_num_threads:1
-o Global.batch_size:1
-o Global.use_tensorrt:False
-o Global.use_fp16:False
-o Global.rec_inference_model_dir:../inference
-o Global.infer_imgs:../dataset/Aliproduct/demo_test/
-o Global.save_log_path:null
-o Global.benchmark:False
null:null
null:null
===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,224,224]}]
\ No newline at end of file
===========================cpp_infer_params===========================
model_name:PPShiTuV2
cpp_infer_type:shitu
feature_inference_model_dir:./general_PPLCNetV2_base_pretrained_v1.0_infer/
det_inference_model_dir:./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/
cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
det_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
infer_quant:False
inference_cmd:./deploy/cpp_shitu/build/pp_shitu -c inference_drink.yaml
use_gpu:True|False
enable_mkldnn:False
cpu_threads:1
batch_size:1
use_tensorrt:False
precision:fp32
data_dir:./dataset/drink_dataset_v2.0
benchmark:True
generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py
transform_index_cmd:python3.7 deploy/cpp_shitu/tools/transform_id_map.py -c inference_drink.yaml
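Taken together, the cpp_infer_params above drive three steps: render a runtime YAML with generate_yaml_cmd, rewrite the gallery index with transform_index_cmd, then run inference_cmd. A hedged sketch of one CPU/fp32 test case built from the listed paths (only the cls variant of the generate_cpp_yaml.py call appears verbatim later in this diff, so the --det_model_dir flag is an assumption):

# sketch only: one shitu CPU test case assembled from the cpp_infer_params block above
python3.7 test_tipc/generate_cpp_yaml.py --type shitu --batch_size 1 --mkldnn False --gpu False \
    --cpu_thread 1 --tensorrt False --precision fp32 --benchmark True \
    --data_dir ./dataset/drink_dataset_v2.0 \
    --cls_model_dir ./general_PPLCNetV2_base_pretrained_v1.0_infer/ \
    --det_model_dir ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/   # flag name assumed
python3.7 deploy/cpp_shitu/tools/transform_id_map.py -c inference_drink.yaml
./deploy/cpp_shitu/build/pp_shitu -c inference_drink.yaml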
===========================serving_params===========================
model_name:PPShiTuV2
python:python3.7
cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
trans_model:-m paddle_serving_client.convert
--dirname:./models/general_PPLCNetV2_base_pretrained_v1.0_infer/
--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./models/general_PPLCNetV2_base_pretrained_v1.0_serving/
--serving_client:./models/general_PPLCNetV2_base_pretrained_v1.0_client/
--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
serving_dir:./paddleserving/recognition
web_service:null
--use_gpu:0|null
pipeline:test_cpp_serving_client.py
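For reference, the trans_model keys above correspond to a paddle_serving_client.convert call along these lines (a sketch assembled from the listed values; the test script builds the same command once for the recognition model and once for the detection model):

# sketch only: convert the feature-extraction model to Serving format
python3.7 -m paddle_serving_client.convert \
    --dirname ./models/general_PPLCNetV2_base_pretrained_v1.0_infer/ \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --serving_server ./models/general_PPLCNetV2_base_pretrained_v1.0_serving/ \
    --serving_client ./models/general_PPLCNetV2_base_pretrained_v1.0_client/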
===========================serving_params===========================
model_name:PPShiTuV2
python:python3.7
cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar
det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
trans_model:-m paddle_serving_client.convert
--dirname:./models/general_PPLCNetV2_base_pretrained_v1.0_infer/
--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--serving_server:./models/general_PPLCNetV2_base_pretrained_v1.0_serving/
--serving_client:./models/general_PPLCNetV2_base_pretrained_v1.0_client/
--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
serving_dir:./paddleserving/recognition
web_service:recognition_web_service.py
--use_gpu:0|null
pipeline:pipeline_http_client.py
===========================paddle2onnx_params===========================
model_name:PP-ShiTu_mainbody_det
python:python3.7
2onnx: paddle2onnx
--model_dir:./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/
--model_filename:inference.pdmodel
--params_filename:inference.pdiparams
--save_file:./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/inference.onnx
--opset_version:11
--enable_onnx_checker:True
inference_model_url:https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody_infer.tar
inference:null
Global.use_onnx:null
Global.inference_model_dir:null
Global.use_gpu:null
-c:null
\ No newline at end of file
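The paddle2onnx_params block above maps onto a conversion command roughly as follows (a sketch assembled from the listed keys; the actual invocation is built by test_paddle2onnx.sh further down, and inference:null means no Python inference step follows):

# sketch only: ONNX export assembled from the paddle2onnx_params block above
paddle2onnx \
    --model_dir ./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/ \
    --model_filename inference.pdmodel \
    --params_filename inference.pdiparams \
    --save_file ./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/inference.onnx \
    --opset_version 11 \
    --enable_onnx_checker True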
...@@ -12,6 +12,7 @@ Linux GPU/CPU C++ 推理功能测试的主程序为`test_inference_cpp.sh`,可 ...@@ -12,6 +12,7 @@ Linux GPU/CPU C++ 推理功能测试的主程序为`test_inference_cpp.sh`,可
| MobileNetV3 | MobileNetV3_large_x1_0_KL | 支持 | 支持 | | MobileNetV3 | MobileNetV3_large_x1_0_KL | 支持 | 支持 |
| MobileNetV3 | MobileNetV3_large_x1_0_PACT | 支持 | 支持 | | MobileNetV3 | MobileNetV3_large_x1_0_PACT | 支持 | 支持 |
| PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 |
| PPHGNet | PPHGNet_small | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 |
......
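As with the other TIPC chains, this C++ inference test is typically run in two steps, first preparing data/models and then executing the main script against a model's config file (paths below are placeholders, not taken from this diff):

# illustrative usage, assuming the usual two-step TIPC flow
bash test_tipc/prepare.sh ${config_file} cpp_infer
bash test_tipc/test_inference_cpp.sh ${config_file}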
...@@ -15,6 +15,7 @@ Linux GPU/CPU C++ 服务化部署测试的主程序为`test_serving_infer_cpp.sh ...@@ -15,6 +15,7 @@ Linux GPU/CPU C++ 服务化部署测试的主程序为`test_serving_infer_cpp.sh
| PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 |
| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PPHGNet | PPHGNet_small | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 |
| PPHGNet | PPHGNet_small_KL | 支持 | 支持 | | PPHGNet | PPHGNet_small_KL | 支持 | 支持 |
| PPHGNet | PPHGNet_small_PACT | 支持 | 支持 | | PPHGNet | PPHGNet_small_PACT | 支持 | 支持 |
......
...@@ -15,6 +15,7 @@ Linux GPU/CPU PYTHON 服务化部署测试的主程序为`test_serving_infer_pyt ...@@ -15,6 +15,7 @@ Linux GPU/CPU PYTHON 服务化部署测试的主程序为`test_serving_infer_pyt
| PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 |
| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 |
| PPHGNet | PPHGNet_small | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 |
| PPHGNet | PPHGNet_small_KL | 支持 | 支持 | | PPHGNet | PPHGNet_small_KL | 支持 | 支持 |
| PPHGNet | PPHGNet_small_PACT | 支持 | 支持 | | PPHGNet | PPHGNet_small_PACT | 支持 | 支持 |
......
...@@ -10,6 +10,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference ...@@ -10,6 +10,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference
| :-------------: | :-------------------------------------: | :----------: | :----------: | | :-------------: | :-------------------------------------: | :----------: | :----------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | 混合精度训练 | 混合精度训练 | | MobileNetV3 | MobileNetV3_large_x1_0 | 混合精度训练 | 混合精度训练 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 混合精度训练 | 混合精度训练 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 混合精度训练 | 混合精度训练 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 混合精度训练 | 混合精度训练 |
| PPHGNet | PPHGNet_small | 混合精度训练 | 混合精度训练 | | PPHGNet | PPHGNet_small | 混合精度训练 | 混合精度训练 |
| PPHGNet | PPHGNet_tiny | 混合精度训练 | 混合精度训练 | | PPHGNet | PPHGNet_tiny | 混合精度训练 | 混合精度训练 |
| PPLCNet | PPLCNet_x0_25 | 混合精度训练 | 混合精度训练 | | PPLCNet | PPLCNet_x0_25 | 混合精度训练 | 混合精度训练 |
...@@ -31,6 +32,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference ...@@ -31,6 +32,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference
| :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: | | :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | 1 | | MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | 1 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 支持 | 支持 | 1 |
| PPHGNet | PPHGNet_small | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_small | 支持 | 支持 | 1 |
| PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 |
| PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 | | PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 |
......
...@@ -10,6 +10,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p ...@@ -10,6 +10,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p
| :-------------: | :-------------------------------------: | :----------: | | :-------------: | :-------------------------------------: | :----------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | PACT量化训练 | | MobileNetV3 | MobileNetV3_large_x1_0 | PACT量化训练 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | PACT量化训练 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | PACT量化训练 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | PACT量化训练 |
| PPHGNet | PPHGNet_small | PACT量化训练 | | PPHGNet | PPHGNet_small | PACT量化训练 |
| PPHGNet | PPHGNet_tiny | PACT量化训练 | | PPHGNet | PPHGNet_tiny | PACT量化训练 |
| PPLCNet | PPLCNet_x0_25 | PACT量化训练 | | PPLCNet | PPLCNet_x0_25 | PACT量化训练 |
...@@ -31,6 +32,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p ...@@ -31,6 +32,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p
| :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: | | :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | 1 | | MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | 1 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 支持 | 支持 | 1 |
| PPHGNet | PPHGNet_small | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_small | 支持 | 支持 | 1 |
| PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 |
| PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 | | PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 |
......
...@@ -10,6 +10,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho ...@@ -10,6 +10,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho
| :-------------: | :-------------------------------------: | :----------: | | :-------------: | :-------------------------------------: | :----------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 | | MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | KL离线量化 |
| PPHGNet | PPHGNet_small | KL离线量化 | | PPHGNet | PPHGNet_small | KL离线量化 |
| PPHGNet | PPHGNet_tiny | KL离线量化 | | PPHGNet | PPHGNet_tiny | KL离线量化 |
| PPLCNet | PPLCNet_x0_25 | KL离线量化 | | PPLCNet | PPLCNet_x0_25 | KL离线量化 |
...@@ -31,6 +32,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho ...@@ -31,6 +32,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho
| :-------------: | :-------------------------------------: | :----------: | | :-------------: | :-------------------------------------: | :----------: |
| MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 | | MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 |
| PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 |
| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | KL离线量化 |
| PPHGNet | PPHGNet_small | KL离线量化 | | PPHGNet | PPHGNet_small | KL离线量化 |
| PPHGNet | PPHGNet_tiny | KL离线量化 | | PPHGNet | PPHGNet_tiny | KL离线量化 |
| PPLCNet | PPLCNet_x0_25 | KL离线量化 | | PPLCNet | PPLCNet_x0_25 | KL离线量化 |
......
...@@ -42,6 +42,10 @@ function func_get_url_file_name() { ...@@ -42,6 +42,10 @@ function func_get_url_file_name() {
model_name=$(func_parser_value "${lines[1]}") model_name=$(func_parser_value "${lines[1]}")
# install paddleclas whl
python_name=$(func_parser_value "${lines[2]}")
${python_name} setup.py install
if [[ ${MODE} = "cpp_infer" ]]; then if [[ ${MODE} = "cpp_infer" ]]; then
if [ -d "./deploy/cpp/opencv-3.4.7/opencv3/" ] && [ $(md5sum ./deploy/cpp/opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ]; then if [ -d "./deploy/cpp/opencv-3.4.7/opencv3/" ] && [ $(md5sum ./deploy/cpp/opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ]; then
echo "################### build opencv skipped ###################" echo "################### build opencv skipped ###################"
...@@ -139,6 +143,8 @@ if [[ ${MODE} = "cpp_infer" ]]; then ...@@ -139,6 +143,8 @@ if [[ ${MODE} = "cpp_infer" ]]; then
cd dataset cd dataset
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar
tar -xf drink_dataset_v1.0.tar tar -xf drink_dataset_v1.0.tar
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar
tar -xf drink_dataset_v2.0.tar
else else
echo "Wrong cpp type in config file in line 3. only support cls, shitu" echo "Wrong cpp type in config file in line 3. only support cls, shitu"
fi fi
...@@ -167,8 +173,9 @@ if [[ $model_name == *ShiTu* ]]; then ...@@ -167,8 +173,9 @@ if [[ $model_name == *ShiTu* ]]; then
ln -s demo_test.txt val_list.txt ln -s demo_test.txt val_list.txt
cd ../../ cd ../../
eval "wget -nc $model_url_value --no-check-certificate" eval "wget -nc $model_url_value --no-check-certificate"
if [[ -d "./general_PPLCNet_x2_5_pretrained_v1.0.pdparams" ]]; then
mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams
exit 0 fi
fi fi
if [[ $FILENAME == *use_dali* ]]; then if [[ $FILENAME == *use_dali* ]]; then
...@@ -240,12 +247,12 @@ elif [[ ${MODE} = "whole_infer" ]]; then ...@@ -240,12 +247,12 @@ elif [[ ${MODE} = "whole_infer" ]]; then
cd ../../ cd ../../
fi fi
# download inference or pretrained model # download inference or pretrained model
eval "wget -nc $model_url_value" eval "wget -nc ${model_url_value}"
if [[ ${model_url_value} =~ ".tar" ]]; then if [[ ${model_url_value} =~ ".tar" ]]; then
tar_name=$(func_get_url_file_name "${model_url_value}") tar_name=$(func_get_url_file_name "${model_url_value}")
echo $tar_name echo ${tar_name}
rm -rf {tar_name} eval "tar -xf ${tar_name}"
tar xf ${tar_name} rm -f ${tar_name}
fi fi
if [[ $model_name == "SwinTransformer_large_patch4_window7_224" || $model_name == "SwinTransformer_large_patch4_window12_384" ]]; then if [[ $model_name == "SwinTransformer_large_patch4_window7_224" || $model_name == "SwinTransformer_large_patch4_window12_384" ]]; then
cmd="mv ${model_name}_22kto1k_pretrained.pdparams ${model_name}_pretrained.pdparams" cmd="mv ${model_name}_22kto1k_pretrained.pdparams ${model_name}_pretrained.pdparams"
...@@ -275,7 +282,7 @@ fi ...@@ -275,7 +282,7 @@ fi
if [[ ${MODE} = "serving_infer" ]]; then if [[ ${MODE} = "serving_infer" ]]; then
# prepare serving env # prepare serving env
python_name=$(func_parser_value "${lines[2]}") python_name=$(func_parser_value "${lines[2]}")
if [[ ${model_name} = "PPShiTu" ]]; then if [[ ${model_name} =~ "PPShiTu" ]]; then
cls_inference_model_url=$(func_parser_value "${lines[3]}") cls_inference_model_url=$(func_parser_value "${lines[3]}")
cls_tar_name=$(func_get_url_file_name "${cls_inference_model_url}") cls_tar_name=$(func_get_url_file_name "${cls_inference_model_url}")
det_inference_model_url=$(func_parser_value "${lines[4]}") det_inference_model_url=$(func_parser_value "${lines[4]}")
...@@ -283,6 +290,8 @@ if [[ ${MODE} = "serving_infer" ]]; then ...@@ -283,6 +290,8 @@ if [[ ${MODE} = "serving_infer" ]]; then
cd ./deploy cd ./deploy
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar --no-check-certificate wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar --no-check-certificate
tar -xf drink_dataset_v1.0.tar tar -xf drink_dataset_v1.0.tar
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar --no-check-certificate
tar -xf drink_dataset_v2.0.tar
mkdir models mkdir models
cd models cd models
wget -nc ${cls_inference_model_url} && tar xf ${cls_tar_name} wget -nc ${cls_inference_model_url} && tar xf ${cls_tar_name}
...@@ -314,8 +323,9 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then ...@@ -314,8 +323,9 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then
# prepare paddle2onnx env # prepare paddle2onnx env
python_name=$(func_parser_value "${lines[2]}") python_name=$(func_parser_value "${lines[2]}")
inference_model_url=$(func_parser_value "${lines[10]}") inference_model_url=$(func_parser_value "${lines[10]}")
tar_name=${inference_model_url##*/} tar_name=$(func_get_url_file_name "$inference_model_url")
${python_name} -m pip install onnx
${python_name} -m pip install paddle2onnx ${python_name} -m pip install paddle2onnx
${python_name} -m pip install onnxruntime ${python_name} -m pip install onnxruntime
if [[ ${model_name} =~ "GeneralRecognition" ]]; then if [[ ${model_name} =~ "GeneralRecognition" ]]; then
...@@ -332,14 +342,12 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then ...@@ -332,14 +342,12 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then
rm -rf val_list.txt rm -rf val_list.txt
ln -s demo_test.txt val_list.txt ln -s demo_test.txt val_list.txt
cd ../../ cd ../../
eval "wget -nc $model_url_value --no-check-certificate"
mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams
fi fi
cd deploy cd deploy
mkdir models mkdir models
cd models cd models
wget -nc ${inference_model_url} wget -nc ${inference_model_url}
tar xf ${tar_name} eval "tar -xf ${tar_name}"
cd ../../ cd ../../
fi fi
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C1 device_num=N1C1
max_epochs=1 max_epochs=1
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N1C8 device_num=N1C8
max_epochs=8 max_epochs=8
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=amp_fp16 ...@@ -4,7 +4,7 @@ fp_item=amp_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -4,7 +4,7 @@ fp_item=pure_fp16 ...@@ -4,7 +4,7 @@ fp_item=pure_fp16
run_mode=DP run_mode=DP
device_num=N4C32 device_num=N4C32
max_epochs=32 max_epochs=32
num_workers=8 num_workers=4
# get data # get data
bash test_tipc/static/${model_item}/benchmark_common/prepare.sh bash test_tipc/static/${model_item}/benchmark_common/prepare.sh
......
...@@ -37,7 +37,8 @@ cpp_benchmark_value=$(func_parser_value "${lines[16]}") ...@@ -37,7 +37,8 @@ cpp_benchmark_value=$(func_parser_value "${lines[16]}")
generate_yaml_cmd=$(func_parser_value "${lines[17]}") generate_yaml_cmd=$(func_parser_value "${lines[17]}")
transform_index_cmd=$(func_parser_value "${lines[18]}") transform_index_cmd=$(func_parser_value "${lines[18]}")
LOG_PATH="./test_tipc/output/${model_name}/${MODE}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results_cpp.log" status_log="${LOG_PATH}/results_cpp.log"
# generate_yaml_cmd="python3 test_tipc/generate_cpp_yaml.py" # generate_yaml_cmd="python3 test_tipc/generate_cpp_yaml.py"
...@@ -70,7 +71,7 @@ function func_shitu_cpp_inference(){ ...@@ -70,7 +71,7 @@ function func_shitu_cpp_inference(){
command="${_script} > ${_save_log_path} 2>&1" command="${_script} > ${_save_log_path} 2>&1"
eval $command eval $command
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${command}" "${status_log}" "${model_name}" status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}"
done done
done done
done done
...@@ -94,7 +95,7 @@ function func_shitu_cpp_inference(){ ...@@ -94,7 +95,7 @@ function func_shitu_cpp_inference(){
command="${_script} > ${_save_log_path} 2>&1" command="${_script} > ${_save_log_path} 2>&1"
eval $command eval $command
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${command}" "${status_log}" "${model_name}" status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}"
done done
done done
done done
...@@ -126,13 +127,12 @@ function func_cls_cpp_inference(){ ...@@ -126,13 +127,12 @@ function func_cls_cpp_inference(){
precison="int8" precison="int8"
fi fi
_save_log_path="${_log_path}/cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log" _save_log_path="${_log_path}/cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log"
command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}"
eval $command eval $command
command1="${_script} > ${_save_log_path} 2>&1" command1="${_script} > ${_save_log_path} 2>&1"
eval ${command1} eval ${command1}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${command1}" "${status_log}" "${model_name}" status_check $last_status "${command1}" "${status_log}" "${model_name}" "${_save_log_path}"
done done
done done
done done
...@@ -155,7 +155,7 @@ function func_cls_cpp_inference(){ ...@@ -155,7 +155,7 @@ function func_cls_cpp_inference(){
command="${_script} > ${_save_log_path} 2>&1" command="${_script} > ${_save_log_path} 2>&1"
eval $command eval $command
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${command}" "${status_log}" "${model_name}" status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}"
done done
done done
done done
......
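The status_check calls in these scripts now receive a fifth argument pointing at the per-run log file. A hedged sketch of the assumed signature (the real implementation lives in test_tipc/common_func.sh, which this diff does not touch):

# sketch only: status_check <exit_code> <command> <status_log> <model_name> <run_log_path>
function status_check() {
    local last_status=$1 run_command=$2 status_log=$3 model_name=$4 run_log=$5
    if [ "${last_status}" -eq 0 ]; then
        echo "Run successfully with command - ${model_name} - ${run_command} - ${run_log}" | tee -a "${status_log}"
    else
        echo "Run failed with command - ${model_name} - ${run_command} - ${run_log}" | tee -a "${status_log}"
    fi
}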
...@@ -42,7 +42,8 @@ infer_key1=$(func_parser_key "${lines[17]}") ...@@ -42,7 +42,8 @@ infer_key1=$(func_parser_key "${lines[17]}")
infer_value1=$(func_parser_value "${lines[17]}") infer_value1=$(func_parser_value "${lines[17]}")
LOG_PATH="./test_tipc/output" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results_python.log" status_log="${LOG_PATH}/results_python.log"
...@@ -71,7 +72,7 @@ if [ ${MODE} = "whole_infer" ]; then ...@@ -71,7 +72,7 @@ if [ ${MODE} = "whole_infer" ]; then
echo $export_cmd echo $export_cmd
eval $export_cmd eval $export_cmd
status_export=$? status_export=$?
status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" ""
else else
save_infer_dir=${infer_model} save_infer_dir=${infer_model}
fi fi
......
#!/bin/bash #!/bin/bash
source test_tipc/common_func.sh source test_tipc/common_func.sh
current_path=$PWD
IFS=$'\n' IFS=$'\n'
...@@ -33,7 +32,8 @@ num_threads_list=$(func_parser_value_lite "${tipc_lines[5]}" ":") ...@@ -33,7 +32,8 @@ num_threads_list=$(func_parser_value_lite "${tipc_lines[5]}" ":")
batch_size_list=$(func_parser_value_lite "${tipc_lines[6]}" ":") batch_size_list=$(func_parser_value_lite "${tipc_lines[6]}" ":")
precision_list=$(func_parser_value_lite "${tipc_lines[7]}" ":") precision_list=$(func_parser_value_lite "${tipc_lines[7]}" ":")
LOG_PATH=${current_path}"/output" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/output"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results.log" status_log="${LOG_PATH}/results.log"
...@@ -67,7 +67,7 @@ function func_test_tipc(){ ...@@ -67,7 +67,7 @@ function func_test_tipc(){
eval ${command1} eval ${command1}
command2="adb shell 'export LD_LIBRARY_PATH=${lite_arm_work_path}; ${real_inference_cmd}' > ${_save_log_path} 2>&1" command2="adb shell 'export LD_LIBRARY_PATH=${lite_arm_work_path}; ${real_inference_cmd}' > ${_save_log_path} 2>&1"
eval ${command2} eval ${command2}
status_check $? "${command2}" "${status_log}" "${model_name}" status_check $? "${command2}" "${status_log}" "${model_name}" "${_save_log_path}"
done done
done done
done done
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
source test_tipc/common_func.sh source test_tipc/common_func.sh
FILENAME=$1 FILENAME=$1
MODE=$2 MODE="paddle2onnx_infer"
# parser params # parser params
dataline=$(awk 'NR==1, NR==16{print}' $FILENAME) dataline=$(awk 'NR==1, NR==16{print}' $FILENAME)
...@@ -36,7 +36,8 @@ inference_hardware_value=$(func_parser_value "${lines[14]}") ...@@ -36,7 +36,8 @@ inference_hardware_value=$(func_parser_value "${lines[14]}")
inference_config_key=$(func_parser_key "${lines[15]}") inference_config_key=$(func_parser_key "${lines[15]}")
inference_config_value=$(func_parser_value "${lines[15]}") inference_config_value=$(func_parser_value "${lines[15]}")
LOG_PATH="./test_tipc/output/${model_name}/${MODE}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results_paddle2onnx.log" status_log="${LOG_PATH}/results_paddle2onnx.log"
...@@ -46,27 +47,29 @@ function func_paddle2onnx(){ ...@@ -46,27 +47,29 @@ function func_paddle2onnx(){
_script=$1 _script=$1
# paddle2onnx # paddle2onnx
_save_log_path=".${LOG_PATH}/paddle2onnx_infer_cpu.log"
set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}") set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}")
set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
set_save_model=$(func_set_params "${save_file_key}" "${save_file_value}") set_save_model=$(func_set_params "${save_file_key}" "${save_file_value}")
set_opset_version=$(func_set_params "${opset_version_key}" "${opset_version_value}") set_opset_version=$(func_set_params "${opset_version_key}" "${opset_version_value}")
set_enable_onnx_checker=$(func_set_params "${enable_onnx_checker_key}" "${enable_onnx_checker_value}") set_enable_onnx_checker=$(func_set_params "${enable_onnx_checker_key}" "${enable_onnx_checker_value}")
trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker}" trans_log="${LOG_PATH}/trans_model.log"
trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker} --enable_dev_version=False > ${trans_log} 2>&1"
eval $trans_model_cmd eval $trans_model_cmd
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}"
# python inference # python inference
if [[ ${inference_py} != "null" ]]; then if [[ ${inference_py} != "null" ]]; then
_save_log_path="${LOG_PATH}/paddle2onnx_infer_cpu.log"
set_model_dir=$(func_set_params "${inference_model_dir_key}" "${inference_model_dir_value}") set_model_dir=$(func_set_params "${inference_model_dir_key}" "${inference_model_dir_value}")
set_use_onnx=$(func_set_params "${use_onnx_key}" "${use_onnx_value}") set_use_onnx=$(func_set_params "${use_onnx_key}" "${use_onnx_value}")
set_hardware=$(func_set_params "${inference_hardware_key}" "${inference_hardware_value}") set_hardware=$(func_set_params "${inference_hardware_key}" "${inference_hardware_value}")
set_inference_config=$(func_set_params "${inference_config_key}" "${inference_config_value}") set_inference_config=$(func_set_params "${inference_config_key}" "${inference_config_value}")
infer_model_cmd="cd deploy && ${python} ${inference_py} -o ${set_model_dir} -o ${set_use_onnx} -o ${set_hardware} ${set_inference_config} > ${_save_log_path} 2>&1 && cd ../" infer_model_cmd="cd deploy && ${python} ${inference_py} -o ${set_model_dir} -o ${set_use_onnx} -o ${set_hardware} ${set_inference_config} > ${_save_log_path} 2>&1 && cd ../"
eval $infer_model_cmd eval $infer_model_cmd
status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
fi fi
} }
......
...@@ -94,7 +94,8 @@ if [[ $MODE = 'benchmark_train' ]]; then ...@@ -94,7 +94,8 @@ if [[ $MODE = 'benchmark_train' ]]; then
epoch_num=1 epoch_num=1
fi fi
LOG_PATH="./test_tipc/output/${model_name}/${MODE}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
status_log="${LOG_PATH}/results_python.log" status_log="${LOG_PATH}/results_python.log"
...@@ -123,7 +124,7 @@ function func_inference() { ...@@ -123,7 +124,7 @@ function func_inference() {
eval $command eval $command
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check $last_status "${command}" "../${status_log}" "${model_name}" status_check $last_status "${command}" "${status_log}" "${model_name}"
done done
done done
done done
...@@ -145,7 +146,7 @@ function func_inference() { ...@@ -145,7 +146,7 @@ function func_inference() {
eval $command eval $command
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check $last_status "${command}" "../${status_log}" "${model_name}" status_check $last_status "${command}" "${status_log}" "${model_name}"
done done
done done
done done
...@@ -168,6 +169,6 @@ if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ]; ...@@ -168,6 +169,6 @@ if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ];
ln -s __params__ inference.pdiparams ln -s __params__ inference.pdiparams
cd ../../deploy cd ../../deploy
is_quant=True is_quant=True
func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "../${LOG_PATH}" "${infer_img_dir}" ${is_quant} func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "${LOG_PATH}" "${infer_img_dir}" ${is_quant}
cd .. cd ..
fi fi
...@@ -38,10 +38,10 @@ pipeline_py=$(func_parser_value "${lines[13]}") ...@@ -38,10 +38,10 @@ pipeline_py=$(func_parser_value "${lines[13]}")
function func_serving_cls(){ function func_serving_cls(){
LOG_PATH="test_tipc/output/${model_name}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/serving_infer"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
LOG_PATH="../../${LOG_PATH}" status_log="${LOG_PATH}/results_cpp_serving.log"
status_log="${LOG_PATH}/results_serving.log"
IFS='|' IFS='|'
# pdserving # pdserving
...@@ -53,8 +53,11 @@ function func_serving_cls(){ ...@@ -53,8 +53,11 @@ function func_serving_cls(){
for python_ in ${python[*]}; do for python_ in ${python[*]}; do
if [[ ${python_} =~ "python" ]]; then if [[ ${python_} =~ "python" ]]; then
trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" trans_log="${LOG_PATH}/cpp_trans_model.log"
trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_log} 2>&1"
eval ${trans_model_cmd} eval ${trans_model_cmd}
last_status=${PIPESTATUS[0]}
status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}"
break break
fi fi
done done
...@@ -102,32 +105,34 @@ function func_serving_cls(){ ...@@ -102,32 +105,34 @@ function func_serving_cls(){
for use_gpu in ${web_use_gpu_list[*]}; do for use_gpu in ${web_use_gpu_list[*]}; do
if [[ ${use_gpu} = "null" ]]; then if [[ ${use_gpu} = "null" ]]; then
web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 &" server_log_path="${LOG_PATH}/cpp_server_cpu.log"
web_service_cpp_cmd="nohup ${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 > ${server_log_path} 2>&1 &"
eval ${web_service_cpp_cmd} eval ${web_service_cpp_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
sleep 5s sleep 5s
_save_log_path="${LOG_PATH}/server_infer_cpp_cpu_pipeline_batchsize_1.log" _save_log_path="${LOG_PATH}/cpp_client_cpu.log"
pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 " pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 "
eval ${pipeline_cmd} eval ${pipeline_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
eval "${python_} -m paddle_serving_server.serve stop" eval "${python_} -m paddle_serving_server.serve stop"
sleep 5s sleep 5s
else else
web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 --gpu_id=${use_gpu} &" server_log_path="${LOG_PATH}/cpp_server_gpu.log"
web_service_cpp_cmd="nohup ${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 --gpu_id=${use_gpu} > ${server_log_path} 2>&1 &"
eval ${web_service_cpp_cmd} eval ${web_service_cpp_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
sleep 8s sleep 8s
_save_log_path="${LOG_PATH}/server_infer_cpp_gpu_pipeline_batchsize_1.log" _save_log_path="${LOG_PATH}/cpp_client_gpu.log"
pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 " pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 "
eval ${pipeline_cmd} eval ${pipeline_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
sleep 5s sleep 5s
eval "${python_} -m paddle_serving_server.serve stop" eval "${python_} -m paddle_serving_server.serve stop"
fi fi
...@@ -136,10 +141,11 @@ function func_serving_cls(){ ...@@ -136,10 +141,11 @@ function func_serving_cls(){
function func_serving_rec(){ function func_serving_rec(){
LOG_PATH="test_tipc/output/${model_name}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/serving_infer"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
LOG_PATH="../../../${LOG_PATH}" status_log="${LOG_PATH}/results_cpp_serving.log"
status_log="${LOG_PATH}/results_serving.log"
trans_model_py=$(func_parser_value "${lines[5]}") trans_model_py=$(func_parser_value "${lines[5]}")
cls_infer_model_dir_key=$(func_parser_key "${lines[6]}") cls_infer_model_dir_key=$(func_parser_key "${lines[6]}")
cls_infer_model_dir_value=$(func_parser_value "${lines[6]}") cls_infer_model_dir_value=$(func_parser_value "${lines[6]}")
...@@ -181,20 +187,36 @@ function func_serving_rec(){ ...@@ -181,20 +187,36 @@ function func_serving_rec(){
set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}") set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}")
set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}") set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}")
cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" trans_cls_log="${LOG_PATH}/cpp_trans_model_cls.log"
cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_cls_log} 2>&1"
eval ${cls_trans_model_cmd} eval ${cls_trans_model_cmd}
last_status=${PIPESTATUS[0]}
status_check $last_status "${cls_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_cls_log}"
set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}") set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}")
set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}") set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}")
set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}") set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}")
det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" trans_det_log="${LOG_PATH}/cpp_trans_model_det.log"
det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_det_log} 2>&1"
eval ${det_trans_model_cmd} eval ${det_trans_model_cmd}
last_status=${PIPESTATUS[0]}
cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ${cls_serving_server_value}" status_check $last_status "${det_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_det_log}"
OLD_IFS="${IFS}"
IFS='/'
tmp_arr=($cls_serving_server_value)
lastIndex=$((${#tmp_arr[@]}-1))
cls_serving_server_dirname="${tmp_arr[lastIndex]}"
tmp_arr=($cls_serving_client_value)
lastIndex=$((${#tmp_arr[@]}-1))
cls_serving_client_dirname="${tmp_arr[lastIndex]}"
IFS="${OLD_IFS}"
cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/${cls_serving_server_dirname}/*.prototxt ${cls_serving_server_value}"
eval ${cp_prototxt_cmd} eval ${cp_prototxt_cmd}
cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ${cls_serving_client_value}" cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/${cls_serving_client_dirname}/*.prototxt ${cls_serving_client_value}"
eval ${cp_prototxt_cmd} eval ${cp_prototxt_cmd}
cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ${det_serving_client_value}" cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ${det_serving_client_value}"
eval ${cp_prototxt_cmd} eval ${cp_prototxt_cmd}
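# Note (assumed equivalence, not part of this diff): the IFS='/' splitting above just
# extracts the last path component of the serving server/client directories, roughly:
#   cls_serving_server_dirname=$(basename "${cls_serving_server_value}")
#   cls_serving_client_dirname=$(basename "${cls_serving_client_value}")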
...@@ -215,32 +237,34 @@ function func_serving_rec(){ ...@@ -215,32 +237,34 @@ function func_serving_rec(){
for use_gpu in ${web_use_gpu_list[*]}; do for use_gpu in ${web_use_gpu_list[*]}; do
if [ ${use_gpu} = "null" ]; then if [ ${use_gpu} = "null" ]; then
det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value") det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value")
web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 &" server_log_path="${LOG_PATH}/cpp_server_cpu.log"
web_service_cpp_cmd="nohup ${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 > ${server_log_path} 2>&1 &"
eval ${web_service_cpp_cmd} eval ${web_service_cpp_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
sleep 5s sleep 5s
_save_log_path="${LOG_PATH}/server_infer_cpp_cpu_batchsize_1.log" _save_log_path="${LOG_PATH}/cpp_client_cpu.log"
pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 " pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 "
eval ${pipeline_cmd} eval ${pipeline_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
eval "${python_} -m paddle_serving_server.serve stop" eval "${python_} -m paddle_serving_server.serve stop"
sleep 5s sleep 5s
else else
det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value") det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value")
server_log_path="${LOG_PATH}/cpp_server_gpu.log"
web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 --gpu_id=${use_gpu} &" web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 --gpu_id=${use_gpu} &"
eval ${web_service_cpp_cmd} eval ${web_service_cpp_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" ${server_log_path}
sleep 5s sleep 5s
_save_log_path="${LOG_PATH}/server_infer_cpp_gpu_batchsize_1.log" _save_log_path="${LOG_PATH}/cpp_client_gpu.log"
pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 " pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 "
eval ${pipeline_cmd} eval ${pipeline_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
eval "${python_} -m paddle_serving_server.serve stop" eval "${python_} -m paddle_serving_server.serve stop"
sleep 5s sleep 5s
fi fi
......
...@@ -36,13 +36,16 @@ web_service_py=$(func_parser_value "${lines[11]}") ...@@ -36,13 +36,16 @@ web_service_py=$(func_parser_value "${lines[11]}")
web_use_gpu_key=$(func_parser_key "${lines[12]}") web_use_gpu_key=$(func_parser_key "${lines[12]}")
web_use_gpu_list=$(func_parser_value "${lines[12]}") web_use_gpu_list=$(func_parser_value "${lines[12]}")
pipeline_py=$(func_parser_value "${lines[13]}") pipeline_py=$(func_parser_value "${lines[13]}")
use_mkldnn="False"
threads="1"
function func_serving_cls(){ function func_serving_cls(){
LOG_PATH="test_tipc/output/${model_name}/${MODE}" CLS_ROOT_PATH=$(pwd)
LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
mkdir -p ${LOG_PATH} mkdir -p ${LOG_PATH}
LOG_PATH="../../${LOG_PATH}"
status_log="${LOG_PATH}/results_serving.log" status_log="${LOG_PATH}/results_serving.log"
IFS='|' IFS='|'
# pdserving # pdserving
...@@ -54,8 +57,11 @@ function func_serving_cls(){ ...@@ -54,8 +57,11 @@ function func_serving_cls(){
for python_ in ${python[*]}; do for python_ in ${python[*]}; do
if [[ ${python_} =~ "python" ]]; then if [[ ${python_} =~ "python" ]]; then
trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" trans_log="${LOG_PATH}/python_trans_model.log"
trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_log} 2>&1"
eval ${trans_model_cmd} eval ${trans_model_cmd}
last_status=${PIPESTATUS[0]}
status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}"
break break
fi fi
done done
...@@ -96,19 +102,19 @@ function func_serving_cls(){ ...@@ -96,19 +102,19 @@ function func_serving_cls(){
devices_line=27 devices_line=27
set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml" set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml"
eval ${set_devices_cmd} eval ${set_devices_cmd}
server_log_path="${LOG_PATH}/python_server_cpu.log"
web_service_cmd="${python_} ${web_service_py} &" web_service_cmd="nohup ${python_} ${web_service_py} > ${server_log_path} 2>&1 &"
eval ${web_service_cmd} eval ${web_service_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
sleep 5s sleep 5s
for pipeline in ${pipeline_py[*]}; do for pipeline in ${pipeline_py[*]}; do
_save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log" _save_log_path="${LOG_PATH}/python_client_cpu_${pipeline%_client*}_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log"
pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1 " pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1 "
eval ${pipeline_cmd} eval ${pipeline_cmd}
last_status=${PIPESTATUS[0]} last_status=${PIPESTATUS[0]}
eval "cat ${_save_log_path}" eval "cat ${_save_log_path}"
status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
sleep 5s sleep 5s
done done
eval "${python_} -m paddle_serving_server.serve stop" eval "${python_} -m paddle_serving_server.serve stop"
@@ -130,19 +136,19 @@ function func_serving_cls(){
            devices_line=27
            set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml"
            eval ${set_devices_cmd}
-            web_service_cmd="${python_} ${web_service_py} & "
+            server_log_path="${LOG_PATH}/python_server_gpu_usetrt_${use_trt}_precision_${precision}.log"
+            web_service_cmd="nohup ${python_} ${web_service_py} > ${server_log_path} 2>&1 &"
            eval ${web_service_cmd}
            last_status=${PIPESTATUS[0]}
-            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}"
+            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
            sleep 5s
            for pipeline in ${pipeline_py[*]}; do
-                _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log"
+                _save_log_path="${LOG_PATH}/python_client_gpu_${pipeline%_client*}_usetrt_${use_trt}_precision_${precision}_batchsize_1.log"
                pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1"
                eval ${pipeline_cmd}
                last_status=${PIPESTATUS[0]}
                eval "cat ${_save_log_path}"
-                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}"
+                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
                sleep 5s
            done
            eval "${python_} -m paddle_serving_server.serve stop"
@@ -154,10 +160,11 @@ function func_serving_cls(){
 function func_serving_rec(){
-    LOG_PATH="test_tipc/output/${model_name}/${MODE}"
+    CLS_ROOT_PATH=$(pwd)
+    LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
    mkdir -p ${LOG_PATH}
-    LOG_PATH="../../../${LOG_PATH}"
    status_log="${LOG_PATH}/results_serving.log"
    trans_model_py=$(func_parser_value "${lines[5]}")
    cls_infer_model_dir_key=$(func_parser_key "${lines[6]}")
    cls_infer_model_dir_value=$(func_parser_value "${lines[6]}")
@@ -199,16 +206,22 @@ function func_serving_rec(){
    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
    set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}")
    set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}")
-    cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    trans_cls_log="${LOG_PATH}/python_trans_model_cls.log"
+    cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_cls_log} 2>&1"
    eval ${cls_trans_model_cmd}
+    last_status=${PIPESTATUS[0]}
+    status_check $last_status "${cls_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_cls_log}"
    set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}")
    set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}")
    set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}")
    set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}")
    set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}")
-    det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}"
+    trans_det_log="${LOG_PATH}/python_trans_model_det.log"
+    det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_det_log} 2>&1"
    eval ${det_trans_model_cmd}
+    last_status=${PIPESTATUS[0]}
+    status_check $last_status "${det_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_det_log}"
    # modify the alias_name of fetch_var to "outputs"
    server_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"features\"/' $cls_serving_server_value/serving_server_conf.prototxt"
@@ -239,19 +252,19 @@ function func_serving_rec(){
            devices_line=27
            set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml"
            eval ${set_devices_cmd}
-            web_service_cmd="${python} ${web_service_py} &"
+            server_log_path="${LOG_PATH}/python_server_cpu.log"
+            web_service_cmd="nohup ${python} ${web_service_py} > ${server_log_path} 2>&1 &"
            eval ${web_service_cmd}
            last_status=${PIPESTATUS[0]}
-            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}"
+            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
            sleep 5s
            for pipeline in ${pipeline_py[*]}; do
-                _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log"
+                _save_log_path="${LOG_PATH}/python_client_cpu_${pipeline%_client*}_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log"
-                pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1 "
+                pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1"
                eval ${pipeline_cmd}
                last_status=${PIPESTATUS[0]}
                eval "cat ${_save_log_path}"
-                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}"
+                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
                sleep 5s
            done
            eval "${python_} -m paddle_serving_server.serve stop"
@@ -273,19 +286,19 @@ function func_serving_rec(){
            devices_line=27
            set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml"
            eval ${set_devices_cmd}
-            web_service_cmd="${python} ${web_service_py} & "
+            server_log_path="${LOG_PATH}/python_server_gpu_usetrt_${use_trt}_precision_${precision}.log"
+            web_service_cmd="nohup ${python} ${web_service_py} > ${server_log_path} 2>&1 &"
            eval ${web_service_cmd}
            last_status=${PIPESTATUS[0]}
-            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}"
+            status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}"
            sleep 10s
            for pipeline in ${pipeline_py[*]}; do
-                _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log"
+                _save_log_path="${LOG_PATH}/python_client_gpu_${pipeline%_client*}_usetrt_${use_trt}_precision_${precision}_batchsize_1.log"
                pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1"
                eval ${pipeline_cmd}
                last_status=${PIPESTATUS[0]}
                eval "cat ${_save_log_path}"
-                status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}"
                sleep 10s
            done
            eval "${python_} -m paddle_serving_server.serve stop"
@@ -311,7 +324,7 @@ echo "################### run test ###################"
 export Count=0
 IFS="|"
-if [[ ${model_name} = "PPShiTu" ]]; then
+if [[ ${model_name} =~ "PPShiTu" ]]; then
    func_serving_rec
 else
    func_serving_cls
...
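A pattern worth calling out in the serving hunks above, and again in the training hunks below: every step now redirects its own output into a dedicated log file and passes that path to `status_check` as a fifth argument, so a failing step can be traced back to its full log instead of only to the command line. `status_check` itself is defined in `test_tipc/common_func.sh` and is not part of this diff; the sketch below only illustrates the assumed five-argument calling convention and an assumed message format, not the actual implementation.

```bash
# Illustrative sketch only; the real status_check lives in
# test_tipc/common_func.sh and may differ in wording and formatting.
function status_check(){
    last_status=$1    # exit code of the step that just ran
    run_command=$2    # command string, recorded for traceability
    run_log=$3        # results_*.log collecting one line per step
    model_name=$4     # model under test
    log_path=$5       # per-step log file (the new fifth argument)
    if [ ${last_status} -eq 0 ]; then
        echo -e "\033[33m Run successfully with command - ${model_name} - ${run_command} - ${log_path} \033[0m" | tee -a ${run_log}
    else
        echo -e "\033[33m Run failed with command - ${model_name} - ${run_command} - ${log_path} \033[0m" | tee -a ${run_log}
    fi
}
```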
@@ -95,7 +95,8 @@ if [[ $MODE = 'benchmark_train' ]]; then
    epoch_num=1
 fi
-LOG_PATH="./test_tipc/output/${model_name}/${MODE}"
+CLS_ROOT_PATH=$(pwd)
+LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}"
 mkdir -p ${LOG_PATH}
 status_log="${LOG_PATH}/results_python.log"
@@ -107,13 +108,15 @@ function func_inference() {
    _log_path=$4
    _img_dir=$5
    _flag_quant=$6
+    _gpu=$7
    # inference
    for use_gpu in ${use_gpu_list[*]}; do
        if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then
            for use_mkldnn in ${use_mkldnn_list[*]}; do
                for threads in ${cpu_threads_list[*]}; do
                    for batch_size in ${batch_size_list[*]}; do
-                        _save_log_path="${_log_path}/infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_${batch_size}.log"
+                        for precision in ${precision_list[*]}; do
+                            _save_log_path="${_log_path}/python_infer_cpu_gpus_${_gpu}_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log"
                        set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
                        set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
                        set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
@@ -124,7 +127,8 @@ function func_inference() {
                        eval $command
                        last_status=${PIPESTATUS[0]}
                        eval "cat ${_save_log_path}"
-                        status_check $last_status "${command}" "../${status_log}" "${model_name}"
+                        status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}"
+                        done
                    done
                done
            done
@@ -135,7 +139,7 @@ function func_inference() {
                    continue
                fi
                for batch_size in ${batch_size_list[*]}; do
-                    _save_log_path="${_log_path}/infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
+                    _save_log_path="${_log_path}/python_infer_gpu_gpus_${_gpu}_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log"
                    set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}")
                    set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}")
                    set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}")
@@ -146,7 +150,7 @@ function func_inference() {
                    eval $command
                    last_status=${PIPESTATUS[0]}
                    eval "cat ${_save_log_path}"
-                    status_check $last_status "${command}" "../${status_log}" "${model_name}"
+                    status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}"
                done
            done
        done
@@ -161,17 +165,19 @@ if [[ ${MODE} = "whole_infer" ]]; then
    # for kl_quant
    if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ]; then
        echo "kl_quant"
-        command="${python} ${kl_quant_cmd_value}"
+        log_path="${LOG_PATH}/export.log"
+        command="${python} ${kl_quant_cmd_value} > ${log_path} 2>&1"
        echo ${command}
        eval $command
        last_status=${PIPESTATUS[0]}
-        status_check $last_status "${command}" "${status_log}" "${model_name}"
+        status_check $last_status "${command}" "${status_log}" "${model_name}" "${log_path}"
        cd ${infer_model_dir_list}/quant_post_static_model
-        ln -s __model__ inference.pdmodel
-        ln -s __params__ inference.pdiparams
+        ln -s model.pdmodel inference.pdmodel
+        ln -s model.pdiparams inference.pdiparams
        cd ../../deploy
        is_quant=True
-        func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "../${LOG_PATH}" "${infer_img_dir}" ${is_quant}
+        gpu=0
+        func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "${LOG_PATH}" "${infer_img_dir}" "${is_quant}" "${gpu}"
        cd ..
    fi
 else
@@ -240,7 +246,7 @@ else
        if [ ${#ips} -le 15 ]; then
            # if length of ips >= 15, then it is seen as multi-machine
            # 15 is the min length of ips info for multi-machine: 0.0.0.0,0.0.0.0
-            save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}"
+            save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_1"
            nodes=1
        else
            IFS=","
@@ -259,16 +265,21 @@ else
        if [ ${#gpu} -le 2 ]; then  # train with cpu or single gpu
            cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} "
        elif [ ${#ips} -le 15 ]; then  # train with multi-gpu
-            cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}"
+            cmd="${python} -m paddle.distributed.launch --devices=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}"
        else  # train with multi-machine
-            cmd="${python} -m paddle.distributed.launch --ips=${ips} --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}"
+            cmd="${python} -m paddle.distributed.launch --ips=${ips} --devices=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}"
        fi
        # run train
        eval "unset CUDA_VISIBLE_DEVICES"
        # export FLAGS_cudnn_deterministic=True
        sleep 5
        eval $cmd
-        status_check $? "${cmd}" "${status_log}" "${model_name}"
+        if [[ $FILENAME == *GeneralRecognition* ]]; then
+            eval "cat ${save_log}/RecModel/train.log >> ${save_log}.log"
+        else
+            eval "cat ${save_log}/${model_name}/train.log >> ${save_log}.log"
+        fi
+        status_check $? "${cmd}" "${status_log}" "${model_name}" "${save_log}.log"
        sleep 5
        if [[ $FILENAME == *GeneralRecognition* ]]; then
@@ -283,9 +294,10 @@ else
        # run eval
        if [ ${eval_py} != "null" ]; then
            set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}")
-            eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}"
+            eval_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_eval.log"
+            eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1} > ${eval_log_path} 2>&1"
            eval $eval_cmd
-            status_check $? "${eval_cmd}" "${status_log}" "${model_name}"
+            status_check $? "${eval_cmd}" "${status_log}" "${model_name}" "${eval_log_path}"
            sleep 5
        fi
        # run export model
@@ -298,15 +310,16 @@ else
            set_export_weight=$(func_set_params "${export_weight}" "${save_log}/${model_name}/${train_model_name}")
        fi
        set_save_infer_key=$(func_set_params "${save_infer_key}" "${save_infer_path}")
-        export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key}"
+        export_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_export.log"
+        export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key} > ${export_log_path} 2>&1"
        eval $export_cmd
-        status_check $? "${export_cmd}" "${status_log}" "${model_name}"
+        status_check $? "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}"
-        #run inference
+        # run inference
        eval $env
        save_infer_path="${save_log}"
        cd deploy
-        func_inference "${python}" "${inference_py}" "../${save_infer_path}" "../${LOG_PATH}" "${infer_img_dir}" "${flag_quant}"
+        func_inference "${python}" "${inference_py}" "${save_infer_path}" "${LOG_PATH}" "${infer_img_dir}" "${flag_quant}" "${gpu}"
        cd ..
    fi
    eval "unset CUDA_VISIBLE_DEVICES"
...
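The training hunk above also swaps `--gpus` for `--devices` in the `paddle.distributed.launch` invocations, the generic spelling that also covers non-GPU devices, in line with the NPU/XPU wrapper scripts that follow. A hedged example of the resulting launcher line, with an illustrative config path and override:

```bash
# Example only; the config path and the Global.epochs override are illustrative.
python -m paddle.distributed.launch --devices=0,1,2,3 tools/train.py \
    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
    -o Global.epochs=1
```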
#!/bin/bash
source test_tipc/common_func.sh
function readlinkf() {
perl -MCwd -e 'print Cwd::abs_path shift' "$1";
}
function func_parser_config() {
strs=$1
IFS=" "
array=(${strs})
tmp=${array[2]}
echo ${tmp}
}
BASEDIR=$(dirname "$0")
REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../)
FILENAME=$1
# change gpu to npu in tipc txt configs
sed -i "s/Global.device:gpu/Global.device:npu/g" $FILENAME
sed -i "s/Global.use_gpu/Global.use_npu/g" $FILENAME
dataline=`cat $FILENAME`
# parser params
IFS=$'\n'
lines=(${dataline})
# replace inference config file
inference_py=$(func_parser_value "${lines[39]}")
inference_config=$(func_parser_config ${inference_py})
sed -i 's/use_gpu: True/use_npu: True/g' "$REPO_ROOT_PATH/deploy/$inference_config"
# replace training config file
grep -n 'tools/.*yaml' $FILENAME | cut -d ":" -f 1 \
| while read line_num ; do
train_cmd=$(func_parser_value "${lines[line_num-1]}")
trainer_config=$(func_parser_config ${train_cmd})
sed -i 's/device: gpu/device: npu/g' "$REPO_ROOT_PATH/$trainer_config"
done
# change gpu to npu in execution script
sed -i "s/\"gpu\"/\"npu\"/g" test_tipc/test_train_inference_python.sh
# pass parameters to test_train_inference_python.sh
cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2"
echo $cmd
eval $cmd
#!/bin/bash
source test_tipc/common_func.sh
function readlinkf() {
perl -MCwd -e 'print Cwd::abs_path shift' "$1";
}
function func_parser_config() {
strs=$1
IFS=" "
array=(${strs})
tmp=${array[2]}
echo ${tmp}
}
BASEDIR=$(dirname "$0")
REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../)
FILENAME=$1
# change gpu to xpu in tipc txt configs
sed -i "s/Global.device:gpu/Global.device:xpu/g" $FILENAME
sed -i "s/Global.use_gpu/Global.use_xpu/g" $FILENAME
dataline=`cat $FILENAME`
# parser params
IFS=$'\n'
lines=(${dataline})
# replace inference config file
inference_py=$(func_parser_value "${lines[39]}")
inference_config=$(func_parser_config ${inference_py})
sed -i 's/use_gpu: True/use_xpu: True/g' "$REPO_ROOT_PATH/deploy/$inference_config"
# replace training config file
grep -n 'tools/.*yaml' $FILENAME | cut -d ":" -f 1 \
| while read line_num ; do
train_cmd=$(func_parser_value "${lines[line_num-1]}")
trainer_config=$(func_parser_config ${train_cmd})
sed -i 's/device: gpu/device: xpu/g' "$REPO_ROOT_PATH/$trainer_config"
done
# change gpu to xpu in execution script
sed -i "s/\"gpu\"/\"xpu\"/g" test_tipc/test_train_inference_python.sh
# pass parameters to test_train_inference_python.sh
cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2"
echo $cmd
eval $cmd
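Both wrapper scripts follow the same flow: rewrite the device fields in the TIPC config txt and in the training/inference YAML files it references, patch the device string in test_tipc/test_train_inference_python.sh, and then delegate to that script with the original arguments. Assuming they are saved as test_tipc/test_train_inference_python_npu.sh and test_tipc/test_train_inference_python_xpu.sh (the diff does not show the file names), and with data and model preparation already done via test_tipc/prepare.sh, a typical invocation could look like this; the config path and mode are illustrative:

```bash
# Hypothetical file name and config path -- adjust to the actual repo layout.
bash test_tipc/test_train_inference_python_npu.sh \
    test_tipc/configs/MobileNetV3/MobileNetV3_large_x1_0_train_infer_python.txt \
    lite_train_lite_infer
```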