diff --git a/README.md b/README.md index 13c4f964bb9063f28d6e08dfb8c6b828a81d2536..44885f554afdc7e00188fae2987e7fbbb4278fcc 120000 --- a/README.md +++ b/README.md @@ -1 +1 @@ -README_en.md \ No newline at end of file +README_ch.md \ No newline at end of file diff --git a/README_ch.md b/README_ch.md index 2ca73fdc5b2c1b1e504cf4ec8eef2d0dcb13deb4..2ad34e7c66804251a8401a560783da94ea0070c8 100644 --- a/README_ch.md +++ b/README_ch.md @@ -4,64 +4,85 @@ ## 简介 -飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 +飞桨图像识别套件PaddleClas是飞桨为工业界和学术界所准备的一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。 
-[图示:PULC实用图像分类模型效果展示]
+[图示:PP-ShiTuV2图像识别系统效果展示]
-[图示:PP-ShiTu图像识别系统效果展示]
-## 近期更新
-- 📢将于**6月15-6月17日晚20:30** 进行为期三天的课程直播,详细介绍超轻量图像分类方案,对各场景模型优化原理及使用方式进行拆解,之后还有产业案例全流程实操,对各类痛难点解决方案进行手把手教学,加上现场互动答疑,抓紧扫码上车吧!
-[图示:课程直播报名二维码]
+[图示:PULC实用图像分类模型效果展示]
-- 🔥️ 2022.6.15 发布[PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md),CPU推理3ms,精度比肩SwinTransformer,覆盖人、车、OCR场景九大常见任务。 -- 2022.5.26 [飞桨产业实践范例直播课](http://aglc.cn/v-c4FAR),解读**超轻量重点区域人员出入管理方案**。 -- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Studio 上体验。 +## 近期更新 +- 🔥️ 发布[PP-ShiTuV2](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md),recall1精度提升8个点,覆盖[20+识别场景](./docs/zh_CN/introduction/ppshitu_application_scenarios.md),新增[库管理工具](./deploy/shitu_index_manager/),[Android Demo](./docs/zh_CN/quick_start/quick_start_recognition.md)全新体验。 +- 2022.9.4 新增[生鲜产品自主结算范例库](./docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md),具体内容可以在AI Studio上体验。 +- 2022.6.15 发布[PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md),CPU推理3ms,精度比肩SwinTransformer,覆盖人、车、OCR场景九大常见任务。 +- 2022.5.23 新增[人员出入管理范例库](https://aistudio.baidu.com/aistudio/projectdetail/4094475),具体内容可以在 AI Studio 上体验。 - 2022.5.20 上线[PP-HGNet](./docs/zh_CN/models/PP-HGNet.md), [PP-LCNetv2](./docs/zh_CN/models/PP-LCNetV2.md)。 - -- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)。 - - [more](./docs/zh_CN/others/update_history.md) ## 特性 PaddleClas发布了[PP-HGNet](docs/zh_CN/models/PP-HGNet.md)、[PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md)、 [PP-LCNet](docs/zh_CN/models/PP-LCNet.md)和[SSLD半监督知识蒸馏方案](docs/zh_CN/advanced_tutorials/ssld.md)等算法, 并支持多种图像分类、识别相关算法,在此基础上打造[PULC超轻量图像分类方案](docs/zh_CN/PULC/PULC_quickstart.md)和[PP-ShiTu图像识别系统](./docs/zh_CN/quick_start/quick_start_recognition.md)。 -![](https://user-images.githubusercontent.com/19523330/173273046-239a42da-c88d-4c2c-94b1-2134557afa49.png) +![](https://user-images.githubusercontent.com/11568925/189267545-7a6eefa0-b4fc-4ed0-ae9d-7c6d53f59798.png) ## 欢迎加入技术交流群 -* 您可以扫描下面的微信/QQ二维码(添加小助手微信并回复“C”),加入PaddleClas微信交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。 +* 欢迎加入PaddleClas 微信用户群(扫码填写问卷即可入群)
-[图示:微信/QQ交流群二维码]
+[图示:PaddleClas微信用户群二维码]
+ ## 快速体验 PULC超轻量图像分类方案快速体验:[点击这里](docs/zh_CN/PULC/PULC_quickstart.md) PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick_start_recognition.md) +PP-ShiTuV2 Android Demo APP,可扫描如下二维码,下载体验 + +
+[图示:PP-ShiTuV2 Android Demo 二维码]
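除上述快速体验文档外,也可以通过 paddleclas whl 包在 Python 中快速体验 PULC 分类模型。下面给出一个最小示例(仅作参考:需先 `pip install paddleclas`,`person_exists` 等模型名称及返回字段请以官方文档为准):

```python
# 最小体验示例(假设已安装 paddleclas whl 包,模型名称以官方文档为准)
from paddleclas import PaddleClas

# 加载 PULC "有人/无人" 分类模型,首次运行会自动下载对应推理模型
model = PaddleClas(model_name="person_exists")

# predict 返回一个生成器,每个元素对应一批输入图片的预测结果
results = model.predict(input_data="docs/images/inference_deployment/whl_demo.jpg")
for res in results:
    print(res)
```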
+ + +## 产业实践范例库 + +- 基于PP-ShiTu v2的生鲜品自助结算: [点击这里](./docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md) +- 基于PULC人员出入视频管理: [点击这里](./docs/zh_CN/samples/Personnel_Access/README.md) +- 基于 PP-ShiTu 的智慧商超商品识别:[点击这里](./docs/zh_CN/Goods_Recognition/README.md) +- 基于PP-ShiTu电梯内电瓶车入室识别:[点击这里](./docs/zh_CN/samples//Electromobile_In_Elevator_Detection/README.md) + ## 文档教程 - [环境准备](docs/zh_CN/installation/install_paddleclas.md) +- [PP-ShiTuV2图像识别系统介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md) + - [图像识别快速体验](docs/zh_CN/quick_start/quick_start_recognition.md) + - [20+应用场景库](docs/zh_CN/introduction/ppshitu_application_scenarios.md) + - 子模块算法介绍及模型训练 + - [主体检测](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md) + - [特征提取模型](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md) + - [向量检索](./docs/zh_CN/image_recognition_pipeline/vector_search.md) + - [哈希编码](docs/zh_CN/image_recognition_pipeline/deep_hashing.md) + - PipeLine 推理部署 + - [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#2) + - [基于C++预测引擎推理](deploy/cpp_shitu/readme.md) + - [服务化部署](docs/zh_CN/inference_deployment/recognition_serving_deploy.md) + - [端侧部署](docs/zh_CN/inference_deployment/lite_shitu.md) + - [库管理工具](docs/zh_CN/inference_deployment/shitu_gallery_manager.md) - [PULC超轻量图像分类实用方案](docs/zh_CN/PULC/PULC_train.md) - [超轻量图像分类快速体验](docs/zh_CN/PULC/PULC_quickstart.md) - [超轻量图像分类模型库](docs/zh_CN/PULC/PULC_model_list.md) @@ -82,19 +103,6 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick - [端侧部署](docs/zh_CN/inference_deployment/paddle_lite_deploy.md) - [Paddle2ONNX模型转化与预测](deploy/paddle2onnx/readme.md) - [模型压缩](deploy/slim/README.md) -- [PP-ShiTu图像识别系统介绍](#图像识别系统介绍) - - [图像识别快速体验](docs/zh_CN/quick_start/quick_start_recognition.md) - - 模块介绍 - - [主体检测](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md) - - [特征提取模型](./docs/zh_CN/image_recognition_pipeline/feature_extraction.md) - - [向量检索](./docs/zh_CN/image_recognition_pipeline/vector_search.md) - - [哈希编码](docs/zh_CN/image_recognition_pipeline/) - - [模型训练](docs/zh_CN/models_training/recognition.md) - - 推理部署 - - [基于python预测引擎推理](docs/zh_CN/inference_deployment/python_deploy.md#2) - - [基于C++预测引擎推理](deploy/cpp_shitu/readme.md) - - [服务化部署](docs/zh_CN/inference_deployment/recognition_serving_deploy.md) - - [端侧部署](deploy/lite_shitu/README.md) - PP系列骨干网络模型 - [PP-HGNet](docs/zh_CN/models/PP-HGNet.md) - [PP-LCNetv2](docs/zh_CN/models/PP-LCNetV2.md) @@ -103,6 +111,7 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick - 前沿算法 - [骨干网络和预训练模型库](docs/zh_CN/algorithm_introduction/ImageNet_models.md) - [度量学习](docs/zh_CN/algorithm_introduction/metric_learning.md) + - [ReID](./docs/zh_CN/algorithm_introduction/reid.md) - [模型压缩](docs/zh_CN/algorithm_introduction/model_prune_quantization.md) - [模型蒸馏](docs/zh_CN/algorithm_introduction/knowledge_distillation.md) - [数据增强](docs/zh_CN/advanced_tutorials/DataAugmentation.md) @@ -113,63 +122,80 @@ PP-ShiTu图像识别快速体验:[点击这里](./docs/zh_CN/quick_start/quick - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md) - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md) - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md) + - [图像分类FAQ第三季](docs/zh_CN/faq_series/faq_2022_s1.md) - [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md) - [许可证书](#许可证书) - [贡献代码](#贡献代码) - - -## PULC超轻量图像分类方案 -
- -
-PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。 -PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。 - -## PP-ShiTu图像识别系统 + +## PP-ShiTuV2图像识别系统
-PP-ShiTu是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化8个方面,采用多种策略,对各个模块的模型进行优化,最终得到在CPU上仅0.2s即可完成10w+库的图像识别的系统。更多细节请参考[PP-ShiTu技术方案](https://arxiv.org/pdf/2111.00775.pdf)。 - -## PULC实用图像分类模型效果展示 -
- -
+PP-ShiTuV2是一个实用的轻量级通用图像识别系统,主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面,采用多种策略,对各个模块的模型进行优化,PP-ShiTuV2相比V1,Recall1提升近8个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)。 -## PP-ShiTu图像识别系统效果展示 + +## PP-ShiTuV2图像识别系统效果展示 + - 瓶装饮料识别 +
+ - 商品识别 +
+ - 动漫人物识别 +
+ - logo识别 +
+ - 车辆识别 +
+ + + +## PULC超轻量图像分类方案 +
+ +
+PULC融合了骨干网络、数据增广、蒸馏等多种前沿算法,可以自动训练得到轻量且高精度的图像分类模型。 +PaddleClas提供了覆盖人、车、OCR场景九大常见任务的分类模型,CPU推理3ms,精度比肩SwinTransformer。 + + + +## PULC实用图像分类模型效果展示 +
+ +
+ + ## 许可证书 diff --git a/README_en.md b/README_en.md index 4bf960e57f2e56972f889c4bcf6a6d715b903477..75707070b9e46dfdd1b0503538159681c9ccf07a 100644 --- a/README_en.md +++ b/README_en.md @@ -7,20 +7,23 @@ PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
-[figure: PULC demo images]
+[figure: PP-ShiTuV2 demo images]
-[figure: PP-ShiTu demo images]
+ + **Recent updates** +- 🔥️ Release [PP-ShiTuV2](./docs/en/PPShiTu/PPShiTuV2_introduction.md), recall1 is improved by nearly 8 points, covering 20+ recognition scenarios, with [index management tool](./deploy/shitu_index_manager) and [Android Demo](./docs/en/quick_start/quick_start_recognition_en.md) for better experience. - 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models inference within 3ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenario. - 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf). @@ -38,7 +41,7 @@ image classification and image recognition algorithms. Based on th algorithms above, PaddleClas release PP-ShiTu image recognition system and [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](docs/en/PULC/PULC_quickstart_en.md). -![](https://user-images.githubusercontent.com/19523330/173539361-68cf7ab1-7e3b-4e5e-b00f-1500719bd2a2.png) +![](https://user-images.githubusercontent.com/11568925/189268878-43d9d35b-90cf-425a-859e-767f8d94c5f7.png) ## Welcome to Join the Technical Exchange Group @@ -52,12 +55,31 @@ Based on th algorithms above, PaddleClas release PP-ShiTu image recognition syst ## Quick Start Quick experience of PP-ShiTu image recognition system:[Link](./docs/en/quick_start/quick_start_recognition_en.md) +
+[figure: PP-ShiTuV2 Android Demo QR code]
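If you prefer to drive the PP-ShiTu quick-start inference from Python rather than from the shell, it can be launched roughly as below. This is only a sketch: it assumes the working directory is `PaddleClas/deploy` and that the inference models and `drink_dataset_v2.0` data have already been downloaded as described in the linked quick-start document.

```python
# Rough sketch: run the PP-ShiTu python inference pipeline from a script.
# Assumes the current directory is PaddleClas/deploy and that the models and
# drink_dataset_v2.0 data from the quick-start document are already in place.
import subprocess

subprocess.run(
    [
        "python3", "python/predict_system.py",
        "-c", "configs/inference_general.yaml",
        "-o", "Global.infer_imgs=./drink_dataset_v2.0/test_images/100.jpeg",
    ],
    check=True,
)
```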
+ Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassification models:[Link](docs/en/PULC/PULC_quickstart_en.md) ## Tutorials - [Install Paddle](./docs/en/installation/install_paddle_en.md) - [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md) +- [PP-ShiTuV2 Image Recognition Systems Introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md) + - [Image Recognition Quick Start](docs/en/quick_start/quick_start_recognition_en.md) + - [20+ application scenarios](docs/zh_CN/introduction/ppshitu_application_scenarios.md) + - Submodule Introduction and Model Training + - [Mainbody Detection](./docs/zh_CN/image_recognition_pipeline/mainbody_detection.md) + - [Feature Extraction](./docs/en/image_recognition_pipeline/feature_extraction_en.md) + - [Vector Search](./docs/en/image_recognition_pipeline/vector_search_en.md) + - [Hash Encoding](./docs/zh_CN/image_recognition_pipeline/deep_hashing.md) + - PipeLine Inference and Deployment + - [Python Inference](docs/en/inference_deployment/python_deploy_en.md) + - [C++ Inference](deploy/cpp_shitu/readme_en.md) + - [Serving Deployment](docs/en/inference_deployment/recognition_serving_deploy_en.md) + - [Lite Deployment](docs/en/inference_deployment/paddle_lite_deploy_en.md) + - [Shitu Gallery Manager Tool](docs/zh_CN/inference_deployment/shitu_gallery_manager.md) - [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md) - [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md) - [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md) @@ -108,41 +130,55 @@ PULC models inference within 3ms on CPU devices, with accuracy comparable with S -Image recognition can be divided into three steps: -- (1)Identify region proposal for target objects through a detection model; -- (2)Extract features for each region proposal; -- (3)Search features in the retrieval database and output results; - +PP-ShiTuV2 is a practical lightweight general image recognition system, which is mainly composed of three modules: mainbody detection model, feature extraction model and vector search tool. The system adopts a variety of strategies including backbone network, loss function, data augmentations, optimal hyperparameters, pre-training model, model pruning and quantization. Compared to V1, PP-ShiTuV2, Recall1 is improved by nearly 8 points. For more details, please refer to [PP-ShiTuV2 introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md). For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised. - -## PULC demo images + +## PP-ShiTuV2 Demo images + +- Drinks recognition +
- +
- -## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images) + - Product recognition +
+ - Cartoon character recognition +
+ - Logo recognition +
+ + - Car recognition +
+ + +## PULC demo images +
+ +
+ + ## License PaddleClas is released under the Apache 2.0 license Apache 2.0 license diff --git a/deploy/configs/PULC/table_attribute/inference_table_attribute.yaml b/deploy/configs/PULC/table_attribute/inference_table_attribute.yaml new file mode 100644 index 0000000000000000000000000000000000000000..580ca18f225aa56dec7ddfae69e2004fb55aefbc --- /dev/null +++ b/deploy/configs/PULC/table_attribute/inference_table_attribute.yaml @@ -0,0 +1,35 @@ +Global: + infer_imgs: "images/PULC/table_attribute/val_3610.jpg" + inference_model_dir: "./models/table_attribute_infer" + batch_size: 1 + use_gpu: True + enable_mkldnn: True + cpu_num_threads: 10 + benchmark: False + use_fp16: False + ir_optim: True + use_tensorrt: False + gpu_mem: 8000 + enable_profile: False + +PreProcess: + transform_ops: + - ResizeImage: + size: [224, 224] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + channel_num: 3 + - ToCHWImage: + +PostProcess: + main_indicator: TableAttribute + TableAttribute: + source_threshold: 0.5 + number_threshold: 0.5 + color_threshold: 0.5 + clarity_threshold : 0.5 + obstruction_threshold: 0.5 + angle_threshold: 0.5 diff --git a/deploy/configs/inference_cls.yaml b/deploy/configs/inference_cls.yaml index d9181278cc617822f98e4966abf0d12ceca498a4..35eab39ba51ababeca778cdc4290a7e7ff46a8a8 100644 --- a/deploy/configs/inference_cls.yaml +++ b/deploy/configs/inference_cls.yaml @@ -22,7 +22,7 @@ PreProcess: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" channel_num: 3 - ToCHWImage: diff --git a/deploy/configs/inference_det.yaml b/deploy/configs/inference_det.yaml index dab7908ef7f59bfed077d9189811aedb650b0e92..08b5302f274cb0ed0beadeacc4fd434ca13ec986 100644 --- a/deploy/configs/inference_det.yaml +++ b/deploy/configs/inference_det.yaml @@ -6,7 +6,7 @@ Global: threshold: 0.2 max_det_results: 1 label_list: - - foreground + - foreground # inference engine config use_gpu: True @@ -30,5 +30,5 @@ DetPreProcess: mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - DetPermute: {} - -DetPostProcess: {} \ No newline at end of file + +DetPostProcess: {} diff --git a/deploy/configs/inference_drink.yaml b/deploy/configs/inference_drink.yaml index 1c3e2c29aa8ddd5db46bbc8660c9f45942696a9c..df5b97ecaa27735b61c0896a1f1e56818c1ece91 100644 --- a/deploy/configs/inference_drink.yaml +++ b/deploy/configs/inference_drink.yaml @@ -1,7 +1,7 @@ Global: - infer_imgs: "./drink_dataset_v1.0/test_images/hongniu_1.jpg" + infer_imgs: "./drink_dataset_v2.0/test_images/100.jpeg" det_inference_model_dir: "./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer" - rec_inference_model_dir: "./models/general_PPLCNet_x2_5_lite_v1.0_infer" + rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer" rec_nms_thresold: 0.05 batch_size: 1 @@ -9,7 +9,7 @@ Global: threshold: 0.2 max_det_results: 5 label_list: - - foreground + - foreground use_gpu: True enable_mkldnn: False @@ -43,7 +43,7 @@ RecPreProcess: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" - ToCHWImage: RecPostProcess: null @@ -51,13 +51,13 @@ RecPostProcess: null # indexing engine config IndexProcess: index_method: "HNSW32" # supported: HNSW32, IVF, Flat - image_root: "./drink_dataset_v1.0/gallery" - index_dir: "./drink_dataset_v1.0/index" - data_file: "./drink_dataset_v1.0/gallery/drink_label.txt" + image_root: "./drink_dataset_v2.0/gallery" + index_dir: "./drink_dataset_v2.0/index" + data_file: 
"./drink_dataset_v2.0/gallery/drink_label.txt" index_operation: "new" # suported: "append", "remove", "new" - delimiter: " " + delimiter: "\t" dist_type: "IP" embedding_size: 512 batch_size: 32 return_k: 5 - score_thres: 0.4 \ No newline at end of file + score_thres: 0.4 diff --git a/deploy/configs/inference_general.yaml b/deploy/configs/inference_general.yaml index 8fb8ae3a56697b882be00da554f33750ead42f70..c325d77575d98fb706252fc3611fe02ac71353a3 100644 --- a/deploy/configs/inference_general.yaml +++ b/deploy/configs/inference_general.yaml @@ -1,7 +1,7 @@ Global: - infer_imgs: "./drink_dataset_v1.0/test_images/nongfu_spring.jpeg" + infer_imgs: "./drink_dataset_v2.0/test_images/100.jpeg" det_inference_model_dir: "./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer" - rec_inference_model_dir: "./models/general_PPLCNet_x2_5_lite_v1.0_infer" + rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer" rec_nms_thresold: 0.05 batch_size: 1 @@ -9,7 +9,7 @@ Global: threshold: 0.2 max_det_results: 5 label_list: - - foreground + - foreground use_gpu: True enable_mkldnn: True @@ -38,12 +38,15 @@ DetPostProcess: {} RecPreProcess: transform_ops: - ResizeImage: - size: 224 + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 - NormalizeImage: - scale: 0.00392157 + scale: 1.0/255.0 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: hwc - ToCHWImage: RecPostProcess: null @@ -51,13 +54,13 @@ RecPostProcess: null # indexing engine config IndexProcess: index_method: "HNSW32" # supported: HNSW32, IVF, Flat - image_root: "./drink_dataset_v1.0/gallery/" - index_dir: "./drink_dataset_v1.0/index" - data_file: "./drink_dataset_v1.0/gallery/drink_label.txt" + image_root: "./drink_dataset_v2.0/gallery/" + index_dir: "./drink_dataset_v2.0/index" + data_file: "./drink_dataset_v2.0/gallery/drink_label.txt" index_operation: "new" # suported: "append", "remove", "new" delimiter: "\t" dist_type: "IP" embedding_size: 512 batch_size: 32 return_k: 5 - score_thres: 0.5 \ No newline at end of file + score_thres: 0.5 diff --git a/deploy/configs/inference_rec.yaml b/deploy/configs/inference_rec.yaml index e183ef07538ba24bc6092895338dfd8fc1551c43..b821ebe54db2f6297fbdb74bb038df41dccd7529 100644 --- a/deploy/configs/inference_rec.yaml +++ b/deploy/configs/inference_rec.yaml @@ -1,6 +1,6 @@ Global: infer_imgs: "./images/wangzai.jpg" - rec_inference_model_dir: "./models/product_ResNet50_vd_aliproduct_v1.0_infer" + rec_inference_model_dir: "./models/general_PPLCNetV2_base_pretrained_v1.0_infer" batch_size: 1 use_gpu: False enable_mkldnn: True @@ -15,14 +15,15 @@ Global: RecPreProcess: transform_ops: - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 - NormalizeImage: - scale: 0.00392157 + scale: 1.0/255.0 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: hwc - ToCHWImage: -RecPostProcess: null \ No newline at end of file +RecPostProcess: null diff --git a/deploy/cpp_shitu/readme.md b/deploy/cpp_shitu/readme.md index 97315ec327b19158dbd24c888ce3e1c874308903..84886a0560c8eec5420037ebef1d374a96916e03 100644 --- a/deploy/cpp_shitu/readme.md +++ b/deploy/cpp_shitu/readme.md @@ -1,6 +1,6 @@ # 服务器端C++预测 -本教程将介绍在服务器端部署PP-ShiTU的详细步骤。 +本教程将介绍在服务器端部署PP-ShiTu的详细步骤。 ## 目录 @@ -30,39 +30,39 @@ - 下载最新版本cmake -```shell -# 当前版本最新为3.22.0,根据实际情况自行下载,建议最新版本 -wget https://github.com/Kitware/CMake/releases/download/v3.22.0/cmake-3.22.0.tar.gz -tar xf 
cmake-3.22.0.tar.gz -``` + ```shell + # 当前版本最新为3.22.0,根据实际情况自行下载,建议最新版本 + wget https://github.com/Kitware/CMake/releases/download/v3.22.0/cmake-3.22.0.tar.gz + tar -xf cmake-3.22.0.tar.gz + ``` -最终可以在当前目录下看到`cmake-3.22.0/`的文件夹。 + 最终可以在当前目录下看到`cmake-3.22.0/`的文件夹。 -- 编译cmake,首先设置came源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的came源码路径,`install_path`为came的安装路径。在本例中,源码路径即为当前目录下的`cmake-3.22.0/`。 +- 编译cmake,首先设置cmake源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的cmake源码路径,`install_path`为cmake的安装路径。在本例中,源码路径即为当前目录下的`cmake-3.22.0/`。 -```shell -cd ./cmake-3.22.0 -export root_path=$PWD -export install_path=${root_path}/cmake -``` + ```shell + cd ./cmake-3.22.0 + export root_path=$PWD + export install_path=${root_path}/cmake + ``` -- 然后在cmake源码路径下,按照下面的方式进行编译 +- 然后在cmake源码路径下,执行以下命令进行编译 -```shell -./bootstrap --prefix=${install_path} -make -j -make install -``` + ```shell + ./bootstrap --prefix=${install_path} + make -j + make install + ``` -- 设置环境变量 +- 编译安装cmake完成后,设置cmake的环境变量供后续程序使用 -```shell -export PATH=${install_path}/bin:$PATH -#检查是否正常使用 -cmake --version -``` + ```shell + export PATH=${install_path}/bin:$PATH + #检查是否正常使用 + cmake --version + ``` -此时,cmake就可以使用了 +此时cmake就可以正常使用了 @@ -70,61 +70,66 @@ cmake --version * 首先需要从opencv官网上下载在Linux环境下源码编译的包,以3.4.7版本为例,下载及解压缩命令如下: -``` -wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz -tar -xvf 3.4.7.tar.gz -``` + ```shell + wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/test/opencv-3.4.7.tar.gz + tar -xvf 3.4.7.tar.gz + ``` -最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。 + 最终可以在当前目录下看到`opencv-3.4.7/`的文件夹。 * 编译opencv,首先设置opencv源码路径(`root_path`)以及安装路径(`install_path`),`root_path`为下载的opencv源码路径,`install_path`为opencv的安装路径。在本例中,源码路径即为当前目录下的`opencv-3.4.7/`。 -```shell -cd ./opencv-3.4.7 -export root_path=$PWD -export install_path=${root_path}/opencv3 -``` - -* 然后在opencv源码路径下,按照下面的方式进行编译。 + ```shell + # 进入deploy/cpp_shitu目录 + cd deploy/cpp_shitu -```shell -rm -rf build -mkdir build -cd build + # 安装opencv + cd ./opencv-3.4.7 + export root_path=$PWD + export install_path=${root_path}/opencv3 + ``` -cmake .. \ - -DCMAKE_INSTALL_PREFIX=${install_path} \ - -DCMAKE_BUILD_TYPE=Release \ - -DBUILD_SHARED_LIBS=OFF \ - -DWITH_IPP=OFF \ - -DBUILD_IPP_IW=OFF \ - -DWITH_LAPACK=OFF \ - -DWITH_EIGEN=OFF \ - -DCMAKE_INSTALL_LIBDIR=lib64 \ - -DWITH_ZLIB=ON \ - -DBUILD_ZLIB=ON \ - -DWITH_JPEG=ON \ - -DBUILD_JPEG=ON \ - -DWITH_PNG=ON \ - -DBUILD_PNG=ON \ - -DWITH_TIFF=ON \ - -DBUILD_TIFF=ON +* 然后在opencv源码路径下,按照下面的方式进行编译。 -make -j -make install -``` + ```shell + rm -rf build + mkdir build + cd build + + cmake .. 
\ + -DCMAKE_INSTALL_PREFIX=${install_path} \ + -DCMAKE_BUILD_TYPE=Release \ + -DBUILD_SHARED_LIBS=OFF \ + -DWITH_IPP=OFF \ + -DBUILD_IPP_IW=OFF \ + -DWITH_LAPACK=OFF \ + -DWITH_EIGEN=OFF \ + -DCMAKE_INSTALL_LIBDIR=lib64 \ + -DWITH_ZLIB=ON \ + -DBUILD_ZLIB=ON \ + -DWITH_JPEG=ON \ + -DBUILD_JPEG=ON \ + -DWITH_PNG=ON \ + -DBUILD_PNG=ON \ + -DWITH_TIFF=ON \ + -DBUILD_TIFF=ON + + make -j + make install + ``` * `make install`完成之后,会在该文件夹下生成opencv头文件和库文件,用于后面的PaddleClas代码编译。 -以opencv3.4.7版本为例,最终在安装路径下的文件结构如下所示。**注意**:不同的opencv版本,下述的文件结构可能不同。 + 以opencv3.4.7版本为例,最终在安装路径下的文件结构如下所示。**注意**:不同的opencv版本,下述的文件结构可能不同。 -``` -opencv3/ -|-- bin -|-- include -|-- lib64 -|-- share -``` + ```log + opencv3/ + ├── bin + ├── include + ├── lib + ├── lib64 + └── share + ``` @@ -139,44 +144,48 @@ opencv3/ * 如果希望获取最新预测库特性,可以从Paddle github上克隆最新代码,源码编译预测库。 * 可以参考[Paddle预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)的说明,从github上获取Paddle代码,然后进行编译,生成最新的预测库。使用git获取代码方法如下。 -```shell -git clone https://github.com/PaddlePaddle/Paddle.git -``` + ```shell + # 进入deploy/cpp_shitu目录 + cd deploy/cpp_shitu -* 进入Paddle目录后,使用如下方法编译。 + git clone https://github.com/PaddlePaddle/Paddle.git + ``` -```shell -rm -rf build -mkdir build -cd build +* 进入Paddle目录后,使用如下方法编译。 -cmake .. \ - -DWITH_CONTRIB=OFF \ - -DWITH_MKL=ON \ - -DWITH_MKLDNN=ON \ - -DWITH_TESTING=OFF \ - -DCMAKE_BUILD_TYPE=Release \ - -DWITH_INFERENCE_API_TEST=OFF \ - -DON_INFER=ON \ - -DWITH_PYTHON=ON -make -j -make inference_lib_dist -``` + ```shell + rm -rf build + mkdir build + cd build + + cmake .. \ + -DWITH_CONTRIB=OFF \ + -DWITH_MKL=ON \ + -DWITH_MKLDNN=ON \ + -DWITH_TESTING=OFF \ + -DCMAKE_BUILD_TYPE=Release \ + -DWITH_INFERENCE_API_TEST=OFF \ + -DON_INFER=ON \ + -DWITH_PYTHON=ON + + make -j + make inference_lib_dist + ``` -更多编译参数选项可以参考[Paddle C++预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)。 + 更多编译参数选项可以参考[Paddle C++预测库官网](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#id16)。 * 编译完成之后,可以在`build/paddle_inference_install_dir/`文件下看到生成了以下文件及文件夹。 -``` -build/paddle_inference_install_dir/ -|-- CMakeCache.txt -|-- paddle -|-- third_party -|-- version.txt -``` + ```log + build/paddle_inference_install_dir/ + ├── CMakeCache.txt + ├── paddle + ├── third_party + └── version.txt + ``` -其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 + 其中`paddle`就是之后进行C++预测时所需的Paddle库,`version.txt`中包含当前预测库的版本信息。 @@ -187,33 +196,41 @@ build/paddle_inference_install_dir/ 以`https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz`的`develop`版本为例,使用下述命令下载并解压: -```shell -wget https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz + ```shell + # 进入deploy/cpp_shitu目录 + cd deploy/cpp_shitu -tar -xvf paddle_inference.tgz -``` + wget https://paddle-inference-lib.bj.bcebos.com/2.1.1-gpu-cuda10.2-cudnn8.1-mkl-gcc8.2/paddle_inference.tgz + + tar -xvf paddle_inference.tgz + ``` -最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。 + 最终会在当前的文件夹中生成`paddle_inference/`的子文件夹。 ### 1.4 安装faiss库 +在安装`faiss`前,请安装`openblas`,`ubuntu`系统中安装命令如下: + ```shell - # 下载 faiss - git clone https://github.com/facebookresearch/faiss.git - cd faiss - export faiss_install_path=$PWD/faiss_install - cmake -B build . 
-DFAISS_ENABLE_PYTHON=OFF -DCMAKE_INSTALL_PREFIX=${faiss_install_path} - make -C build -j faiss - make -C build install +apt-get install libopenblas-dev ``` -在安装`faiss`前,请安装`openblas`,`ubuntu`系统中安装命令如下: +然后按照以下命令编译并安装faiss ```shell -apt-get install libopenblas-dev +# 进入deploy/cpp_shitu目录 +cd deploy/cpp_shitu + +# 下载 faiss +git clone https://github.com/facebookresearch/faiss.git +cd faiss +export faiss_install_path=$PWD/faiss_install +cmake -B build . -DFAISS_ENABLE_PYTHON=OFF -DCMAKE_INSTALL_PREFIX=${faiss_install_path} +make -C build -j faiss +make -C build install ``` 注意本教程以安装faiss cpu版本为例,安装时请参考[faiss](https://github.com/facebookresearch/faiss)官网文档,根据需求自行安装。 @@ -224,12 +241,14 @@ apt-get install libopenblas-dev 编译命令如下,其中Paddle C++预测库、opencv等其他依赖库的地址需要换成自己机器上的实际地址。同时,编译过程中需要下载编译`yaml-cpp`等C++库,请保持联网环境。 - ```shell +# 进入deploy/cpp_shitu目录 +cd deploy/cpp_shitu + sh tools/build.sh ``` -具体地,`tools/build.sh`中内容如下,请根据具体路径修改。 +具体地,`tools/build.sh`中内容如下,请根据具体路径和配置情况进行修改。 ```shell OPENCV_DIR=${opencv_install_dir} @@ -261,14 +280,13 @@ cd .. 上述命令中, -* `OPENCV_DIR`为opencv编译安装的地址(本例中为`opencv-3.4.7/opencv3`文件夹的路径); -* `LIB_DIR`为下载的Paddle预测库(`paddle_inference`文件夹),或编译生成的Paddle预测库(`build/paddle_inference_install_dir`文件夹)的路径; -* `CUDA_LIB_DIR`为cuda库文件地址,在docker中为`/usr/local/cuda/lib64`; -* `CUDNN_LIB_DIR`为cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`。 -* `TENSORRT_DIR`是tensorrt库文件地址,在dokcer中为`/usr/local/TensorRT6-cuda10.0-cudnn7/`,TensorRT需要结合GPU使用。 -* `FAISS_DIR`是faiss的安装地址 -* `FAISS_WITH_MKL`是指在编译faiss的过程中,是否使用了mkldnn,本文档中编译faiss,没有使用,而使用了openblas,故设置为`OFF`,若使用了mkldnn,则为`ON`. - +* `OPENCV_DIR`:opencv编译安装的地址(本例中为`opencv-3.4.7/opencv3`文件夹的路径); +* `LIB_DIR`:下载的Paddle预测库(`paddle_inference`文件夹),或编译生成的Paddle预测库(`build/paddle_inference_install_dir`文件夹)的路径; +* `CUDA_LIB_DIR`:cuda库文件地址,在docker中为`/usr/local/cuda/lib64`; +* `CUDNN_LIB_DIR`:cudnn库文件地址,在docker中为`/usr/lib/x86_64-linux-gnu/`。 +* `TENSORRT_DIR`:tensorrt库文件地址,在dokcer中为`/usr/local/TensorRT6-cuda10.0-cudnn7/`,TensorRT需要结合GPU使用。 +* `FAISS_DIR`:faiss的安装地址 +* `FAISS_WITH_MKL`:指在编译faiss的过程中是否使用mkldnn,本文档中编译faiss没有使用,而使用了openblas,故设置为`OFF`,若使用了mkldnn则为`ON`. 在执行上述命令,编译完成之后,会在当前路径下生成`build`文件夹,其中生成一个名为`pp_shitu`的可执行文件。 @@ -276,60 +294,68 @@ cd .. ## 3. 运行demo -- 请参考[识别快速开始文档](../../docs/zh_CN/quick_start/quick_start_recognition.md),下载好相应的 轻量级通用主体检测模型、轻量级通用识别模型及瓶装饮料测试数据并解压。 +- 按照如下命令下载好相应的轻量级通用主体检测模型、轻量级通用识别模型及瓶装饮料测试数据并解压。 ```shell + # 进入deploy目录 + cd deploy/ + mkdir models cd models + + # 下载并解压主体检测模型 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar - wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar - tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar + + # 下载并解压特征提取模型 + wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar + tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar cd .. mkdir data cd data - wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar - tar -xf drink_dataset_v1.0.tar + wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar + tar -xf drink_dataset_v2.0.tar cd .. ``` - 将相应的yaml文件拷到当前文件夹下 ```shell - cp ../configs/inference_drink.yaml . 
+ cp ../configs/inference_drink.yaml ./ ``` -- 将`inference_drink.yaml`中的相对路径,改成基于本目录的路径或者绝对路径。涉及到的参数有 +- 将`inference_drink.yaml`中的相对路径,改成基于 `deploy/cpp_shitu` 目录的相对路径或者绝对路径。涉及到的参数有 - - Global.infer_imgs :此参数可以是具体的图像地址,也可以是图像集所在的目录 - - Global.det_inference_model_dir : 检测模型存储目录 - - Global.rec_inference_model_dir : 识别模型存储目录 - - IndexProcess.index_dir : 检索库的存储目录,在示例中,检索库在下载的demo数据中。 + - `Global.infer_imgs` :此参数可以是具体的图像地址,也可以是图像集所在的目录 + - `Global.det_inference_model_dir` : 检测模型存储目录 + - `Global.rec_inference_model_dir` : 识别模型存储目录 + - `IndexProcess.index_dir` : 检索库的存储目录,在示例中,检索库在下载的demo数据中。 -- 字典转换 +- 标签文件转换 - 由于python的检索库的字典,使用`pickle`进行的序列化存储,导致C++不方便读取,因此进行转换 + 由于python的检索库的字典是使用`pickle`转换得到的序列化存储结果,导致C++不方便读取,因此需要先转换成普通的文本文件。 ```shell - python tools/transform_id_map.py -c inference_drink.yaml + python3.7 tools/transform_id_map.py -c inference_drink.yaml ``` - 转换成功后,在`IndexProcess.index_dir`目录下生成`id_map.txt`,方便c++ 读取。 + 转换成功后,在`IndexProcess.index_dir`目录下生成`id_map.txt`,以便在C++推理时读取。 - 执行程序 ```shell ./build/pp_shitu -c inference_drink.yaml - # or - ./build/pp_shitu -config inference_drink.yaml ``` - 若对图像集进行检索,则可能得到,如下结果。注意,此结果只做展示,具体以实际运行结果为准。 + 以 `drink_dataset_v2.0/test_images/nongfu_spring.jpeg` 作为输入图像,则执行上述推理命令可以得到如下结果 - 同时,需注意的是,由于opencv 版本问题,会导致图像在预处理的过程中,resize产生细微差别,导致python 和c++结果,轻微不同,如bbox相差几个像素,检索结果小数点后3位diff等。但不会改变最终检索label。 + ```log + ../../deploy/drink_dataset_v2.0/test_images/nongfu_spring.jpeg: + result0: bbox[0, 0, 729, 1094], score: 0.688691, label: 农夫山泉-饮用天然水 + ``` - ![](../../docs/images/quick_start/shitu_c++_result.png) + 由于python和C++的opencv实现存在部分不同,可能导致python推理和C++推理结果有微小差异。但基本不影响最终的检索结果。 diff --git a/deploy/images/PULC/table_attribute/val_3610.jpg b/deploy/images/PULC/table_attribute/val_3610.jpg new file mode 100644 index 0000000000000000000000000000000000000000..bb2772d095c2e151b404bf7e9e76f1de9c665129 Binary files /dev/null and b/deploy/images/PULC/table_attribute/val_3610.jpg differ diff --git a/deploy/images/PULC/table_attribute/val_851.jpg b/deploy/images/PULC/table_attribute/val_851.jpg new file mode 100644 index 0000000000000000000000000000000000000000..7da63f169146861e14c0d74301c01aa192806415 Binary files /dev/null and b/deploy/images/PULC/table_attribute/val_851.jpg differ diff --git a/deploy/lite_shitu/README.md b/deploy/lite_shitu/README.md deleted file mode 100644 index e2a03caedd0d4bf63af96d3541d1a8d021206e52..0000000000000000000000000000000000000000 --- a/deploy/lite_shitu/README.md +++ /dev/null @@ -1,353 +0,0 @@ -# PP-ShiTu在Paddle-Lite端侧部署 - -本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleClas PP-ShiTu模型的详细步骤。 - -Paddle Lite是飞桨轻量化推理引擎,为手机、IoT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 - -## 1. 准备环境 - -### 运行准备 -- 电脑(编译Paddle Lite) -- 安卓手机(armv7或armv8) - -### 1.1 准备交叉编译环境 -交叉编译环境用于编译 Paddle Lite 和 PaddleClas 的PP-ShiTu Lite demo。 -支持多种开发环境,不同开发环境的编译流程请参考对应文档,请确保安装完成Java jdk、Android NDK(R17以上)。 - -1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) -2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) -3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) - -```shell -# 配置完成交叉编译环境后,更新环境变量 -# for docker、Linux -source ~/.bashrc -# for Mac OS -source ~/.bash_profile -``` - -### 1.2 准备预测库 - -预测库有两种获取方式: -1. 
[**建议**]直接下载,预测库下载链接如下: - |平台| 架构 | 预测库下载链接| - |-|-|-| - |Android| arm7 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv7.clang.c++_static.with_extra.with_cv.tar.gz) | - | Android | arm8 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv.tar.gz) | - | Android | arm8(FP16) | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip) | - -**注意**:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo) - - -2. 编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: -```shell -git clone https://github.com/PaddlePaddle/Paddle-Lite.git -cd Paddle-Lite -# 如果使用编译方式,建议使用develop分支编译预测库 -git checkout develop -# FP32 -./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON -# FP16 -./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON -``` - -**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_andriod.html#id2)。 - -直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。 -预测库的文件目录如下: - -``` -inference_lite_lib.android.armv8/ -|-- cxx C++ 预测库和头文件 -| |-- include C++ 头文件 -| | |-- paddle_api.h -| | |-- paddle_image_preprocess.h -| | |-- paddle_lite_factory_helper.h -| | |-- paddle_place.h -| | |-- paddle_use_kernels.h -| | |-- paddle_use_ops.h -| | `-- paddle_use_passes.h -| `-- lib C++预测库 -| |-- libpaddle_api_light_bundled.a C++静态库 -| `-- libpaddle_light_api_shared.so C++动态库 -|-- java Java预测库 -| |-- jar -| | `-- PaddlePredictor.jar -| |-- so -| | `-- libpaddle_lite_jni.so -| `-- src -|-- demo C++和Java示例代码 -| |-- cxx C++ 预测库demo -| `-- java Java 预测库demo -``` - -## 2 模型准备 - -### 2.1 模型准备 - -PaddleClas 提供了转换并优化后的推理模型,可以直接参考下方 2.1.1 小节进行下载。如果需要使用其他模型,请参考后续 2.1.2 小节自行转换并优化模型。 - -#### 2.1.1 使用PaddleClas提供的推理模型 - -```shell -# 进入lite_ppshitu目录 -cd $PaddleClas/deploy/lite_shitu -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/lite/ppshitu_lite_models_v1.2.tar -tar -xf ppshitu_lite_models_v1.2.tar -rm -f ppshitu_lite_models_v1.2.tar -``` - -#### 2.1.2 使用其他模型 - -Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。 - -**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。 - -##### 2.1.2.1 安装paddle_lite_opt工具 - -安装`paddle_lite_opt`工具有如下两种方法: - -1. [**建议**]pip安装paddlelite并进行转换 - ```shell - pip install paddlelite==2.10rc - ``` - -2. 
源码编译Paddle-Lite生成`paddle_lite_opt`工具 - - 模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下: - ```shell - # 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite - git clone https://github.com/PaddlePaddle/Paddle-Lite.git - cd Paddle-Lite - git checkout develop - # 启动编译 - ./lite/tools/build.sh build_optimize_tool - ``` - - 编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式; - ```shell - cd build.opt/lite/api/ - ./opt - ``` - - `opt`的使用方式与参数与上面的`paddle_lite_opt`完全一致。 - -之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下: - -|选项|说明| -|-|-| -|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径| -|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径| -|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer| -|--optimize_out|优化模型的输出路径| -|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm| - -更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html) - -`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。 - - -##### 2.1.2.2 转换示例 - -下面介绍使用`paddle_lite_opt`完成主体检测模型和识别模型的预训练模型,转成inference模型,最终转换成Paddle-Lite的优化模型的过程。 - -1. 转换主体检测模型 - -```shell -# 当前目录为 $PaddleClas/deploy/lite_shitu -# $code_path需替换成相应的运行目录,可以根据需要,将$code_path设置成需要的目录 -export $code_path=~ -cd $code_path -git clone https://github.com/PaddlePaddle/PaddleDetection.git -# 进入PaddleDetection根目录 -cd PaddleDetection -# 将预训练模型导出为inference模型 -python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams export_post_process=False --output_dir=inference -# 将inference模型转化为Paddle-Lite优化模型 -paddle_lite_opt --model_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdmodel --param_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdiparams --optimize_out=inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det -# 将转好的模型复制到lite_shitu目录下 -cd $PaddleClas/deploy/lite_shitu -mkdir models -cp $code_path/PaddleDetection/inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det.nb $PaddleClas/deploy/lite_shitu/models -``` - -2. 转换识别模型 - -```shell -# 转换为Paddle-Lite模型 -paddle_lite_opt --model_file=inference/inference.pdmodel --param_file=inference/inference.pdiparams --optimize_out=inference/rec -# 将模型文件拷贝到lite_shitu下 -cp inference/rec.nb deploy/lite_shitu/models/ -cd deploy/lite_shitu -``` - -**注意**:`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb`;`--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。 - -### 2.2 生成新的检索库 - -由于lite 版本的检索库用的是`faiss1.5.3`版本,与新版本不兼容,因此需要重新生成index库 - -#### 2.2.1 数据及环境配置 - -```shell -# 进入上级目录 -cd .. 
-# 下载瓶装饮料数据集 -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar -rm -rf drink_dataset_v1.0.tar -rm -rf drink_dataset_v1.0/index - -# 安装1.5.3版本的faiss -pip install faiss-cpu==1.5.3 - -# 下载通用识别模型,可替换成自己的inference model -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar -tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar -rm -rf general_PPLCNet_x2_5_lite_v1.0_infer.tar -``` - -#### 2.2.2 生成新的index文件 - -```shell -# 生成新的index库,注意指定好识别模型的路径,同时将index_mothod修改成Flat,HNSW32和IVF在此版本中可能存在bug,请慎重使用。 -# 如果使用自己的识别模型,对应的修改inference model的目录 -python python/build_gallery.py -c configs/inference_drink.yaml -o Global.rec_inference_model_dir=general_PPLCNet_x2_5_lite_v1.0_infer -o IndexProcess.index_method=Flat - -# 进入到lite_shitu目录 -cd lite_shitu -mv ../drink_dataset_v1.0 . -``` - -### 2.3 将yaml文件转换成json文件 - -```shell -# 如果测试单张图像 -python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_path images/demo.jpeg -# or -# 如果测试多张图像 -python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_dir images -# 执行完成后,会在lit_shitu下生成shitu_config.json配置文件 -``` - -### 2.4 index字典转换 -由于python的检索库字典,使用`pickle`进行的序列化存储,导致C++不方便读取,因此需要进行转换 - -```shell - -# 转化id_map.pkl为id_map.txt -python transform_id_map.py -c ../configs/inference_drink.yaml -``` -转换成功后,会在`IndexProcess.index_dir`目录下生成`id_map.txt`。 - - -### 2.5 与手机联调 - -首先需要进行一些准备工作。 -1. 准备一台arm8的安卓手机,如果编译的预测库是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI=arm7`。 -2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下: - - 2.1. MAC电脑安装ADB: - - ```shell - brew cask install android-platform-tools - ``` - 2.2. Linux安装ADB - ```shell - sudo apt update - sudo apt install -y wget adb - ``` - 2.3. Window安装ADB - - win上安装需要去谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio) - -3. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入: - -```shell -adb devices -``` -如果有device输出,则表示安装成功,如下所示: -``` -List of devices attached -744be294 device -``` - -4. 编译lite部署代码生成移动端可执行文件 - -```shell -cd $PaddleClas/deploy/lite_shitu -# ${lite prediction library path}下载的Paddle-Lite库路径 -inference_lite_path=${lite prediction library path}/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv/ -mkdir $inference_lite_path/demo/cxx/ppshitu_lite - -cp -r * $inference_lite_path/demo/cxx/ppshitu_lite -cd $inference_lite_path/demo/cxx/ppshitu_lite - -# 执行编译,等待完成后得到可执行文件main -make ARM_ABI=arm8 -#如果是arm7,则执行 make ARM_ABI = arm7 (或者在Makefile中修改该项) -``` - -5. 准备优化后的模型、预测库文件、测试图像。 - -```shell -mkdir deploy -mv ppshitu_lite_models_v1.1 deploy/ -mv drink_dataset_v1.0 deploy/ -mv images deploy/ -mv shitu_config.json deploy/ -cp pp_shitu deploy/ - -# 将C++预测动态库so文件复制到deploy文件夹中 -cp ../../../cxx/lib/libpaddle_light_api_shared.so deploy/ -``` - -执行完成后,deploy文件夹下将有如下文件格式: - -```shell -deploy/ -|-- ppshitu_lite_models_v1.1/ -| |--mainbody_PPLCNet_x2_5_640_quant_v1.1_lite.nb 优化后的主体检测模型文件 -| |--general_PPLCNet_x2_5_lite_v1.1_infer.nb 优化后的识别模型文件 -|-- images/ -| |--demo.jpg 图片文件 -|-- drink_dataset_v1.0/ 瓶装饮料demo数据 -| |--index 检索index目录 -|-- pp_shitu 生成的移动端执行文件 -|-- shitu_config.json 执行时参数配置文件 -|-- libpaddle_light_api_shared.so Paddle-Lite库文件 -``` - -**注意:** -* `shitu_config.json` 包含了目标检测的超参数,请按需进行修改 - -6. 
启动调试,上述步骤完成后就可以使用ADB将文件夹 `deploy/` push到手机上运行,步骤如下: - -```shell -# 将上述deploy文件夹push到手机上 -adb push deploy /data/local/tmp/ - -adb shell -cd /data/local/tmp/deploy -export LD_LIBRARY_PATH=/data/local/tmp/deploy:$LD_LIBRARY_PATH - -# 修改权限为可执行 -chmod 777 pp_shitu -# 执行程序 -./pp_shitu shitu_config.json -``` - -如果对代码做了修改,则需要重新编译并push到手机上。 - -运行效果如下: -``` -images/demo.jpeg: - result0: bbox[344, 98, 527, 593], score: 0.811656, label: 红牛-强化型 - result1: bbox[0, 0, 600, 600], score: 0.729664, label: 红牛-强化型 -``` - -## FAQ -Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗? -A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。 - -Q2:换一个图测试怎么做? -A2:替换 deploy 下的测试图像为你想要测试的图像,并重新生成json配置文件(或者直接修改图像路径),使用 ADB 再次 push 到手机上即可。 diff --git a/deploy/lite_shitu/README.md b/deploy/lite_shitu/README.md new file mode 120000 index 0000000000000000000000000000000000000000..862cca14887f2a4c49f653f468081f8da500891e --- /dev/null +++ b/deploy/lite_shitu/README.md @@ -0,0 +1 @@ +../../docs/zh_CN/inference_deployment/lite_shitu.md \ No newline at end of file diff --git a/deploy/paddleserving/build_server.sh b/deploy/paddleserving/build_server.sh index 1329a3684ff72862858ee25c0a938bd61ff654ae..8d513b6d39b411b6a624d403535348e55e069c34 100644 --- a/deploy/paddleserving/build_server.sh +++ b/deploy/paddleserving/build_server.sh @@ -9,15 +9,15 @@ # 默认编译时的${PWD}=PaddleClas/deploy/paddleserving/ -python_name=${1:-'python'} +export python_name=${1:-'python'} apt-get update apt install -y libcurl4-openssl-dev libbz2-dev wget -nc https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar tar xf centos_ssl.tar rm -rf centos_ssl.tar -mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k -mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k +\mv libcrypto.so.1.0.2k /usr/lib/libcrypto.so.1.0.2k +\mv libssl.so.1.0.2k /usr/lib/libssl.so.1.0.2k ln -sf /usr/lib/libcrypto.so.1.0.2k /usr/lib/libcrypto.so.10 ln -sf /usr/lib/libssl.so.1.0.2k /usr/lib/libssl.so.10 ln -sf /usr/lib/libcrypto.so.10 /usr/lib/libcrypto.so diff --git a/deploy/paddleserving/recognition/config.yml b/deploy/paddleserving/recognition/config.yml index e4108006e6f2ea1a3698e4fdf9c32f25dcbfbeb0..b099fe549e7c957d5dbf458899665440cd83c049 100644 --- a/deploy/paddleserving/recognition/config.yml +++ b/deploy/paddleserving/recognition/config.yml @@ -16,9 +16,8 @@ op: #当op配置没有server_endpoints时,从local_service_conf读取本地服务配置 local_service_conf: - #uci模型路径 - model_config: ../../models/general_PPLCNet_x2_5_lite_v1.0_serving + model_config: ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving #计算硬件类型: 空缺时由devices决定(CPU/GPU),0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu device_type: 1 @@ -37,7 +36,7 @@ op: local_service_conf: client_type: local_predictor device_type: 1 - devices: '0' + devices: "0" fetch_list: - - save_infer_model/scale_0.tmp_1 + - save_infer_model/scale_0.tmp_1 model_config: ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ diff --git a/deploy/paddleserving/recognition/pipeline_http_client.py b/deploy/paddleserving/recognition/pipeline_http_client.py index efc0f3afeeb18b73c9bfc1f0378a548ed2a12d6a..3e6282d29fef4546f0a8450b36e7f16ec34f854c 100644 --- a/deploy/paddleserving/recognition/pipeline_http_client.py +++ b/deploy/paddleserving/recognition/pipeline_http_client.py @@ -1,20 +1,45 @@ -import requests -import json +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. import base64 +import json import os -imgpath = "../../drink_dataset_v1.0/test_images/001.jpeg" +import requests + +image_path = "../../drink_dataset_v2.0/test_images/100.jpeg" + + +def bytes_to_base64(image_bytes: bytes) -> bytes: + """encode bytes using base64 algorithm + + Args: + image_bytes (bytes): bytes object to be encoded + + Returns: + bytes: base64 bytes + """ + return base64.b64encode(image_bytes).decode('utf8') -def cv2_to_base64(image): - return base64.b64encode(image).decode('utf8') if __name__ == "__main__": url = "http://127.0.0.1:18081/recognition/prediction" - with open(os.path.join(".", imgpath), 'rb') as file: - image_data1 = file.read() - image = cv2_to_base64(image_data1) - data = {"key": ["image"], "value": [image]} + with open(os.path.join(".", image_path), 'rb') as file: + image_bytes = file.read() + + image_base64 = bytes_to_base64(image_bytes) + data = {"key": ["image"], "value": [image_base64]} for i in range(1): r = requests.post(url=url, data=json.dumps(data)) diff --git a/deploy/paddleserving/recognition/pipeline_rpc_client.py b/deploy/paddleserving/recognition/pipeline_rpc_client.py index 50a1e42c45502cb0b336d63bd50a37d176a8d7a4..c4beb35b3e827b618fba6d800975818825dec7dc 100644 --- a/deploy/paddleserving/recognition/pipeline_rpc_client.py +++ b/deploy/paddleserving/recognition/pipeline_rpc_client.py @@ -15,20 +15,33 @@ try: from paddle_serving_server_gpu.pipeline import PipelineClient except ImportError: from paddle_serving_server.pipeline import PipelineClient + import base64 +import os client = PipelineClient() client.connect(['127.0.0.1:9994']) -imgpath = "../../drink_dataset_v1.0/test_images/001.jpeg" +image_path = "../../drink_dataset_v2.0/test_images/100.jpeg" + + +def bytes_to_base64(image_bytes: bytes) -> bytes: + """encode bytes using base64 algorithm + + Args: + image_bytes (bytes): bytes to be encoded + + Returns: + bytes: base64 bytes + """ + return base64.b64encode(image_bytes).decode('utf8') -def cv2_to_base64(image): - return base64.b64encode(image).decode('utf8') if __name__ == "__main__": - with open(imgpath, 'rb') as file: - image_data = file.read() - image = cv2_to_base64(image_data) + with open(os.path.join(".", image_path), 'rb') as file: + image_bytes = file.read() + image_base64 = bytes_to_base64(image_bytes) for i in range(1): - ret = client.predict(feed_dict={"image": image}, fetch=["result"]) + ret = client.predict( + feed_dict={"image": image_base64}, fetch=["result"]) print(ret) diff --git a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt b/deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_client/serving_client_conf.prototxt similarity index 90% rename from deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt rename to deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_client/serving_client_conf.prototxt index c781eb6f449fe06afbba7f96e01798c974bccf54..af2f77d42bb1a000cfaf22e02760cd80951de74e 100644 --- 
a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/serving_client_conf.prototxt +++ b/deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_client/serving_client_conf.prototxt @@ -15,7 +15,7 @@ feed_var { shape: 6 } fetch_var { - name: "save_infer_model/scale_0.tmp_1" + name: "batch_norm_25.tmp_2" alias_name: "features" is_lod_tensor: false fetch_type: 1 diff --git a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt b/deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_serving/serving_server_conf.prototxt similarity index 90% rename from deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt rename to deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_serving/serving_server_conf.prototxt index 04812f42ed90fbbd47c73b9ec706d57c04b4c571..6e93f4657c893ff27be7f4418100f4aeb0479334 100644 --- a/deploy/paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt +++ b/deploy/paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_serving/serving_server_conf.prototxt @@ -15,7 +15,7 @@ feed_var { shape: 6 } fetch_var { - name: "save_infer_model/scale_0.tmp_1" + name: "batch_norm_25.tmp_2" alias_name: "features" is_lod_tensor: false fetch_type: 1 diff --git a/deploy/paddleserving/recognition/recognition_web_service.py b/deploy/paddleserving/recognition/recognition_web_service.py index 4a3478b65fa43a45d050e1b3341066acbe199138..5105382e23cebdeede6d992c0eed926ae3c5ad3e 100644 --- a/deploy/paddleserving/recognition/recognition_web_service.py +++ b/deploy/paddleserving/recognition/recognition_web_service.py @@ -11,17 +11,24 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -from paddle_serving_server.web_service import WebService, Op +import base64 +import json import logging -import numpy as np +import os +import pickle import sys + import cv2 -from paddle_serving_app.reader import * -import base64 -import os import faiss -import pickle -import json +import numpy as np +from paddle_serving_app.reader import BGR2RGB +from paddle_serving_app.reader import Div +from paddle_serving_app.reader import Normalize +from paddle_serving_app.reader import RCNNPostprocess +from paddle_serving_app.reader import Resize +from paddle_serving_app.reader import Sequential +from paddle_serving_app.reader import Transpose +from paddle_serving_server.web_service import Op, WebService class DetOp(Op): @@ -101,11 +108,11 @@ class RecOp(Op): def init_op(self): self.seq = Sequential([ BGR2RGB(), Resize((224, 224)), Div(255), - Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], - False), Transpose((2, 0, 1)) + Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False), + Transpose((2, 0, 1)) ]) - index_dir = "../../drink_dataset_v1.0/index" + index_dir = "../../drink_dataset_v2.0/index" assert os.path.exists(os.path.join( index_dir, "vector.index")), "vector.index not found ..." 
assert os.path.exists(os.path.join( @@ -136,7 +143,7 @@ class RecOp(Op): }) self.det_boxes = boxes - #construct batch images for rec + # construct batch images for rec imgs = [] for box in boxes: box = [int(x) for x in box["bbox"]] @@ -192,7 +199,7 @@ class RecOp(Op): pred["rec_scores"] = scores[i][0] results.append(pred) - #do nms + # do NMS results = self.nms_to_rec_results(results, self.rec_nms_thresold) return {"result": str(results)}, None, "" diff --git a/deploy/paddleserving/recognition/run_cpp_serving.sh b/deploy/paddleserving/recognition/run_cpp_serving.sh index e1deb1148b1705031c0e92522e7eaf7cf4679a45..93730b5a9962e1cd7f37d4f04161d6e313620203 100644 --- a/deploy/paddleserving/recognition/run_cpp_serving.sh +++ b/deploy/paddleserving/recognition/run_cpp_serving.sh @@ -3,12 +3,12 @@ gpu_id=$1 # PP-ShiTu CPP serving script if [[ -n "${gpu_id}" ]]; then nohup python3.7 -m paddle_serving_server.serve \ - --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNet_x2_5_lite_v1.0_serving \ + --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving \ --op GeneralPicodetOp GeneralFeatureExtractOp \ --port 9400 --gpu_id="${gpu_id}" > log_PPShiTu.txt 2>&1 & else nohup python3.7 -m paddle_serving_server.serve \ - --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNet_x2_5_lite_v1.0_serving \ + --model ../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving ../../models/general_PPLCNetV2_base_pretrained_v1.0_serving \ --op GeneralPicodetOp GeneralFeatureExtractOp \ --port 9400 > log_PPShiTu.txt 2>&1 & fi diff --git a/deploy/paddleserving/recognition/test_cpp_serving_client.py b/deploy/paddleserving/recognition/test_cpp_serving_client.py index e2cd17e855ebfe8fb286ebaeff8ab63874e2e972..0a51f1e02df021066a1ba474b076bcd82baa9851 100644 --- a/deploy/paddleserving/recognition/test_cpp_serving_client.py +++ b/deploy/paddleserving/recognition/test_cpp_serving_client.py @@ -12,20 +12,19 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import numpy as np +import os +import pickle -from paddle_serving_client import Client -from paddle_serving_app.reader import * import cv2 import faiss -import os -import pickle +import numpy as np +from paddle_serving_client import Client rec_nms_thresold = 0.05 rec_score_thres = 0.5 feature_normalize = True return_k = 1 -index_dir = "../../drink_dataset_v1.0/index" +index_dir = "../../drink_dataset_v2.0/index" def init_index(index_dir): @@ -41,7 +40,7 @@ def init_index(index_dir): return searcher, id_map -#get box +# get box def nms_to_rec_results(results, thresh=0.1): filtered_results = [] @@ -91,21 +90,21 @@ def postprocess(fetch_dict, feature_normalize, det_boxes, searcher, id_map, pred["rec_scores"] = scores[i][0] results.append(pred) - #do nms + # do NMS results = nms_to_rec_results(results, rec_nms_thresold) return results -#do client +# do client if __name__ == "__main__": client = Client() client.load_client_config([ "../../models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client", - "../../models/general_PPLCNet_x2_5_lite_v1.0_client" + "../../models/general_PPLCNetV2_base_pretrained_v1.0_client" ]) client.connect(['127.0.0.1:9400']) - im = cv2.imread("../../drink_dataset_v1.0/test_images/001.jpeg") + im = cv2.imread("../../drink_dataset_v2.0/test_images/100.jpeg") im_shape = np.array(im.shape[:2]).reshape(-1) fetch_map = client.predict( feed={"image": im, @@ -113,7 +112,7 @@ if __name__ == "__main__": fetch=["features", "boxes"], batch=False) - #add retrieval procedure + # add retrieval procedure det_boxes = fetch_map["boxes"] searcher, id_map = init_index(index_dir) results = postprocess(fetch_map, feature_normalize, det_boxes, searcher, diff --git a/deploy/python/build_gallery.py b/deploy/python/build_gallery.py index 8184d59608d4f6593a7170f9f933794d85ef675e..63c411c64ead923cf77d3ed1b870642c58be9d92 100644 --- a/deploy/python/build_gallery.py +++ b/deploy/python/build_gallery.py @@ -12,16 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. import os +import pickle import cv2 import faiss import numpy as np -from tqdm import tqdm -import pickle - -from paddleclas.deploy.utils import logger, config -from paddleclas.deploy.python.predict_rec import RecPredictor from paddleclas.deploy.python.predict_rec import RecPredictor +from paddleclas.deploy.utils import config, logger +from tqdm import tqdm def split_datafile(data_file, image_root, delimiter="\t"): @@ -52,6 +50,7 @@ class GalleryBuilder(object): self.config = config self.rec_predictor = RecPredictor(config) assert 'IndexProcess' in config.keys(), "Index config not found ... 
" + self.android_demo = config["Global"].get("android_demo", False) self.build(config['IndexProcess']) def build(self, config): @@ -70,98 +69,50 @@ class GalleryBuilder(object): "new", "remove", "append" ], "Only append, remove and new operation are supported" + if self.android_demo: + self._create_index_for_android_demo(config, gallery_features, gallery_docs) + return + # vector.index: faiss index file # id_map.pkl: use this file to map id to image_doc + index, ids = None, None if operation_method in ["remove", "append"]: - # if remove or append, vector.index and id_map.pkl must exist - assert os.path.join( - config["index_dir"], "vector.index" - ), "The vector.index dose not exist in {} when 'index_operation' is not None".format( - config["index_dir"]) - assert os.path.join( - config["index_dir"], "id_map.pkl" - ), "The id_map.pkl dose not exist in {} when 'index_operation' is not None".format( - config["index_dir"]) - index = faiss.read_index( - os.path.join(config["index_dir"], "vector.index")) - with open(os.path.join(config["index_dir"], "id_map.pkl"), - 'rb') as fd: - ids = pickle.load(fd) - assert index.ntotal == len(ids.keys( - )), "data number in index is not equal in in id_map" - else: - if not os.path.exists(config["index_dir"]): - os.makedirs(config["index_dir"], exist_ok=True) + # if remove or append, load vector.index and id_map.pkl + index, ids = self._load_index(config) index_method = config.get("index_method", "HNSW32") - - # if IVF method, cal ivf number automaticlly - if index_method == "IVF": - index_method = index_method + str( - min(int(len(gallery_images) // 8), 65536)) + ",Flat" - - # for binary index, add B at head of index_method - if config["dist_type"] == "hamming": - index_method = "B" + index_method - - #dist_type - dist_type = faiss.METRIC_INNER_PRODUCT if config[ - "dist_type"] == "IP" else faiss.METRIC_L2 - - #build index - if config["dist_type"] == "hamming": - index = faiss.index_binary_factory(config["embedding_size"], - index_method) - else: - index = faiss.index_factory(config["embedding_size"], - index_method, dist_type) - index = faiss.IndexIDMap2(index) - ids = {} - - if config["index_method"] == "HNSW32": + else: + index_method, index, ids = self._create_index(config) + if index_method == "HNSW32": logger.warning( "The HNSW32 method dose not support 'remove' operation") if operation_method != "remove": # calculate id for new data - start_id = max(ids.keys()) + 1 if ids else 0 - ids_now = ( - np.arange(0, len(gallery_images)) + start_id).astype(np.int64) - - # only train when new index file - if operation_method == "new": - if config["dist_type"] == "hamming": - index.add(gallery_features) - else: - index.train(gallery_features) - - if not config["dist_type"] == "hamming": - index.add_with_ids(gallery_features, ids_now) - - for i, d in zip(list(ids_now), gallery_docs): - ids[i] = d + index, ids = self._add_gallery(index, ids, gallery_features, gallery_docs, config, operation_method) else: - if config["index_method"] == "HNSW32": + if index_method == "HNSW32": raise RuntimeError( "The index_method: HNSW32 dose not support 'remove' operation" ) # remove ids in id_map, remove index data in faiss index - remove_ids = list( - filter(lambda k: ids.get(k) in gallery_docs, ids.keys())) - remove_ids = np.asarray(remove_ids) - index.remove_ids(remove_ids) - for k in remove_ids: - del ids[k] + index, ids = self._rm_id_in_galllery(index, ids, gallery_docs) # store faiss index file and id_map file - if config["dist_type"] == "hamming": - 
faiss.write_index_binary( - index, os.path.join(config["index_dir"], "vector.index")) - else: - faiss.write_index( - index, os.path.join(config["index_dir"], "vector.index")) - - with open(os.path.join(config["index_dir"], "id_map.pkl"), 'wb') as fd: - pickle.dump(ids, fd) + self._save_gallery(config, index, ids) + + def _create_index_for_android_demo(self, config, gallery_features, gallery_docs): + if not os.path.exists(config["index_dir"]): + os.makedirs(config["index_dir"], exist_ok=True) + #build index + index = faiss.IndexFlatIP(config["embedding_size"]) + index.add(gallery_features) + + # calculate id for data + ids_now = (np.arange(0, len(gallery_docs))).astype(np.int64) + ids = {} + for i, d in zip(list(ids_now), gallery_docs): + ids[i] = d + self._save_gallery(config, index, ids) def _extract_features(self, gallery_images, config): # extract gallery features @@ -197,6 +148,93 @@ class GalleryBuilder(object): return gallery_features + def _load_index(self, config): + assert os.path.join( + config["index_dir"], "vector.index" + ), "The vector.index dose not exist in {} when 'index_operation' is not None".format( + config["index_dir"]) + assert os.path.join( + config["index_dir"], "id_map.pkl" + ), "The id_map.pkl dose not exist in {} when 'index_operation' is not None".format( + config["index_dir"]) + index = faiss.read_index( + os.path.join(config["index_dir"], "vector.index")) + with open(os.path.join(config["index_dir"], "id_map.pkl"), + 'rb') as fd: + ids = pickle.load(fd) + assert index.ntotal == len(ids.keys( + )), "data number in index is not equal in in id_map" + return index, ids + + def _create_index(self, config): + if not os.path.exists(config["index_dir"]): + os.makedirs(config["index_dir"], exist_ok=True) + index_method = config.get("index_method", "HNSW32") + + # if IVF method, cal ivf number automaticlly + if index_method == "IVF": + index_method = index_method + str( + min(int(len(gallery_images) // 8), 65536)) + ",Flat" + + # for binary index, add B at head of index_method + if config["dist_type"] == "hamming": + index_method = "B" + index_method + + #dist_type + dist_type = faiss.METRIC_INNER_PRODUCT if config[ + "dist_type"] == "IP" else faiss.METRIC_L2 + + #build index + if config["dist_type"] == "hamming": + index = faiss.index_binary_factory(config["embedding_size"], + index_method) + else: + index = faiss.index_factory(config["embedding_size"], + index_method, dist_type) + index = faiss.IndexIDMap2(index) + ids = {} + return index_method, index, ids + + def _add_gallery(self, index, ids, gallery_features, gallery_docs, config, operation_method): + start_id = max(ids.keys()) + 1 if ids else 0 + ids_now = ( + np.arange(0, len(gallery_docs)) + start_id).astype(np.int64) + + # only train when new index file + if operation_method == "new": + if config["dist_type"] == "hamming": + index.add(gallery_features) + else: + index.train(gallery_features) + + if not config["dist_type"] == "hamming": + index.add_with_ids(gallery_features, ids_now) + + for i, d in zip(list(ids_now), gallery_docs): + ids[i] = d + return index, ids + + def _rm_id_in_galllery(self, index, ids, gallery_docs): + remove_ids = list( + filter(lambda k: ids.get(k) in gallery_docs, ids.keys())) + remove_ids = np.asarray(remove_ids) + index.remove_ids(remove_ids) + for k in remove_ids: + del ids[k] + + return index, ids + + def _save_gallery(self, config, index, ids): + if config["dist_type"] == "hamming": + faiss.write_index_binary( + index, os.path.join(config["index_dir"], "vector.index")) + else: + 
faiss.write_index( + index, os.path.join(config["index_dir"], "vector.index")) + + with open(os.path.join(config["index_dir"], "id_map.pkl"), 'wb') as fd: + pickle.dump(ids, fd) + def main(config): GalleryBuilder(config) diff --git a/deploy/python/postprocess.py b/deploy/python/postprocess.py index ce373b9483ba865870b366ca63bbf106e6883c0c..bb1b86f910569ba5bf343a0860c1ec75c22ac9c9 100644 --- a/deploy/python/postprocess.py +++ b/deploy/python/postprocess.py @@ -364,3 +364,49 @@ class VehicleAttribute(object): ).astype(np.int8).tolist() batch_res.append({"attributes": label_res, "output": pred_res}) return batch_res + + +class TableAttribute(object): + def __init__( + self, + source_threshold=0.5, + number_threshold=0.5, + color_threshold=0.5, + clarity_threshold=0.5, + obstruction_threshold=0.5, + angle_threshold=0.5, ): + self.source_threshold = source_threshold + self.number_threshold = number_threshold + self.color_threshold = color_threshold + self.clarity_threshold = clarity_threshold + self.obstruction_threshold = obstruction_threshold + self.angle_threshold = angle_threshold + + def __call__(self, batch_preds, file_names=None): + # postprocess output of predictor + batch_res = [] + + for res in batch_preds: + res = res.tolist() + label_res = [] + source = 'Scanned' if res[0] > self.source_threshold else 'Photo' + number = 'Little' if res[1] > self.number_threshold else 'Numerous' + color = 'Black-and-White' if res[ + 2] > self.color_threshold else 'Multicolor' + clarity = 'Clear' if res[3] > self.clarity_threshold else 'Blurry' + obstruction = 'Without-Obstacles' if res[ + 4] > self.number_threshold else 'With-Obstacles' + angle = 'Horizontal' if res[ + 5] > self.number_threshold else 'Tilted' + + label_res = [source, number, color, clarity, obstruction, angle] + + threshold_list = [ + self.source_threshold, self.number_threshold, + self.color_threshold, self.clarity_threshold, + self.obstruction_threshold, self.angle_threshold + ] + pred_res = (np.array(res) > np.array(threshold_list) + ).astype(np.int8).tolist() + batch_res.append({"attributes": label_res, "output": pred_res}) + return batch_res diff --git a/deploy/python/predict_cls.py b/deploy/python/predict_cls.py index 6c312b423345967a1f163f09d79ab82f4ea83c72..bffef381abfe6d55597b4f334a4a879aefa854cd 100644 --- a/deploy/python/predict_cls.py +++ b/deploy/python/predict_cls.py @@ -11,7 +11,6 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
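For reference, the new `TableAttribute` post-processor can be exercised on its own. A small sketch (the import path is an assumption; run it from the `deploy` directory or put `deploy/python` on `PYTHONPATH`), feeding one fake six-way sigmoid output through the default 0.5 thresholds:

```python
import numpy as np

from python.postprocess import TableAttribute  # assumed layout: run from deploy/

# fake prediction in the order [source, number, color, clarity, obstruction, angle]
raw_pred = np.array([0.91, 0.12, 0.88, 0.65, 0.40, 0.77])

post = TableAttribute()          # every threshold defaults to 0.5
res = post([raw_pred])           # __call__ expects a batch of predictions
print(res[0]["attributes"])
# ['Scanned', 'Numerous', 'Black-and-White', 'Clear', 'With-Obstacles', 'Horizontal']
print(res[0]["output"])          # the same decisions as 0/1 flags: [1, 0, 1, 1, 0, 1]
```

In the deployed pipeline the class is selected through the `PostProcess` section of the config, which is why `predict_cls.py` below also routes `TableAttribute` results through the attribute-style printout.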
- import os import cv2 @@ -136,7 +135,8 @@ def main(config): for number, result_dict in enumerate(batch_results): if "PersonAttribute" in config[ "PostProcess"] or "VehicleAttribute" in config[ - "PostProcess"]: + "PostProcess"] or "TableAttribute" in config[ + "PostProcess"]: filename = batch_names[number] print("{}:\t {}".format(filename, result_dict)) else: diff --git a/deploy/shitu_index_manager/README.md b/deploy/shitu_index_manager/README.md new file mode 120000 index 0000000000000000000000000000000000000000..2e801b61cd70669dac3795e4e8ecb16ca2238b2a --- /dev/null +++ b/deploy/shitu_index_manager/README.md @@ -0,0 +1 @@ +../../docs/zh_CN/inference_deployment/shitu_gallery_manager.md \ No newline at end of file diff --git a/deploy/shitu_index_manager/index_manager.py b/deploy/shitu_index_manager/index_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..97e3eec561cf7a45476bd750624d721dfd85fdb9 --- /dev/null +++ b/deploy/shitu_index_manager/index_manager.py @@ -0,0 +1,349 @@ +# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import sys +from PyQt5 import QtCore, QtGui, QtWidgets +import mod.mainwindow + +from paddleclas.deploy.utils import config, logger +from paddleclas.deploy.python.predict_rec import RecPredictor +from fastapi import FastAPI +import uvicorn +import numpy as np +import faiss +from typing import List +import pickle +import cv2 +import socket +import json +import operator +from multiprocessing import Process +""" +完整的index库如下: +root_path/ # 库存储目录 +|-- image_list.txt # 图像列表,每行:image_path label。由前端生成及修改。后端只读 +|-- features.pkl # 建库之后,保存的embedding向量,后端生成,前端无需操作 +|-- images # 图像存储目录,由前端生成及增删查等操作。后端只读 +| |-- md5.jpg +| |-- md5.jpg +| |-- …… +|-- index # 真正的生成的index库存储目录,后端生成及操作,前端无需操作。 +| |-- vector.index # faiss生成的索引库 +| |-- id_map.pkl # 索引文件 +""" + + +class ShiTuIndexManager(object): + + def __init__(self, config): + self.root_path = None + self.image_list_path = "image_list.txt" + self.image_dir = "images" + self.index_path = "index/vector.index" + self.id_map_path = "index/id_map.pkl" + self.features_path = "features.pkl" + self.index = None + self.id_map = None + self.features = None + self.config = config + self.predictor = RecPredictor(config) + + def _load_pickle(self, path): + if os.path.exists(path): + return pickle.load(open(path, 'rb')) + else: + return None + + def _save_pickle(self, path, data): + if not os.path.exists(os.path.dirname(path)): + os.makedirs(os.path.dirname(path), exist_ok=True) + with open(path, 'wb') as fd: + pickle.dump(data, fd) + + def _load_index(self): + self.index = faiss.read_index( + os.path.join(self.root_path, self.index_path)) + self.id_map = self._load_pickle( + os.path.join(self.root_path, self.id_map_path)) + self.features = self._load_pickle( + os.path.join(self.root_path, self.features_path)) + + def _save_index(self, index, id_map, features): + faiss.write_index(index, os.path.join(self.root_path, self.index_path)) + 
self._save_pickle(os.path.join(self.root_path, self.id_map_path), + id_map) + self._save_pickle(os.path.join(self.root_path, self.features_path), + features) + + def _update_path(self, root_path, image_list_path=None): + if root_path == self.root_path: + pass + else: + self.root_path = root_path + if not os.path.exists(os.path.join(root_path, "index")): + os.mkdir(os.path.join(root_path, "index")) + if image_list_path is not None: + self.image_list_path = image_list_path + + def _cal_featrue(self, image_list): + batch_images = [] + featrures = None + cnt = 0 + for idx, image_path in enumerate(image_list): + image = cv2.imread(image_path) + if image is None: + return "{} is broken or not exist. Stop" + else: + image = image[:, :, ::-1] + batch_images.append(image) + cnt += 1 + if cnt % self.config["Global"]["batch_size"] == 0 or ( + idx + 1) == len(image_list): + if len(batch_images) == 0: + continue + batch_results = self.predictor.predict(batch_images) + featrures = batch_results if featrures is None else np.concatenate( + (featrures, batch_results), axis=0) + batch_images = [] + return featrures + + def _split_datafile(self, data_file, image_root): + ''' + data_file: image path and info, which can be splitted by spacer + image_root: image path root + delimiter: delimiter + ''' + gallery_images = [] + gallery_docs = [] + gallery_ids = [] + with open(data_file, 'r', encoding='utf-8') as f: + lines = f.readlines() + for _, ori_line in enumerate(lines): + line = ori_line.strip().split() + text_num = len(line) + assert text_num >= 2, f"line({ori_line}) must be splitted into at least 2 parts, but got {text_num}" + image_file = os.path.join(image_root, line[0]) + + gallery_images.append(image_file) + gallery_docs.append(ori_line.strip()) + gallery_ids.append(os.path.basename(line[0]).split(".")[0]) + + return gallery_images, gallery_docs, gallery_ids + + def create_index(self, + image_list: str, + index_method: str = "HNSW32", + image_root: str = None): + if not os.path.exists(image_list): + return "{} is not exist".format(image_list) + if index_method.lower() not in ['hnsw32', 'ivf', 'flat']: + return "The index method Only support: HNSW32, IVF, Flat" + self._update_path(os.path.dirname(image_list), image_list) + + # get image_paths + image_root = image_root if image_root is not None else self.root_path + gallery_images, gallery_docs, image_ids = self._split_datafile( + image_list, image_root) + + # gernerate index + if index_method == "IVF": + index_method = index_method + str( + min(max(int(len(gallery_images) // 32), 2), 65536)) + ",Flat" + index = faiss.index_factory( + self.config["IndexProcess"]["embedding_size"], index_method, + faiss.METRIC_INNER_PRODUCT) + self.index = faiss.IndexIDMap2(index) + features = self._cal_featrue(gallery_images) + self.index.train(features) + index_ids = np.arange(0, len(gallery_images)).astype(np.int64) + self.index.add_with_ids(features, index_ids) + + self.id_map = dict() + for i, d in zip(list(index_ids), gallery_docs): + self.id_map[i] = d + + self.features = { + "features": features, + "index_method": index_method, + "image_ids": image_ids, + "index_ids": index_ids.tolist() + } + self._save_index(self.index, self.id_map, self.features) + + def open_index(self, root_path: str, image_list_path: str) -> str: + self._update_path(root_path) + _, _, image_ids = self._split_datafile(image_list_path, root_path) + if os.path.exists(os.path.join(self.root_path, self.index_path)) and \ + os.path.exists(os.path.join(self.root_path, self.id_map_path)) and \ + 
os.path.exists(os.path.join(self.root_path, self.features_path)): + self._update_path(root_path) + self._load_index() + if operator.eq(set(image_ids), set(self.features['image_ids'])): + return "" + else: + return "The image list is different from index, Please update index" + else: + return "File not exist: features.pkl, vector.index, id_map.pkl" + + def update_index(self, image_list: str, image_root: str = None) -> str: + if self.index and self.id_map and self.features: + image_paths, image_docs, image_ids = self._split_datafile( + image_list, + image_root if image_root is not None else self.root_path) + + # for add image + add_ids = list( + set(image_ids).difference(set(self.features["image_ids"]))) + add_indexes = [i for i, x in enumerate(image_ids) if x in add_ids] + add_image_paths = [image_paths[i] for i in add_indexes] + add_image_docs = [image_docs[i] for i in add_indexes] + add_image_ids = [image_ids[i] for i in add_indexes] + self._add_index(add_image_paths, add_image_docs, add_image_ids) + + # delete images + delete_ids = list( + set(self.features["image_ids"]).difference(set(image_ids))) + self._delete_index(delete_ids) + self._save_index(self.index, self.id_map, self.features) + return "" + else: + return "Failed. Please create or open index first" + + def _add_index(self, image_list: List, image_docs: List, image_ids: List): + if len(image_ids) == 0: + return + featrures = self._cal_featrue(image_list) + index_ids = (np.arange(0, len(image_list)) + max(self.id_map.keys()) + + 1).astype(np.int64) + self.index.add_with_ids(featrures, index_ids) + + for i, d in zip(index_ids, image_docs): + self.id_map[i] = d + + self.features['features'] = np.concatenate( + [self.features['features'], featrures], axis=0) + self.features['image_ids'].extend(image_ids) + self.features['index_ids'].extend(index_ids.tolist()) + + def _delete_index(self, image_ids: List): + if len(image_ids) == 0: + return + indexes = [ + i for i, x in enumerate(self.features['image_ids']) + if x in image_ids + ] + self.features["features"] = np.delete(self.features["features"], + indexes, + axis=0) + self.features["image_ids"] = np.delete(np.asarray( + self.features["image_ids"]), + indexes, + axis=0).tolist() + index_ids = np.delete(np.asarray(self.features["index_ids"]), + indexes, + axis=0).tolist() + id_map_values = [self.id_map[i] for i in index_ids] + self.index.reset() + ids = np.arange(0, len(id_map_values)).astype(np.int64) + self.index.add_with_ids(self.features['features'], ids) + self.id_map.clear() + for i, d in zip(ids, id_map_values): + self.id_map[i] = d + self.features["index_ids"] = ids + + +app = FastAPI() + + +@app.get("/new_index") +def new_index(image_list_path: str, + index_method: str = "HNSW32", + index_root_path: str = None, + force: bool = False): + result = "" + try: + if index_root_path is not None: + image_list_path = os.path.join(index_root_path, image_list_path) + index_path = os.path.join(index_root_path, "index", "vector.index") + id_map_path = os.path.join(index_root_path, "index", "id_map.pkl") + + if not (os.path.exists(index_path) + and os.path.exists(id_map_path)) or force: + manager.create_index(image_list_path, index_method, index_root_path) + else: + result = "There alrealy has index in {}".format(index_root_path) + except Exception as e: + result = e.__str__() + data = {"error_message": result} + return json.dumps(data).encode() + + +@app.get("/open_index") +def open_index(index_root_path: str, image_list_path: str): + result = "" + try: + image_list_path = 
os.path.join(index_root_path, image_list_path) + result = manager.open_index(index_root_path, image_list_path) + except Exception as e: + result = e.__str__() + + data = {"error_message": result} + return json.dumps(data).encode() + + +@app.get("/update_index") +def update_index(image_list_path: str, index_root_path: str = None): + result = "" + try: + if index_root_path is not None: + image_list_path = os.path.join(index_root_path, image_list_path) + result = manager.update_index(image_list=image_list_path, + image_root=index_root_path) + except Exception as e: + result = e.__str__() + data = {"error_message": result} + return json.dumps(data).encode() + + +def FrontInterface(server_process=None): + front = QtWidgets.QApplication([]) + main_window = mod.mainwindow.MainWindow(process=server_process) + main_window.showMaximized() + sys.exit(front.exec_()) + + +def Server(args): + [app, host, port] = args + uvicorn.run(app, host=host, port=port) + + +if __name__ == '__main__': + args = config.parse_args() + model_config = config.get_config(args.config, + overrides=args.override, + show=True) + manager = ShiTuIndexManager(model_config) + try: + ip = socket.gethostbyname(socket.gethostname()) + except: + ip = '127.0.0.1' + port = 8000 + p_server = Process(target=Server, args=([app, ip, port],)) + p_server.start() + # p_client = Process(target=FrontInterface, args=()) + # p_client.start() + # p_client.join() + FrontInterface(p_server) + p_server.terminate() + sys.exit(0) diff --git a/deploy/shitu_index_manager/mod/__init__.py b/deploy/shitu_index_manager/mod/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/deploy/shitu_index_manager/mod/classify_ui_context.py b/deploy/shitu_index_manager/mod/classify_ui_context.py new file mode 100644 index 0000000000000000000000000000000000000000..4141878856f6cacdf7261bfb4017f2e87e614c19 --- /dev/null +++ b/deploy/shitu_index_manager/mod/classify_ui_context.py @@ -0,0 +1,144 @@ +import os + +from PyQt5 import QtCore, QtWidgets +from mod import image_list_manager as imglistmgr +from mod import utils +from mod import ui_addclassifydialog +from mod import ui_renameclassifydialog + + +class ClassifyUiContext(QtCore.QObject): + # 分类界面相关业务 + selected = QtCore.pyqtSignal(str) # 选择分类信号 + + def __init__(self, ui: QtWidgets.QListView, parent: QtWidgets.QMainWindow, + image_list_mgr: imglistmgr.ImageListManager): + super(ClassifyUiContext, self).__init__() + self.__ui = ui + self.__parent = parent + self.__imageListMgr = image_list_mgr + self.__menu = QtWidgets.QMenu() + self.__initMenu() + self.__initUi() + self.__connectSignal() + + @property + def ui(self): + return self.__ui + + @property + def parent(self): + return self.__parent + + @property + def imageListManager(self): + return self.__imageListMgr + + @property + def menu(self): + return self.__menu + + def __initUi(self): + """初始化分类界面""" + self.__ui.setEditTriggers(QtWidgets.QAbstractItemView.NoEditTriggers) + + def __connectSignal(self): + """连接信号""" + self.__ui.clicked.connect(self.uiClicked) + self.__ui.doubleClicked.connect(self.uiDoubleClicked) + + def __initMenu(self): + """初始化分类界面菜单""" + utils.setMenu(self.__menu, "添加分类", self.addClassify) + utils.setMenu(self.__menu, "移除分类", self.removeClassify) + utils.setMenu(self.__menu, "重命名分类", self.renemeClassify) + + self.__ui.setContextMenuPolicy(QtCore.Qt.CustomContextMenu) + self.__ui.customContextMenuRequested.connect(self.__showMenu) + + def __showMenu(self, pos): + 
"""显示分类界面菜单""" + if len(self.__imageListMgr.filePath) > 0: + self.__menu.exec_(self.__ui.mapToGlobal(pos)) + + def setClassifyList(self, classify_list): + """设置分类列表""" + list_model = QtCore.QStringListModel(classify_list) + self.__ui.setModel(list_model) + + def uiClicked(self, index): + """分类列表点击""" + if not self.__ui.currentIndex().isValid(): + return + txt = index.data() + self.selected.emit(txt) + + def uiDoubleClicked(self, index): + """分类列表双击""" + if not self.__ui.currentIndex().isValid(): + return + ole_name = index.data() + dlg = QtWidgets.QDialog(parent=self.parent) + ui = ui_renameclassifydialog.Ui_RenameClassifyDialog() + ui.setupUi(dlg) + ui.oldNameLineEdit.setText(ole_name) + result = dlg.exec_() + new_name = ui.newNameLineEdit.text() + if result == QtWidgets.QDialog.Accepted: + mgr_result = self.__imageListMgr.renameClassify(ole_name, new_name) + if not mgr_result: + QtWidgets.QMessageBox.warning(self.parent, "重命名分类", "重命名分类错误") + else: + self.setClassifyList(self.__imageListMgr.classifyList) + self.__imageListMgr.writeFile() + + def addClassify(self): + """添加分类""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self.__parent, "提示", + "请先打开正确的图像库") + return + dlg = QtWidgets.QDialog(parent=self.parent) + ui = ui_addclassifydialog.Ui_AddClassifyDialog() + ui.setupUi(dlg) + result = dlg.exec_() + txt = ui.lineEdit.text() + if result == QtWidgets.QDialog.Accepted: + mgr_result = self.__imageListMgr.addClassify(txt) + if not mgr_result: + QtWidgets.QMessageBox.warning(self.parent, "添加分类", "添加分类错误") + else: + self.setClassifyList(self.__imageListMgr.classifyList) + + def removeClassify(self): + """移除分类""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self.__parent, "提示", + "请先打开正确的图像库") + return + if not self.__ui.currentIndex().isValid(): + return + classify = self.__ui.currentIndex().data() + result = QtWidgets.QMessageBox.information( + self.parent, + "移除分类", + "确定移除分类: {}".format(classify), + buttons=QtWidgets.QMessageBox.Ok | QtWidgets.QMessageBox.Cancel, + defaultButton=QtWidgets.QMessageBox.Cancel) + if result == QtWidgets.QMessageBox.Ok: + if len(self.__imageListMgr.imageList(classify)) > 0: + QtWidgets.QMessageBox.warning(self.parent, "移除分类", + "分类下存在图片,请先移除图片") + else: + self.__imageListMgr.removeClassify(classify) + self.setClassifyList(self.__imageListMgr.classifyList) + + def renemeClassify(self): + """重命名分类""" + idx = self.__ui.currentIndex() + if idx.isValid(): + self.uiDoubleClicked(idx) + + def searchClassify(self, classify): + """查找分类""" + self.setClassifyList(self.__imageListMgr.findLikeClassify(classify)) diff --git a/deploy/shitu_index_manager/mod/image_list_manager.py b/deploy/shitu_index_manager/mod/image_list_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..6a662114691115612b92cf2f0a4cd391f13bc181 --- /dev/null +++ b/deploy/shitu_index_manager/mod/image_list_manager.py @@ -0,0 +1,236 @@ +import os + + +class ImageListManager: + """ + 图像列表文件管理器 + """ + def __init__(self, file_path="", encoding="utf-8"): + self.__filePath = "" + self.__dirName = "" + self.__dataList = {} + self.__findLikeClassifyResult = [] + if file_path != "": + self.readFile(file_path, encoding) + + @property + def filePath(self): + return self.__filePath + + @property + def dirName(self): + return self.__dirName + + @dirName.setter + def dirName(self, value): + self.__dirName = value + + @property + def dataList(self): + return self.__dataList + + @property + def 
classifyList(self): + return self.__dataList.keys() + + @property + def findLikeClassifyResult(self): + return self.__findLikeClassifyResult + + def imageList(self, classify: str): + """ + 获取分类下的图片列表 + + Args: + classify (str): 分类名称 + + Returns: + list: 图片列表 + """ + return self.__dataList[classify] + + def readFile(self, file_path: str, encoding="utf-8"): + """ + 读取文件内容 + + Args: + file_path (str): 文件路径 + encoding (str, optional): 文件编码. 默认 "utf-8". + + Raises: + Exception: 文件不存在 + """ + if not os.path.exists(file_path): + raise Exception("文件不存在:{}".format(file_path)) + self.__filePath = file_path + self.__dirName = os.path.dirname(self.__filePath) + self.__readData(file_path, encoding) + + def __readData(self, file_path: str, encoding="utf-8"): + """ + 读取文件内容 + + Args: + file_path (str): 文件路径 + encoding (str, optional): 文件编码. 默认 "utf-8". + """ + with open(file_path, "r", encoding=encoding) as f: + self.__dataList.clear() + for line in f: + line = line.rstrip("\n") + data = line.split("\t") + self.__appendData(data) + + def __appendData(self, data: list): + """ + 添加数据 + + Args: + data (list): 数据 + """ + if data[1] not in self.__dataList: + self.__dataList[data[1]] = [] + self.__dataList[data[1]].append(data[0]) + + def writeFile(self, file_path="", encoding="utf-8"): + """ + 写入文件 + + Args: + file_path (str, optional): 文件路径. 默认 "". + encoding (str, optional): 文件编码. 默认 "utf-8". + """ + if file_path == "": + file_path = self.__filePath + if not os.path.exists(file_path): + return False + self.__dirName = os.path.dirname(self.__filePath) + lines = [] + for classify in self.__dataList.keys(): + for path in self.__dataList[classify]: + lines.append("{}\t{}\n".format(path, classify)) + with open(file_path, "w", encoding=encoding) as f: + f.writelines(lines) + return True + + def realPath(self, image_path: str): + """ + 获取真实路径 + + Args: + image_path (str): 图片路径 + """ + return os.path.join(self.__dirName, image_path) + + def realPathList(self, classify: str): + """ + 获取分类下的真实路径列表 + + Args: + classify (str): 分类名称 + + Returns: + list: 真实路径列表 + """ + if classify not in self.classifyList: + return [] + paths = self.__dataList[classify] + if len(paths) == 0: + return [] + for i in range(len(paths)): + paths[i] = os.path.join(self.__dirName, paths[i]) + return paths + + def findLikeClassify(self, name: str): + """ + 查找类似的分类名称 + + Args: + name (str): 分类名称 + + Returns: + list: 类似的分类名称列表 + """ + self.__findLikeClassifyResult.clear() + for classify in self.__dataList.keys(): + word = str(name) + if (word in classify): + self.__findLikeClassifyResult.append(classify) + return self.__findLikeClassifyResult + + def addClassify(self, classify: str): + """ + 添加分类 + + Args: + classify (str): 分类名称 + + Returns: + bool: 如果分类名称已经存在,返回False,否则添加分类并返回True + """ + if classify in self.__dataList: + return False + self.__dataList[classify] = [] + return True + + def removeClassify(self, classify: str): + """ + 移除分类 + + Args: + classify (str): 分类名称 + + Returns: + bool: 如果分类名称不存在,返回False,否则移除分类并返回True + """ + if classify not in self.__dataList: + return False + self.__dataList.pop(classify) + return True + + def renameClassify(self, old_classify: str, new_classify: str): + """ + 重命名分类名称 + + Args: + old_classify (str): 原分类名称 + new_classify (str): 新分类名称 + + Returns: + bool: 如果原分类名称不存在,或者新分类名称已经存在,返回False,否则重命名分类名称并返回True + """ + if old_classify not in self.__dataList: + return False + if new_classify in self.__dataList: + return False + self.__dataList[new_classify] = self.__dataList[old_classify] + 
self.__dataList.pop(old_classify) + return True + + def allClassfiyNotEmpty(self): + """ + 检查所有分类是否都有图片 + + Returns: + bool: 如果有一个分类没有图片,返回False,否则返回True + """ + for classify in self.__dataList.keys(): + if len(self.__dataList[classify]) == 0: + return False + return True + + def resetImageList(self, classify: str, image_list: list): + """ + 重置图片列表 + + Args: + classify (str): 分类名称 + image_list (list): 图片相对路径列表 + + Returns: + bool: 如果分类名称不存在,返回False,否则重置图片列表并返回True + """ + if classify not in self.__dataList: + return False + self.__dataList[classify] = image_list + return True diff --git a/deploy/shitu_index_manager/mod/image_list_ui_context.py b/deploy/shitu_index_manager/mod/image_list_ui_context.py new file mode 100644 index 0000000000000000000000000000000000000000..6d5206194a79137ebff7d7d0f5d61c28ba300bb5 --- /dev/null +++ b/deploy/shitu_index_manager/mod/image_list_ui_context.py @@ -0,0 +1,231 @@ +import os +from stat import filemode + +from PyQt5 import QtCore, QtGui, QtWidgets +from mod import image_list_manager as imglistmgr +from mod import utils +from mod import ui_renameclassifydialog +from mod import imageeditclassifydialog + +# 图像缩放基数 +BASE_IMAGE_SIZE = 64 + + +class ImageListUiContext(QtCore.QObject): + # 图片列表界面相关业务,style sheet 在 MainWindow.ui 相应的 ImageListWidget 中设置 + listCount = QtCore.pyqtSignal(int) # 图像列表图像的数量 + selectedCount = QtCore.pyqtSignal(int) # 图像列表选择图像的数量 + + def __init__(self, ui: QtWidgets.QListWidget, + parent: QtWidgets.QMainWindow, + image_list_mgr: imglistmgr.ImageListManager): + super(ImageListUiContext, self).__init__() + self.__ui = ui + self.__parent = parent + self.__imageListMgr = image_list_mgr + self.__initUi() + self.__menu = QtWidgets.QMenu() + self.__initMenu() + self.__connectSignal() + self.__selectedClassify = "" + self.__imageScale = 1 + + @property + def ui(self): + return self.__ui + + @property + def parent(self): + return self.__parent + + @property + def imageListManager(self): + return self.__imageListMgr + + @property + def menu(self): + return self.__menu + + def __initUi(self): + """初始化图片列表样式""" + self.__ui.setViewMode(QtWidgets.QListView.IconMode) + self.__ui.setSpacing(15) + self.__ui.setMovement(QtWidgets.QListView.Static) + self.__ui.setSelectionMode( + QtWidgets.QAbstractItemView.ExtendedSelection) + + def __initMenu(self): + """初始化图片列表界面菜单""" + utils.setMenu(self.__menu, "添加图片", self.addImage) + utils.setMenu(self.__menu, "移除图片", self.removeImage) + utils.setMenu(self.__menu, "编辑图片分类", self.editImageClassify) + self.__menu.addSeparator() + utils.setMenu(self.__menu, "选择全部图片", self.selectAllImage) + utils.setMenu(self.__menu, "反向选择图片", self.reverseSelectImage) + utils.setMenu(self.__menu, "取消选择图片", self.cancelSelectImage) + + self.__ui.setContextMenuPolicy(QtCore.Qt.CustomContextMenu) + self.__ui.customContextMenuRequested.connect(self.__showMenu) + + def __showMenu(self, pos): + """显示图片列表界面菜单""" + if len(self.__imageListMgr.filePath) > 0: + self.__menu.exec_(self.__ui.mapToGlobal(pos)) + + def __connectSignal(self): + """连接信号与槽""" + self.__ui.itemSelectionChanged.connect(self.onSelectionChanged) + + def setImageScale(self, scale: int): + """设置图片大小""" + self.__imageScale = scale + size = QtCore.QSize(scale * BASE_IMAGE_SIZE, scale * BASE_IMAGE_SIZE) + self.__ui.setIconSize(size) + for i in range(self.__ui.count()): + item = self.__ui.item(i) + item.setSizeHint(size) + + def setImageList(self, classify: str): + """设置图片列表""" + size = QtCore.QSize(self.__imageScale * BASE_IMAGE_SIZE, + self.__imageScale * BASE_IMAGE_SIZE) + 
self.__selectedClassify = classify + image_list = self.__imageListMgr.imageList(classify) + self.__ui.clear() + count = 0 + for i in image_list: + item = QtWidgets.QListWidgetItem(self.__ui) + item.setIcon(QtGui.QIcon(self.__imageListMgr.realPath(i))) + item.setData(QtCore.Qt.UserRole, i) + item.setSizeHint(size) + self.__ui.addItem(item) + count += 1 + self.listCount.emit(count) + + def clear(self): + """清除图片列表""" + self.__ui.clear() + + def addImage(self): + """添加图片""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self.__parent, "提示", + "请先打开正确的图像库") + return + filter = "图片 (*.png *.jpg *.jpeg *.PNG *.JPG *.JPEG);;所有文件(*.*)" + dlg = QtWidgets.QFileDialog(self.__parent) + dlg.setFileMode(QtWidgets.QFileDialog.ExistingFiles) # 多选文件 + dlg.setViewMode(QtWidgets.QFileDialog.Detail) # 详细模式 + file_paths = dlg.getOpenFileNames(filter=filter)[0] + if len(file_paths) == 0: + return + image_list_dir = self.__imageListMgr.dirName + file_list = [] + for path in file_paths: + if not os.path.exists(path): + continue + new_file = self.__copyToImagesDir(path) + if new_file != "" and image_list_dir in new_file: + # 去掉 image_list_dir 的路径和斜杠 + begin = len(image_list_dir) + 1 + file_list.append(new_file[begin:]) + if len(file_list) > 0: + if self.__selectedClassify == "": + QtWidgets.QMessageBox.warning(self.__parent, "提示", "请先选择分类") + return + new_list = self.__imageListMgr.imageList( + self.__selectedClassify) + file_list + self.__imageListMgr.resetImageList(self.__selectedClassify, + new_list) + self.setImageList(self.__selectedClassify) + self.__imageListMgr.writeFile() + + def __copyToImagesDir(self, image_path: str): + md5 = utils.fileMD5(image_path) + file_ext = utils.fileExtension(image_path) + to_dir = os.path.join(self.__imageListMgr.dirName, "images") + new_path = os.path.join(to_dir, md5 + file_ext) + if os.path.exists(to_dir): + utils.copyFile(image_path, new_path) + return new_path + else: + return "" + + def removeImage(self): + """移除图片""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self.__parent, "提示", + "请先打开正确的图像库") + return + path_list = [] + image_list = self.__ui.selectedItems() + if len(image_list) == 0: + return + question = QtWidgets.QMessageBox.question(self.__parent, "移除图片", + "确定移除所选图片吗?") + if question == QtWidgets.QMessageBox.No: + return + for i in range(self.__ui.count()): + item = self.__ui.item(i) + img_path = item.data(QtCore.Qt.UserRole) + if not item.isSelected(): + path_list.append(img_path) + else: + # 从磁盘上删除图片 + utils.removeFile( + os.path.join(self.__imageListMgr.dirName, img_path)) + self.__imageListMgr.resetImageList(self.__selectedClassify, path_list) + self.setImageList(self.__selectedClassify) + self.__imageListMgr.writeFile() + + def editImageClassify(self): + """编辑图片分类""" + old_classify = self.__selectedClassify + dlg = imageeditclassifydialog.ImageEditClassifyDialog( + parent=self.__parent, + old_classify=old_classify, + classify_list=self.__imageListMgr.classifyList) + result = dlg.exec_() + new_classify = dlg.newClassify + if result == QtWidgets.QDialog.Accepted \ + and new_classify != old_classify \ + and new_classify != "": + self.__moveImage(old_classify, new_classify) + self.__imageListMgr.writeFile() + + def __moveImage(self, old_classify, new_classify): + """移动图片""" + keep_list = [] + is_selected = False + move_list = self.__imageListMgr.imageList(new_classify) + for i in range(self.__ui.count()): + item = self.__ui.item(i) + txt = item.data(QtCore.Qt.UserRole) + 
if item.isSelected(): + move_list.append(txt) + is_selected = True + else: + keep_list.append(txt) + if is_selected: + self.__imageListMgr.resetImageList(new_classify, move_list) + self.__imageListMgr.resetImageList(old_classify, keep_list) + self.setImageList(old_classify) + + def selectAllImage(self): + """选择所有图片""" + self.__ui.selectAll() + + def reverseSelectImage(self): + """反向选择图片""" + for i in range(self.__ui.count()): + item = self.__ui.item(i) + item.setSelected(not item.isSelected()) + + def cancelSelectImage(self): + """取消选择图片""" + self.__ui.clearSelection() + + def onSelectionChanged(self): + """选择图像该变,发送选择的数量信号""" + count = len(self.__ui.selectedItems()) + self.selectedCount.emit(count) diff --git a/deploy/shitu_index_manager/mod/imageeditclassifydialog.py b/deploy/shitu_index_manager/mod/imageeditclassifydialog.py new file mode 100644 index 0000000000000000000000000000000000000000..250fe217a9473c1ceb5b62a0023160660c7bdeb2 --- /dev/null +++ b/deploy/shitu_index_manager/mod/imageeditclassifydialog.py @@ -0,0 +1,52 @@ +import os +from PyQt5 import QtCore, QtGui, QtWidgets +from mod import image_list_manager +from mod import ui_imageeditclassifydialog +from mod import utils + + +class ImageEditClassifyDialog(QtWidgets.QDialog): + """图像编辑分类对话框""" + def __init__(self, parent, old_classify, classify_list): + super(ImageEditClassifyDialog, self).__init__(parent) + self.ui = ui_imageeditclassifydialog.Ui_Dialog() + self.ui.setupUi(self) # 初始化主窗口界面 + self.__oldClassify = old_classify + self.__classifyList = classify_list + self.__newClassify = "" + self.__searchResult = [] + self.__initUi() + self.__connectSignal() + + @property + def newClassify(self): + return self.__newClassify + + def __initUi(self): + self.ui.oldLineEdit.setText(self.__oldClassify) + self.__setClassifyList(self.__classifyList) + self.ui.classifyListView.setEditTriggers( + QtWidgets.QAbstractItemView.NoEditTriggers) + + def __connectSignal(self): + self.ui.classifyListView.clicked.connect(self.selectedListView) + self.ui.searchButton.clicked.connect(self.searchClassify) + + def __setClassifyList(self, classify_list): + list_model = QtCore.QStringListModel(classify_list) + self.ui.classifyListView.setModel(list_model) + + def selectedListView(self, index): + if not self.ui.classifyListView.currentIndex().isValid(): + return + txt = index.data() + self.ui.newLineEdit.setText(txt) + self.__newClassify = txt + + def searchClassify(self): + txt = self.ui.searchWordLineEdit.text() + self.__searchResult.clear() + for classify in self.__classifyList: + if txt in classify: + self.__searchResult.append(classify) + self.__setClassifyList(self.__searchResult) diff --git a/deploy/shitu_index_manager/mod/index_http_client.py b/deploy/shitu_index_manager/mod/index_http_client.py new file mode 100644 index 0000000000000000000000000000000000000000..6b9353e22150b062c105eb2ae0ea4a322657d001 --- /dev/null +++ b/deploy/shitu_index_manager/mod/index_http_client.py @@ -0,0 +1,60 @@ +import json +import os +import urllib3 +import urllib.parse + + +class IndexHttpClient(): + """索引库客户端,使用 urllib3 连接,使用 urllib.parse 进行 url 编码""" + def __init__(self, host: str, port: int): + self.__host = host + self.__port = port + self.__http = urllib3.PoolManager() + self.__headers = {"Content-type": "application/json"} + + def url(self): + return "http://{}:{}".format(self.__host, self.__port) + + def new_index(self, + image_list_path: str, + index_root_path: str, + index_method="HNSW32", + force=False): + """新建 重建 库""" + if index_method not in ["HNSW32", 
"FLAT", "IVF"]: + raise Exception( + "index_method 必须是 HNSW32, FLAT, IVF,实际值为:{}".format( + index_method)) + params = {"image_list_path":image_list_path, \ + "index_root_path":index_root_path, \ + "index_method":index_method, \ + "force":force} + return self.__post(self.url() + "/new_index?", params) + + def open_index(self, index_root_path: str, image_list_path: str): + """打开库""" + params = { + "index_root_path": index_root_path, + "image_list_path": image_list_path + } + return self.__post(self.url() + "/open_index?", params) + + def update_index(self, image_list_path: str, index_root_path: str): + """更新索引库""" + params = {"image_list_path":image_list_path, \ + "index_root_path":index_root_path} + return self.__post(self.url() + "/update_index?", params) + + def __post(self, url: str, params: dict): + """发送 url 并接收数据""" + http = self.__http + encode_params = urllib.parse.urlencode(params) + get_url = url + encode_params + req = http.request("GET", get_url, headers=self.__headers) + result = json.loads(req.data) + if isinstance(result, str): + result = eval(result) + msg = result["error_message"] + if msg != None and len(msg) == 0: + msg = None + return msg diff --git a/deploy/shitu_index_manager/mod/mainwindow.py b/deploy/shitu_index_manager/mod/mainwindow.py new file mode 100644 index 0000000000000000000000000000000000000000..40d11f6c480619b537cb0c738e99ede89a8fe50c --- /dev/null +++ b/deploy/shitu_index_manager/mod/mainwindow.py @@ -0,0 +1,492 @@ +from multiprocessing.dummy import active_children +from multiprocessing import Process +import os +import sys +import socket + +from PyQt5 import QtCore, QtGui, QtWidgets +from mod import ui_mainwindow +from mod import image_list_manager +from mod import classify_ui_context +from mod import image_list_ui_context +from mod import ui_newlibrarydialog +from mod import index_http_client +from mod import utils +from mod import ui_waitdialog +import threading + +TOOL_BTN_ICON_SIZE = 64 +TOOL_BTN_ICON_SMALL = 48 + +try: + DEFAULT_HOST = socket.gethostbyname(socket.gethostname()) +except: + DEFAULT_HOST = '127.0.0.1' + +# DEFAULT_HOST = "localhost" +DEFAULT_PORT = 8000 +PADDLECLAS_DOC_URL = "https://gitee.com/paddlepaddle/PaddleClas/docs/zh_CN/inference_deployment/shitu_gallery_manager.md" + + +class MainWindow(QtWidgets.QMainWindow): + """主窗口""" + newIndexMsg = QtCore.pyqtSignal(str) # 新建索引库线程信号 + openIndexMsg = QtCore.pyqtSignal(str) # 打开索引库线程信号 + updateIndexMsg = QtCore.pyqtSignal(str) # 更新索引库线程信号 + importImageCount = QtCore.pyqtSignal(int) # 导入图像数量信号 + + def __init__(self, process=None): + super(MainWindow, self).__init__() + self.server_process = process + self.ui = ui_mainwindow.Ui_MainWindow() + self.ui.setupUi(self) # 初始化主窗口界面 + + self.__imageListMgr = image_list_manager.ImageListManager() + + self.__appMenu = QtWidgets.QMenu() # 应用菜单 + self.__libraryAppendMenu = QtWidgets.QMenu() # 图像库附加功能菜单 + self.__initAppMenu() # 初始化应用菜单 + + self.__pathBar = QtWidgets.QLabel(self) # 路径 + self.__classifyCountBar = QtWidgets.QLabel(self) # 分类数量 + self.__imageCountBar = QtWidgets.QLabel(self) # 图像列表数量 + self.__imageSelectedBar = QtWidgets.QLabel(self) # 图像列表选择数量 + self.__spaceBar1 = QtWidgets.QLabel(self) # 空格间隔栏 + self.__spaceBar2 = QtWidgets.QLabel(self) # 空格间隔栏 + self.__spaceBar3 = QtWidgets.QLabel(self) # 空格间隔栏 + + # 分类界面相关业务 + self.__classifyUiContext = classify_ui_context.ClassifyUiContext( + ui=self.ui.classifyListView, + parent=self, + image_list_mgr=self.__imageListMgr) + + # 图片列表界面相关业务 + self.__imageListUiContext = 
image_list_ui_context.ImageListUiContext( + ui=self.ui.imageListWidget, + parent=self, + image_list_mgr=self.__imageListMgr) + + # 搜索的历史记录回车快捷键 + self.__historyCmbShortcut = QtWidgets.QShortcut( + QtGui.QKeySequence(QtCore.Qt.Key_Return), + self.ui.searchClassifyHistoryCmb) + + self.__waitDialog = QtWidgets.QDialog() # 等待对话框 + self.__waitDialogUi = ui_waitdialog.Ui_WaitDialog() # 等待对话框界面 + self.__initToolBtn() + self.__connectSignal() + self.__initUI() + self.__initWaitDialog() + + def __initUI(self): + """初始化界面""" + # 窗口图标 + self.setWindowIcon(QtGui.QIcon("./resource/app_icon.png")) + + # 初始化分割窗口 + self.ui.splitter.setStretchFactor(0, 20) + self.ui.splitter.setStretchFactor(1, 80) + + # 初始化图像缩放 + self.ui.imageScaleSlider.setValue(4) + + # 状态栏界面设置 + space_bar = " " # 间隔16空格 + self.__spaceBar1.setText(space_bar) + self.__spaceBar2.setText(space_bar) + self.__spaceBar3.setText(space_bar) + self.ui.statusbar.addWidget(self.__pathBar) + self.ui.statusbar.addWidget(self.__spaceBar1) + self.ui.statusbar.addWidget(self.__classifyCountBar) + self.ui.statusbar.addWidget(self.__spaceBar2) + self.ui.statusbar.addWidget(self.__imageCountBar) + self.ui.statusbar.addWidget(self.__spaceBar3) + self.ui.statusbar.addWidget(self.__imageSelectedBar) + + def __initToolBtn(self): + """初始化工具按钮""" + self.__setToolButton(self.ui.appMenuBtn, "应用菜单", + "./resource/app_menu.png", TOOL_BTN_ICON_SIZE) + + self.__setToolButton(self.ui.saveImageLibraryBtn, "保存图像库", + "./resource/save_image_Library.png", + TOOL_BTN_ICON_SIZE) + self.ui.saveImageLibraryBtn.clicked.connect(self.saveImageLibrary) + + self.__setToolButton(self.ui.addClassifyBtn, "添加分类", + "./resource/add_classify.png", + TOOL_BTN_ICON_SIZE) + self.ui.addClassifyBtn.clicked.connect( + self.__classifyUiContext.addClassify) + + self.__setToolButton(self.ui.removeClassifyBtn, "移除分类", + "./resource/remove_classify.png", + TOOL_BTN_ICON_SIZE) + self.ui.removeClassifyBtn.clicked.connect( + self.__classifyUiContext.removeClassify) + + self.__setToolButton(self.ui.searchClassifyBtn, "查找分类", + "./resource/search_classify.png", + TOOL_BTN_ICON_SMALL) + self.ui.searchClassifyBtn.clicked.connect( + self.__classifyUiContext.searchClassify) + + self.__setToolButton(self.ui.addImageBtn, "添加图片", + "./resource/add_image.png", TOOL_BTN_ICON_SMALL) + self.ui.addImageBtn.clicked.connect(self.__imageListUiContext.addImage) + + self.__setToolButton(self.ui.removeImageBtn, "移除图片", + "./resource/remove_image.png", + TOOL_BTN_ICON_SMALL) + self.ui.removeImageBtn.clicked.connect( + self.__imageListUiContext.removeImage) + + self.ui.searchClassifyHistoryCmb.setToolTip("查找分类历史") + self.ui.imageScaleSlider.setToolTip("图片缩放") + + def __setToolButton(self, button, tool_tip: str, icon_path: str, + icon_size: int): + """设置工具按钮""" + button.setToolTip(tool_tip) + button.setIcon(QtGui.QIcon(icon_path)) + button.setIconSize(QtCore.QSize(icon_size, icon_size)) + + def __initAppMenu(self): + """初始化应用菜单""" + utils.setMenu(self.__appMenu, "新建图像库", self.newImageLibrary) + utils.setMenu(self.__appMenu, "打开图像库", self.openImageLibrary) + utils.setMenu(self.__appMenu, "保存图像库", self.saveImageLibrary) + + self.__libraryAppendMenu.setTitle("导入图像") + utils.setMenu(self.__libraryAppendMenu, "导入 image_list 图像", + self.importImageListImage) + utils.setMenu(self.__libraryAppendMenu, "导入多文件夹图像", + self.importDirsImage) + self.__appMenu.addMenu(self.__libraryAppendMenu) + + self.__appMenu.addSeparator() + utils.setMenu(self.__appMenu, "新建/重建 索引库", self.newIndexLibrary) + utils.setMenu(self.__appMenu, "更新索引库", 
self.updateIndexLibrary) + self.__appMenu.addSeparator() + utils.setMenu(self.__appMenu, "帮助", self.showHelp) + utils.setMenu(self.__appMenu, "关于", self.showAbout) + utils.setMenu(self.__appMenu, "退出", self.exitApp) + + self.ui.appMenuBtn.setMenu(self.__appMenu) + self.ui.appMenuBtn.setPopupMode(QtWidgets.QToolButton.InstantPopup) + + def __initWaitDialog(self): + """初始化等待对话框""" + self.__waitDialogUi.setupUi(self.__waitDialog) + self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog + | QtCore.Qt.FramelessWindowHint) + + def __startWait(self, msg: str): + """开始显示等待对话框""" + self.setEnabled(False) + self.__waitDialogUi.msgLabel.setText(msg) + self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog + | QtCore.Qt.FramelessWindowHint + | QtCore.Qt.WindowStaysOnTopHint) + self.__waitDialog.show() + self.__waitDialog.repaint() + + def __stopWait(self): + """停止显示等待对话框""" + self.setEnabled(True) + self.__waitDialogUi.msgLabel.setText("执行完毕!") + self.__waitDialog.setWindowFlags(QtCore.Qt.Dialog + | QtCore.Qt.FramelessWindowHint + | QtCore.Qt.CustomizeWindowHint) + self.__waitDialog.close() + + def __connectSignal(self): + """连接信号与槽""" + self.__classifyUiContext.selected.connect( + self.__imageListUiContext.setImageList) + self.ui.searchClassifyBtn.clicked.connect(self.searchClassify) + self.ui.imageScaleSlider.valueChanged.connect( + self.__imageListUiContext.setImageScale) + self.__imageListUiContext.listCount.connect(self.__setImageCountBar) + self.__imageListUiContext.selectedCount.connect( + self.__setImageSelectedCountBar) + self.__historyCmbShortcut.activated.connect(self.searchClassify) + self.newIndexMsg.connect(self.__onNewIndexMsg) + self.openIndexMsg.connect(self.__onOpenIndexMsg) + self.updateIndexMsg.connect(self.__onUpdateIndexMsg) + self.importImageCount.connect(self.__onImportImageCount) + + def newImageLibrary(self): + """新建图像库""" + dir_path = self.__openDirDialog("新建图像库") + if dir_path == None: + return + if not utils.isEmptyDir(dir_path): + QtWidgets.QMessageBox.warning(self, "错误", "该目录不为空,请选择空目录") + return + if not utils.initLibrary(dir_path): + QtWidgets.QMessageBox.warning(self, "错误", "新建图像库失败") + return + QtWidgets.QMessageBox.information(self, "提示", "新建图像库成功") + self.__reload(os.path.join(dir_path, "image_list.txt"), dir_path) + + def __openDirDialog(self, title: str): + """打开目录对话框""" + dlg = QtWidgets.QFileDialog(self) + dlg.setWindowTitle(title) + dlg.setOption(QtWidgets.QFileDialog.ShowDirsOnly, True) + dlg.setFileMode(QtWidgets.QFileDialog.Directory) + dlg.setAcceptMode(QtWidgets.QFileDialog.AcceptOpen) + if dlg.exec_() == QtWidgets.QDialog.Accepted: + dir_path = dlg.selectedFiles()[0] + return dir_path + return None + + def openImageLibrary(self): + """打开图像库""" + dir_path = self.__openDirDialog("打开图像库") + if dir_path != None: + image_list_path = os.path.join(dir_path, "image_list.txt") + if os.path.exists(image_list_path) \ + and os.path.exists(os.path.join(dir_path, "images")): + self.__reload(image_list_path, dir_path) + self.openIndexLibrary() + + def __reload(self, image_list_path: str, msg: str): + """重新加载图像库""" + self.__imageListMgr.readFile(image_list_path) + self.__imageListUiContext.clear() + self.__classifyUiContext.setClassifyList( + self.__imageListMgr.classifyList) + self.__setPathBar(msg) + self.__setClassifyCountBar(len(self.__imageListMgr.classifyList)) + self.__setImageCountBar(0) + self.__setImageSelectedCountBar(0) + + def saveImageLibrary(self): + """保存图像库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.warning(self, "错误", 
"请先打开正确的图像库") + return + self.__imageListMgr.writeFile() + self.__reload(self.__imageListMgr.filePath, + self.__imageListMgr.dirName) + hint_str = "为保证图片准确识别,请在修改图片库后更新索引库。\n\ +如果是新建图像库或者没有索引库,请新建索引库。" + + QtWidgets.QMessageBox.information(self, "提示", hint_str) + + def __onImportImageCount(self, count: int): + """导入图像槽""" + self.__stopWait() + if count == -1: + QtWidgets.QMessageBox.warning(self, "错误", "导入到当前图像库错误") + return + QtWidgets.QMessageBox.information(self, "提示", + "导入图像库成功,导入图像:{}".format(count)) + self.__reload(self.__imageListMgr.filePath, + self.__imageListMgr.dirName) + + def __importImageListImageThread(self, from_path: str, to_path: str): + """导入 image_list 图像 线程""" + count = utils.oneKeyImportFromFile(from_path=from_path, + to_path=to_path) + if count == None: + count = -1 + self.importImageCount.emit(count) + + def importImageListImage(self): + """导入 image_list 图像 到当前图像库,建议当前库是新建的空库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库") + return + from_path = QtWidgets.QFileDialog.getOpenFileName( + caption="导入 image_list 图像", filter="txt (*.txt)")[0] + if not os.path.exists(from_path): + QtWidgets.QMessageBox.information(self, "提示", "打开的文件不存在") + return + from_mgr = image_list_manager.ImageListManager(from_path) + self.__startWait("正在导入图像,请等待。。。") + thread = threading.Thread(target=self.__importImageListImageThread, + args=(from_mgr.filePath, + self.__imageListMgr.filePath)) + thread.start() + + def __importDirsImageThread(self, from_dir: str, to_image_list_path: str): + """导入多文件夹图像 线程""" + count = utils.oneKeyImportFromDirs( + from_dir=from_dir, to_image_list_path=to_image_list_path) + if count == None: + count = -1 + self.importImageCount.emit(count) + + def importDirsImage(self): + """导入 多文件夹图像 到当前图像库,建议当前库是新建的空库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库") + return + dir_path = self.__openDirDialog("导入多文件夹图像") + if dir_path == None: + return + if not os.path.exists(dir_path): + QtWidgets.QMessageBox.information(self, "提示", "打开的目录不存在") + return + self.__startWait("正在导入图像,请等待。。。") + thread = threading.Thread(target=self.__importDirsImageThread, + args=(dir_path, + self.__imageListMgr.filePath)) + thread.start() + + def __newIndexThread(self, index_root_path: str, image_list_path: str, + index_method: str, force: bool): + """新建重建索引库线程""" + try: + client = index_http_client.IndexHttpClient( + DEFAULT_HOST, DEFAULT_PORT) + err_msg = client.new_index(image_list_path=image_list_path, + index_root_path=index_root_path, + index_method=index_method, + force=force) + if err_msg == None: + err_msg = "" + self.newIndexMsg.emit(err_msg) + except Exception as e: + self.newIndexMsg.emit(str(e)) + + def __onNewIndexMsg(self, err_msg): + """新建重建索引库槽""" + self.__stopWait() + if err_msg == "": + QtWidgets.QMessageBox.information(self, "提示", "新建/重建 索引库成功") + else: + QtWidgets.QMessageBox.warning(self, "错误", err_msg) + + def newIndexLibrary(self): + """新建重建索引库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库") + return + dlg = QtWidgets.QDialog(self) + ui = ui_newlibrarydialog.Ui_NewlibraryDialog() + ui.setupUi(dlg) + result = dlg.exec_() + index_method = ui.indexMethodCmb.currentText() + force = ui.resetCheckBox.isChecked() + if result == QtWidgets.QDialog.Accepted: + self.__startWait("正在 新建/重建 索引库,请等待。。。") + thread = threading.Thread(target=self.__newIndexThread, + 
args=(self.__imageListMgr.dirName, + "image_list.txt", index_method, + force)) + thread.start() + + def __openIndexThread(self, index_root_path: str, image_list_path: str): + """打开索引库线程""" + try: + client = index_http_client.IndexHttpClient( + DEFAULT_HOST, DEFAULT_PORT) + err_msg = client.open_index(index_root_path=index_root_path, + image_list_path=image_list_path) + if err_msg == None: + err_msg = "" + self.openIndexMsg.emit(err_msg) + except Exception as e: + self.openIndexMsg.emit(str(e)) + + def __onOpenIndexMsg(self, err_msg): + """打开索引库槽""" + self.__stopWait() + if err_msg == "": + QtWidgets.QMessageBox.information(self, "提示", "打开索引库成功") + else: + QtWidgets.QMessageBox.warning(self, "错误", err_msg) + + def openIndexLibrary(self): + """打开索引库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库") + return + self.__startWait("正在打开索引库,请等待。。。") + thread = threading.Thread(target=self.__openIndexThread, + args=(self.__imageListMgr.dirName, + "image_list.txt")) + thread.start() + + def __updateIndexThread(self, index_root_path: str, image_list_path: str): + """更新索引库线程""" + try: + client = index_http_client.IndexHttpClient( + DEFAULT_HOST, DEFAULT_PORT) + err_msg = client.update_index(image_list_path=image_list_path, + index_root_path=index_root_path) + if err_msg == None: + err_msg = "" + self.updateIndexMsg.emit(err_msg) + except Exception as e: + self.updateIndexMsg.emit(str(e)) + + def __onUpdateIndexMsg(self, err_msg): + """更新索引库槽""" + self.__stopWait() + if err_msg == "": + QtWidgets.QMessageBox.information(self, "提示", "更新索引库成功") + else: + QtWidgets.QMessageBox.warning(self, "错误", err_msg) + + def updateIndexLibrary(self): + """更新索引库""" + if not os.path.exists(self.__imageListMgr.filePath): + QtWidgets.QMessageBox.information(self, "提示", "请先打开正确的图像库") + return + self.__startWait("正在更新索引库,请等待。。。") + thread = threading.Thread(target=self.__updateIndexThread, + args=(self.__imageListMgr.dirName, + "image_list.txt")) + thread.start() + + def searchClassify(self): + """查找分类""" + if len(self.__imageListMgr.classifyList) == 0: + return + cmb = self.ui.searchClassifyHistoryCmb + txt = cmb.currentText() + is_has = False + if txt != "": + for i in range(cmb.count()): + if cmb.itemText(i) == txt: + is_has = True + break + if not is_has: + cmb.addItem(txt) + self.__classifyUiContext.searchClassify(txt) + + def showHelp(self): + """显示帮助""" + QtGui.QDesktopServices.openUrl(QtCore.QUrl(PADDLECLAS_DOC_URL)) + + def showAbout(self): + """显示关于对话框""" + QtWidgets.QMessageBox.information(self, "关于", "识图图像库管理 V1.0.0") + + def exitApp(self): + """退出应用""" + if isinstance(self.server_process, Process): + self.server_process.terminate() + # os.kill(self.server_pid) + sys.exit(0) + + def __setPathBar(self, msg: str): + """设置路径状态栏信息""" + self.__pathBar.setText("图像库路径:{}".format(msg)) + + def __setClassifyCountBar(self, msg: str): + self.__classifyCountBar.setText("分类总数量:{}".format(msg)) + + def __setImageCountBar(self, count: int): + """设置图像数量状态栏信息""" + self.__imageCountBar.setText("当前图像数量:{}".format(count)) + + def __setImageSelectedCountBar(self, count: int): + """设置选择图像数量状态栏信息""" + self.__imageSelectedBar.setText("选择图像数量:{}".format(count)) diff --git a/deploy/shitu_index_manager/mod/ui_addclassifydialog.py b/deploy/shitu_index_manager/mod/ui_addclassifydialog.py new file mode 100644 index 0000000000000000000000000000000000000000..4c824e5f62936c5d9f61aff9c603f3b377e385d8 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_addclassifydialog.py 
@@ -0,0 +1,56 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/AddClassifyDialog.ui' +# +# Created by: PyQt5 UI code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. + +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_AddClassifyDialog(object): + def setupUi(self, AddClassifyDialog): + AddClassifyDialog.setObjectName("AddClassifyDialog") + AddClassifyDialog.resize(286, 127) + AddClassifyDialog.setModal(True) + self.verticalLayout = QtWidgets.QVBoxLayout(AddClassifyDialog) + self.verticalLayout.setObjectName("verticalLayout") + self.label = QtWidgets.QLabel(AddClassifyDialog) + self.label.setObjectName("label") + self.verticalLayout.addWidget(self.label) + self.lineEdit = QtWidgets.QLineEdit(AddClassifyDialog) + self.lineEdit.setObjectName("lineEdit") + self.verticalLayout.addWidget(self.lineEdit) + spacerItem = QtWidgets.QSpacerItem(20, 11, + QtWidgets.QSizePolicy.Minimum, + QtWidgets.QSizePolicy.Expanding) + self.verticalLayout.addItem(spacerItem) + self.buttonBox = QtWidgets.QDialogButtonBox(AddClassifyDialog) + self.buttonBox.setOrientation(QtCore.Qt.Horizontal) + self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel + | QtWidgets.QDialogButtonBox.Ok) + self.buttonBox.setObjectName("buttonBox") + self.verticalLayout.addWidget(self.buttonBox) + + self.retranslateUi(AddClassifyDialog) + self.buttonBox.accepted.connect(AddClassifyDialog.accept) + self.buttonBox.rejected.connect(AddClassifyDialog.reject) + QtCore.QMetaObject.connectSlotsByName(AddClassifyDialog) + + def retranslateUi(self, AddClassifyDialog): + _translate = QtCore.QCoreApplication.translate + AddClassifyDialog.setWindowTitle( + _translate("AddClassifyDialog", "添加分类")) + self.label.setText(_translate("AddClassifyDialog", "分类名称")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + AddClassifyDialog = QtWidgets.QDialog() + ui = Ui_AddClassifyDialog() + ui.setupUi(AddClassifyDialog) + AddClassifyDialog.show() + sys.exit(app.exec_()) diff --git a/deploy/shitu_index_manager/mod/ui_imageeditclassifydialog.py b/deploy/shitu_index_manager/mod/ui_imageeditclassifydialog.py new file mode 100644 index 0000000000000000000000000000000000000000..cce943c0fcb524f2fc9ff2e61e4ca73c3f7d6e29 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_imageeditclassifydialog.py @@ -0,0 +1,75 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/ImageEditClassifyDialog.ui' +# +# Created by: PyQt5 UI code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. 
+ +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_Dialog(object): + def setupUi(self, Dialog): + Dialog.setObjectName("Dialog") + Dialog.resize(414, 415) + Dialog.setMinimumSize(QtCore.QSize(0, 0)) + self.verticalLayout = QtWidgets.QVBoxLayout(Dialog) + self.verticalLayout.setObjectName("verticalLayout") + self.label = QtWidgets.QLabel(Dialog) + self.label.setObjectName("label") + self.verticalLayout.addWidget(self.label) + self.oldLineEdit = QtWidgets.QLineEdit(Dialog) + self.oldLineEdit.setEnabled(False) + self.oldLineEdit.setObjectName("oldLineEdit") + self.verticalLayout.addWidget(self.oldLineEdit) + self.label_2 = QtWidgets.QLabel(Dialog) + self.label_2.setObjectName("label_2") + self.verticalLayout.addWidget(self.label_2) + self.newLineEdit = QtWidgets.QLineEdit(Dialog) + self.newLineEdit.setEnabled(False) + self.newLineEdit.setObjectName("newLineEdit") + self.verticalLayout.addWidget(self.newLineEdit) + self.horizontalLayout = QtWidgets.QHBoxLayout() + self.horizontalLayout.setObjectName("horizontalLayout") + self.searchWordLineEdit = QtWidgets.QLineEdit(Dialog) + self.searchWordLineEdit.setObjectName("searchWordLineEdit") + self.horizontalLayout.addWidget(self.searchWordLineEdit) + self.searchButton = QtWidgets.QPushButton(Dialog) + self.searchButton.setObjectName("searchButton") + self.horizontalLayout.addWidget(self.searchButton) + self.verticalLayout.addLayout(self.horizontalLayout) + self.classifyListView = QtWidgets.QListView(Dialog) + self.classifyListView.setEnabled(True) + self.classifyListView.setMinimumSize(QtCore.QSize(400, 200)) + self.classifyListView.setObjectName("classifyListView") + self.verticalLayout.addWidget(self.classifyListView) + self.buttonBox = QtWidgets.QDialogButtonBox(Dialog) + self.buttonBox.setOrientation(QtCore.Qt.Horizontal) + self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel + | QtWidgets.QDialogButtonBox.Ok) + self.buttonBox.setObjectName("buttonBox") + self.verticalLayout.addWidget(self.buttonBox) + + self.retranslateUi(Dialog) + self.buttonBox.accepted.connect(Dialog.accept) + self.buttonBox.rejected.connect(Dialog.reject) + QtCore.QMetaObject.connectSlotsByName(Dialog) + + def retranslateUi(self, Dialog): + _translate = QtCore.QCoreApplication.translate + Dialog.setWindowTitle(_translate("Dialog", "编辑图像分类")) + self.label.setText(_translate("Dialog", "原分类")) + self.label_2.setText(_translate("Dialog", "新分类")) + self.searchButton.setText(_translate("Dialog", "查找")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + Dialog = QtWidgets.QDialog() + ui = Ui_Dialog() + ui.setupUi(Dialog) + Dialog.show() + sys.exit(app.exec_()) diff --git a/deploy/shitu_index_manager/mod/ui_mainwindow.py b/deploy/shitu_index_manager/mod/ui_mainwindow.py new file mode 100644 index 0000000000000000000000000000000000000000..0f544ce1152bd947fd59c36595122fe4bc9fd5f0 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_mainwindow.py @@ -0,0 +1,155 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/MainWindow.ui' +# +# Created by: PyQt5 UI code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. 
+ +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_MainWindow(object): + def setupUi(self, MainWindow): + MainWindow.setObjectName("MainWindow") + MainWindow.resize(833, 538) + MainWindow.setMinimumSize(QtCore.QSize(0, 0)) + self.centralwidget = QtWidgets.QWidget(MainWindow) + self.centralwidget.setObjectName("centralwidget") + self.verticalLayout_3 = QtWidgets.QVBoxLayout(self.centralwidget) + self.verticalLayout_3.setObjectName("verticalLayout_3") + self.horizontalLayout_3 = QtWidgets.QHBoxLayout() + self.horizontalLayout_3.setObjectName("horizontalLayout_3") + self.appMenuBtn = QtWidgets.QToolButton(self.centralwidget) + self.appMenuBtn.setObjectName("appMenuBtn") + self.horizontalLayout_3.addWidget(self.appMenuBtn) + self.saveImageLibraryBtn = QtWidgets.QToolButton(self.centralwidget) + self.saveImageLibraryBtn.setObjectName("saveImageLibraryBtn") + self.horizontalLayout_3.addWidget(self.saveImageLibraryBtn) + self.addClassifyBtn = QtWidgets.QToolButton(self.centralwidget) + self.addClassifyBtn.setObjectName("addClassifyBtn") + self.horizontalLayout_3.addWidget(self.addClassifyBtn) + self.removeClassifyBtn = QtWidgets.QToolButton(self.centralwidget) + self.removeClassifyBtn.setObjectName("removeClassifyBtn") + self.horizontalLayout_3.addWidget(self.removeClassifyBtn) + spacerItem = QtWidgets.QSpacerItem(40, 20, + QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Minimum) + self.horizontalLayout_3.addItem(spacerItem) + self.imageScaleSlider = QtWidgets.QSlider(self.centralwidget) + self.imageScaleSlider.setMaximumSize(QtCore.QSize(400, 16777215)) + self.imageScaleSlider.setMinimum(1) + self.imageScaleSlider.setMaximum(8) + self.imageScaleSlider.setPageStep(2) + self.imageScaleSlider.setOrientation(QtCore.Qt.Horizontal) + self.imageScaleSlider.setObjectName("imageScaleSlider") + self.horizontalLayout_3.addWidget(self.imageScaleSlider) + self.verticalLayout_3.addLayout(self.horizontalLayout_3) + self.splitter = QtWidgets.QSplitter(self.centralwidget) + sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Expanding) + sizePolicy.setHorizontalStretch(0) + sizePolicy.setVerticalStretch(0) + sizePolicy.setHeightForWidth( + self.splitter.sizePolicy().hasHeightForWidth()) + self.splitter.setSizePolicy(sizePolicy) + self.splitter.setOrientation(QtCore.Qt.Horizontal) + self.splitter.setObjectName("splitter") + self.widget = QtWidgets.QWidget(self.splitter) + self.widget.setObjectName("widget") + self.verticalLayout_2 = QtWidgets.QVBoxLayout(self.widget) + self.verticalLayout_2.setContentsMargins(0, 0, 0, 0) + self.verticalLayout_2.setObjectName("verticalLayout_2") + self.horizontalLayout = QtWidgets.QHBoxLayout() + self.horizontalLayout.setObjectName("horizontalLayout") + self.searchClassifyHistoryCmb = QtWidgets.QComboBox(self.widget) + sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Fixed) + sizePolicy.setHorizontalStretch(0) + sizePolicy.setVerticalStretch(0) + sizePolicy.setHeightForWidth( + self.searchClassifyHistoryCmb.sizePolicy().hasHeightForWidth()) + self.searchClassifyHistoryCmb.setSizePolicy(sizePolicy) + self.searchClassifyHistoryCmb.setEditable(True) + self.searchClassifyHistoryCmb.setObjectName("searchClassifyHistoryCmb") + self.horizontalLayout.addWidget(self.searchClassifyHistoryCmb) + self.searchClassifyBtn = QtWidgets.QToolButton(self.widget) + self.searchClassifyBtn.setObjectName("searchClassifyBtn") + self.horizontalLayout.addWidget(self.searchClassifyBtn) + 
self.verticalLayout_2.addLayout(self.horizontalLayout) + self.classifyListView = QtWidgets.QListView(self.widget) + sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Expanding) + sizePolicy.setHorizontalStretch(0) + sizePolicy.setVerticalStretch(0) + sizePolicy.setHeightForWidth( + self.classifyListView.sizePolicy().hasHeightForWidth()) + self.classifyListView.setSizePolicy(sizePolicy) + self.classifyListView.setMinimumSize(QtCore.QSize(200, 0)) + self.classifyListView.setEditTriggers( + QtWidgets.QAbstractItemView.NoEditTriggers) + self.classifyListView.setObjectName("classifyListView") + self.verticalLayout_2.addWidget(self.classifyListView) + self.widget1 = QtWidgets.QWidget(self.splitter) + self.widget1.setObjectName("widget1") + self.verticalLayout = QtWidgets.QVBoxLayout(self.widget1) + self.verticalLayout.setContentsMargins(0, 0, 0, 0) + self.verticalLayout.setObjectName("verticalLayout") + self.horizontalLayout_2 = QtWidgets.QHBoxLayout() + self.horizontalLayout_2.setObjectName("horizontalLayout_2") + self.addImageBtn = QtWidgets.QToolButton(self.widget1) + self.addImageBtn.setObjectName("addImageBtn") + self.horizontalLayout_2.addWidget(self.addImageBtn) + self.removeImageBtn = QtWidgets.QToolButton(self.widget1) + self.removeImageBtn.setObjectName("removeImageBtn") + self.horizontalLayout_2.addWidget(self.removeImageBtn) + spacerItem1 = QtWidgets.QSpacerItem(40, 20, + QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Minimum) + self.horizontalLayout_2.addItem(spacerItem1) + self.verticalLayout.addLayout(self.horizontalLayout_2) + self.imageListWidget = QtWidgets.QListWidget(self.widget1) + sizePolicy = QtWidgets.QSizePolicy(QtWidgets.QSizePolicy.Expanding, + QtWidgets.QSizePolicy.Expanding) + sizePolicy.setHorizontalStretch(0) + sizePolicy.setVerticalStretch(0) + sizePolicy.setHeightForWidth( + self.imageListWidget.sizePolicy().hasHeightForWidth()) + self.imageListWidget.setSizePolicy(sizePolicy) + self.imageListWidget.setMinimumSize(QtCore.QSize(200, 0)) + self.imageListWidget.setStyleSheet( + "QListWidget::Item:hover{background:skyblue;padding-top:0px; padding-bottom:0px;}\n" + "QListWidget::item:selected{background:rgb(245, 121, 0); color:red;}" + ) + self.imageListWidget.setObjectName("imageListWidget") + self.verticalLayout.addWidget(self.imageListWidget) + self.verticalLayout_3.addWidget(self.splitter) + MainWindow.setCentralWidget(self.centralwidget) + self.statusbar = QtWidgets.QStatusBar(MainWindow) + self.statusbar.setObjectName("statusbar") + MainWindow.setStatusBar(self.statusbar) + + self.retranslateUi(MainWindow) + QtCore.QMetaObject.connectSlotsByName(MainWindow) + + def retranslateUi(self, MainWindow): + _translate = QtCore.QCoreApplication.translate + MainWindow.setWindowTitle(_translate("MainWindow", "识图图像库管理")) + self.appMenuBtn.setText(_translate("MainWindow", "...")) + self.saveImageLibraryBtn.setText(_translate("MainWindow", "...")) + self.addClassifyBtn.setText(_translate("MainWindow", "...")) + self.removeClassifyBtn.setText(_translate("MainWindow", "...")) + self.searchClassifyBtn.setText(_translate("MainWindow", "...")) + self.addImageBtn.setText(_translate("MainWindow", "...")) + self.removeImageBtn.setText(_translate("MainWindow", "...")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + MainWindow = QtWidgets.QMainWindow() + ui = Ui_MainWindow() + ui.setupUi(MainWindow) + MainWindow.show() + sys.exit(app.exec_()) diff --git 
a/deploy/shitu_index_manager/mod/ui_newlibrarydialog.py b/deploy/shitu_index_manager/mod/ui_newlibrarydialog.py new file mode 100644 index 0000000000000000000000000000000000000000..dcfad9e1b217f0632c71049a2898a8c782644325 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_newlibrarydialog.py @@ -0,0 +1,71 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/NewlibraryDialog.ui' +# +# Created by: PyQt5 UI code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. + +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_NewlibraryDialog(object): + def setupUi(self, NewlibraryDialog): + NewlibraryDialog.setObjectName("NewlibraryDialog") + NewlibraryDialog.resize(414, 230) + self.verticalLayout = QtWidgets.QVBoxLayout(NewlibraryDialog) + self.verticalLayout.setObjectName("verticalLayout") + self.label = QtWidgets.QLabel(NewlibraryDialog) + self.label.setObjectName("label") + self.verticalLayout.addWidget(self.label) + self.indexMethodCmb = QtWidgets.QComboBox(NewlibraryDialog) + self.indexMethodCmb.setEnabled(True) + self.indexMethodCmb.setObjectName("indexMethodCmb") + self.indexMethodCmb.addItem("") + self.indexMethodCmb.addItem("") + self.indexMethodCmb.addItem("") + self.verticalLayout.addWidget(self.indexMethodCmb) + self.resetCheckBox = QtWidgets.QCheckBox(NewlibraryDialog) + self.resetCheckBox.setObjectName("resetCheckBox") + self.verticalLayout.addWidget(self.resetCheckBox) + spacerItem = QtWidgets.QSpacerItem(20, 80, + QtWidgets.QSizePolicy.Minimum, + QtWidgets.QSizePolicy.Expanding) + self.verticalLayout.addItem(spacerItem) + self.buttonBox = QtWidgets.QDialogButtonBox(NewlibraryDialog) + self.buttonBox.setOrientation(QtCore.Qt.Horizontal) + self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel + | QtWidgets.QDialogButtonBox.Ok) + self.buttonBox.setObjectName("buttonBox") + self.verticalLayout.addWidget(self.buttonBox) + + self.retranslateUi(NewlibraryDialog) + self.indexMethodCmb.setCurrentIndex(0) + self.buttonBox.accepted.connect(NewlibraryDialog.accept) + self.buttonBox.rejected.connect(NewlibraryDialog.reject) + QtCore.QMetaObject.connectSlotsByName(NewlibraryDialog) + + def retranslateUi(self, NewlibraryDialog): + _translate = QtCore.QCoreApplication.translate + NewlibraryDialog.setWindowTitle( + _translate("NewlibraryDialog", "新建/重建 索引")) + self.label.setText(_translate("NewlibraryDialog", "索引方式")) + self.indexMethodCmb.setItemText( + 0, _translate("NewlibraryDialog", "HNSW32")) + self.indexMethodCmb.setItemText(1, + _translate("NewlibraryDialog", "FLAT")) + self.indexMethodCmb.setItemText(2, _translate("NewlibraryDialog", + "IVF")) + self.resetCheckBox.setText( + _translate("NewlibraryDialog", "重建索引,警告:会覆盖原索引")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + NewlibraryDialog = QtWidgets.QDialog() + ui = Ui_NewlibraryDialog() + ui.setupUi(NewlibraryDialog) + NewlibraryDialog.show() + sys.exit(app.exec_()) diff --git a/deploy/shitu_index_manager/mod/ui_renameclassifydialog.py b/deploy/shitu_index_manager/mod/ui_renameclassifydialog.py new file mode 100644 index 0000000000000000000000000000000000000000..b4d1ab32a640b6919fd0f35f25d4627460347e14 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_renameclassifydialog.py @@ -0,0 +1,63 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/RenameClassifyDialog.ui' +# +# Created by: PyQt5 UI 
code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. + +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_RenameClassifyDialog(object): + def setupUi(self, RenameClassifyDialog): + RenameClassifyDialog.setObjectName("RenameClassifyDialog") + RenameClassifyDialog.resize(342, 194) + self.verticalLayout = QtWidgets.QVBoxLayout(RenameClassifyDialog) + self.verticalLayout.setObjectName("verticalLayout") + self.oldlabel = QtWidgets.QLabel(RenameClassifyDialog) + self.oldlabel.setObjectName("oldlabel") + self.verticalLayout.addWidget(self.oldlabel) + self.oldNameLineEdit = QtWidgets.QLineEdit(RenameClassifyDialog) + self.oldNameLineEdit.setEnabled(False) + self.oldNameLineEdit.setObjectName("oldNameLineEdit") + self.verticalLayout.addWidget(self.oldNameLineEdit) + self.newlabel = QtWidgets.QLabel(RenameClassifyDialog) + self.newlabel.setObjectName("newlabel") + self.verticalLayout.addWidget(self.newlabel) + self.newNameLineEdit = QtWidgets.QLineEdit(RenameClassifyDialog) + self.newNameLineEdit.setObjectName("newNameLineEdit") + self.verticalLayout.addWidget(self.newNameLineEdit) + spacerItem = QtWidgets.QSpacerItem(20, 14, + QtWidgets.QSizePolicy.Minimum, + QtWidgets.QSizePolicy.Expanding) + self.verticalLayout.addItem(spacerItem) + self.buttonBox = QtWidgets.QDialogButtonBox(RenameClassifyDialog) + self.buttonBox.setOrientation(QtCore.Qt.Horizontal) + self.buttonBox.setStandardButtons(QtWidgets.QDialogButtonBox.Cancel + | QtWidgets.QDialogButtonBox.Ok) + self.buttonBox.setObjectName("buttonBox") + self.verticalLayout.addWidget(self.buttonBox) + + self.retranslateUi(RenameClassifyDialog) + self.buttonBox.accepted.connect(RenameClassifyDialog.accept) + self.buttonBox.rejected.connect(RenameClassifyDialog.reject) + QtCore.QMetaObject.connectSlotsByName(RenameClassifyDialog) + + def retranslateUi(self, RenameClassifyDialog): + _translate = QtCore.QCoreApplication.translate + RenameClassifyDialog.setWindowTitle( + _translate("RenameClassifyDialog", "重命名分类")) + self.oldlabel.setText(_translate("RenameClassifyDialog", "原名称")) + self.newlabel.setText(_translate("RenameClassifyDialog", "新名称")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + RenameClassifyDialog = QtWidgets.QDialog() + ui = Ui_RenameClassifyDialog() + ui.setupUi(RenameClassifyDialog) + RenameClassifyDialog.show() + sys.exit(app.exec_()) diff --git a/deploy/shitu_index_manager/mod/ui_waitdialog.py b/deploy/shitu_index_manager/mod/ui_waitdialog.py new file mode 100644 index 0000000000000000000000000000000000000000..921ba9b75d64edc4cfaadf4a393abf7fba2a3e17 --- /dev/null +++ b/deploy/shitu_index_manager/mod/ui_waitdialog.py @@ -0,0 +1,49 @@ +# -*- coding: utf-8 -*- + +# Form implementation generated from reading ui file 'ui/WaitDialog.ui' +# +# Created by: PyQt5 UI code generator 5.15.5 +# +# WARNING: Any manual changes made to this file will be lost when pyuic5 is +# run again. Do not edit this file unless you know what you are doing. 
+ +from PyQt5 import QtCore, QtGui, QtWidgets + + +class Ui_WaitDialog(object): + def setupUi(self, WaitDialog): + WaitDialog.setObjectName("WaitDialog") + WaitDialog.setWindowModality(QtCore.Qt.NonModal) + WaitDialog.resize(324, 78) + self.verticalLayout = QtWidgets.QVBoxLayout(WaitDialog) + self.verticalLayout.setObjectName("verticalLayout") + self.msgLabel = QtWidgets.QLabel(WaitDialog) + self.msgLabel.setObjectName("msgLabel") + self.verticalLayout.addWidget(self.msgLabel) + self.progressBar = QtWidgets.QProgressBar(WaitDialog) + self.progressBar.setMaximum(0) + self.progressBar.setProperty("value", -1) + self.progressBar.setObjectName("progressBar") + self.verticalLayout.addWidget(self.progressBar) + spacerItem = QtWidgets.QSpacerItem(20, 1, + QtWidgets.QSizePolicy.Minimum, + QtWidgets.QSizePolicy.Expanding) + self.verticalLayout.addItem(spacerItem) + + self.retranslateUi(WaitDialog) + QtCore.QMetaObject.connectSlotsByName(WaitDialog) + + def retranslateUi(self, WaitDialog): + _translate = QtCore.QCoreApplication.translate + WaitDialog.setWindowTitle(_translate("WaitDialog", "请等待")) + self.msgLabel.setText(_translate("WaitDialog", "正在更新索引库,请等待。。。")) + + +if __name__ == "__main__": + import sys + app = QtWidgets.QApplication(sys.argv) + WaitDialog = QtWidgets.QDialog() + ui = Ui_WaitDialog() + ui.setupUi(WaitDialog) + WaitDialog.show() + sys.exit(app.exec_()) diff --git a/deploy/shitu_index_manager/mod/utils.py b/deploy/shitu_index_manager/mod/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2886522654998429cae3fc720d39343db41db75b --- /dev/null +++ b/deploy/shitu_index_manager/mod/utils.py @@ -0,0 +1,142 @@ +import os +import sys + +from PyQt5 import QtCore, QtGui, QtWidgets +import hashlib +import shutil +from mod import image_list_manager + + +def setMenu(menu: QtWidgets.QMenu, text: str, triggered): + """设置菜单""" + action = menu.addAction(text) + action.triggered.connect(triggered) + + +def fileMD5(file_path: str): + """计算文件的MD5值""" + md5 = hashlib.md5() + with open(file_path, 'rb') as f: + md5.update(f.read()) + return md5.hexdigest().lower() + + +def copyFile(from_path: str, to_path: str): + """复制文件""" + shutil.copyfile(from_path, to_path) + return os.path.exists(to_path) + + +def removeFile(file_path: str): + """删除文件""" + if os.path.exists(file_path): + os.remove(file_path) + return not os.path.exists(file_path) + + +def fileExtension(file_path: str): + """获取文件的扩展名""" + return os.path.splitext(file_path)[1] + + +def copyImageToDir(self, from_image_path: str, to_dir_path: str): + """复制图像文件到目标目录""" + if not os.path.exists(from_image_path) and not os.path.exists(to_dir_path): + return None + md5 = fileMD5(from_image_path) + file_ext = fileExtension(from_image_path) + new_path = os.path.join(to_dir_path, md5 + file_ext) + copyFile(from_image_path, new_path) + return new_path + + +def oneKeyImportFromFile(from_path: str, to_path: str): + """从其它图像库 from_path {image_list.txt} 导入到图像库 to_path {image_list.txt}""" + if not os.path.exists(from_path) or not os.path.exists(to_path): + return None + if from_path == to_path: + return None + from_mgr = image_list_manager.ImageListManager(file_path=from_path) + to_mgr = image_list_manager.ImageListManager(file_path=to_path) + return oneKeyImport(from_mgr=from_mgr, to_mgr=to_mgr) + + +def oneKeyImportFromDirs(from_dir: str, to_image_list_path: str): + """从其它图像库 from_dir 搜索子目录 导入到图像库 to_image_list_path""" + if not os.path.exists(from_dir) or not os.path.exists(to_image_list_path): + return None + if from_dir == 
os.path.dirname(to_image_list_path): + return None + from_mgr = image_list_manager.ImageListManager() + to_mgr = image_list_manager.ImageListManager( + file_path=to_image_list_path) + from_mgr.dirName = from_dir + sub_dir_list = os.listdir(from_dir) + for sub_dir in sub_dir_list: + real_sub_dir = os.path.join(from_dir, sub_dir) + if not os.path.isdir(real_sub_dir): + continue + img_list = os.listdir(real_sub_dir) + img_path = [] + for img in img_list: + real_img = os.path.join(real_sub_dir, img) + if not os.path.isfile(real_img): + continue + img_path.append("{}/{}".format(sub_dir, img)) + if len(img_path) == 0: + continue + from_mgr.addClassify(sub_dir) + from_mgr.resetImageList(sub_dir, img_path) + return oneKeyImport(from_mgr=from_mgr, to_mgr=to_mgr) + + +def oneKeyImport(from_mgr: image_list_manager.ImageListManager, + to_mgr: image_list_manager.ImageListManager): + """一键导入""" + count = 0 + for classify in from_mgr.classifyList: + img_list = from_mgr.realPathList(classify) + to_mgr.addClassify(classify) + to_img_list = to_mgr.imageList(classify) + new_img_list = [] + for img in img_list: + from_image_path = img + to_dir_path = os.path.join(to_mgr.dirName, "images") + md5 = fileMD5(from_image_path) + file_ext = fileExtension(from_image_path) + new_path = os.path.join(to_dir_path, md5 + file_ext) + if os.path.exists(new_path): + # 如果新文件 MD5 重复跳过后面的复制文件操作 + continue + copyFile(from_image_path, new_path) + new_img_list.append("images/" + md5 + file_ext) + count += 1 + to_img_list += new_img_list + to_mgr.resetImageList(classify, to_img_list) + to_mgr.writeFile() + return count + + +def newFile(file_path: str): + """创建文件""" + if os.path.exists(file_path): + return False + else: + with open(file_path, 'w') as f: + pass + return True + + +def isEmptyDir(dir_path: str): + """判断目录是否为空""" + return not os.listdir(dir_path) + + +def initLibrary(dir_path: str): + """初始化库""" + images_dir = os.path.join(dir_path, "images") + if not os.path.exists(images_dir): + os.makedirs(images_dir) + image_list_path = os.path.join(dir_path, "image_list.txt") + newFile(image_list_path) + return os.path.exists(dir_path) diff --git a/deploy/shitu_index_manager/resource/add_classify.png b/deploy/shitu_index_manager/resource/add_classify.png new file mode 100644 index 0000000000000000000000000000000000000000..0ba0d1d3f58654b3e52dbe8f4d537a15066670f0 Binary files /dev/null and b/deploy/shitu_index_manager/resource/add_classify.png differ diff --git a/deploy/shitu_index_manager/resource/add_image.png b/deploy/shitu_index_manager/resource/add_image.png new file mode 100644 index 0000000000000000000000000000000000000000..2a7493f79ecd36271501ebccbc21c56271510ee9 Binary files /dev/null and b/deploy/shitu_index_manager/resource/add_image.png differ diff --git a/deploy/shitu_index_manager/resource/app_icon.png b/deploy/shitu_index_manager/resource/app_icon.png new file mode 100644 index 0000000000000000000000000000000000000000..0991f667fb93676e9e1a882099813fe585d4a48c Binary files /dev/null and b/deploy/shitu_index_manager/resource/app_icon.png differ diff --git a/deploy/shitu_index_manager/resource/app_menu.png b/deploy/shitu_index_manager/resource/app_menu.png new file mode 100644 index 0000000000000000000000000000000000000000..d46180f45d184c5c3d88d58796d0096fead1b976 Binary files /dev/null and b/deploy/shitu_index_manager/resource/app_menu.png differ diff --git a/deploy/shitu_index_manager/resource/remove_classify.png b/deploy/shitu_index_manager/resource/remove_classify.png new file mode 100644 index 
0000000000000000000000000000000000000000..51efb8a2a00c5767e45d6b532547ac963321cfbf Binary files /dev/null and b/deploy/shitu_index_manager/resource/remove_classify.png differ diff --git a/deploy/shitu_index_manager/resource/remove_image.png b/deploy/shitu_index_manager/resource/remove_image.png new file mode 100644 index 0000000000000000000000000000000000000000..057d3c20405754bbf51d38d73de36259e8ac12a4 Binary files /dev/null and b/deploy/shitu_index_manager/resource/remove_image.png differ diff --git a/deploy/shitu_index_manager/resource/save_image_Library.png b/deploy/shitu_index_manager/resource/save_image_Library.png new file mode 100644 index 0000000000000000000000000000000000000000..67e0a394a9aea56b83eb563f1b38ec36a9c724bc Binary files /dev/null and b/deploy/shitu_index_manager/resource/save_image_Library.png differ diff --git a/deploy/shitu_index_manager/resource/search_classify.png b/deploy/shitu_index_manager/resource/search_classify.png new file mode 100644 index 0000000000000000000000000000000000000000..bdd75d8556c8bd05a4df60ac000d2da332dd722c Binary files /dev/null and b/deploy/shitu_index_manager/resource/search_classify.png differ diff --git a/deploy/shitu_index_manager/ui/AddClassifyDialog.ui b/deploy/shitu_index_manager/ui/AddClassifyDialog.ui new file mode 100644 index 0000000000000000000000000000000000000000..bdc06f3aa942257d562f426bea5eb035812c5d91 --- /dev/null +++ b/deploy/shitu_index_manager/ui/AddClassifyDialog.ui @@ -0,0 +1,90 @@ + + + AddClassifyDialog + + + + 0 + 0 + 286 + 127 + + + + 添加分类 + + + true + + + + + + 分类名称 + + + + + + + + + + Qt::Vertical + + + + 20 + 11 + + + + + + + + Qt::Horizontal + + + QDialogButtonBox::Cancel|QDialogButtonBox::Ok + + + + + + + + + buttonBox + accepted() + AddClassifyDialog + accept() + + + 248 + 254 + + + 157 + 274 + + + + + buttonBox + rejected() + AddClassifyDialog + reject() + + + 316 + 260 + + + 286 + 274 + + + + + diff --git a/deploy/shitu_index_manager/ui/ImageEditClassifyDialog.ui b/deploy/shitu_index_manager/ui/ImageEditClassifyDialog.ui new file mode 100644 index 0000000000000000000000000000000000000000..d21624fd2defb41e676b1927a50fc66605f977a1 --- /dev/null +++ b/deploy/shitu_index_manager/ui/ImageEditClassifyDialog.ui @@ -0,0 +1,125 @@ + + + Dialog + + + + 0 + 0 + 414 + 415 + + + + + 0 + 0 + + + + 编辑图像分类 + + + + + + 原分类 + + + + + + + false + + + + + + + 新分类 + + + + + + + false + + + + + + + + + + + + 查找 + + + + + + + + + true + + + + 400 + 200 + + + + + + + + Qt::Horizontal + + + QDialogButtonBox::Cancel|QDialogButtonBox::Ok + + + + + + + + + buttonBox + accepted() + Dialog + accept() + + + 248 + 254 + + + 157 + 274 + + + + + buttonBox + rejected() + Dialog + reject() + + + 316 + 260 + + + 286 + 274 + + + + + diff --git a/deploy/shitu_index_manager/ui/MainWindow.ui b/deploy/shitu_index_manager/ui/MainWindow.ui new file mode 100644 index 0000000000000000000000000000000000000000..8d8808b36e80e0f90d1f398c8bba39ff3fa181b4 --- /dev/null +++ b/deploy/shitu_index_manager/ui/MainWindow.ui @@ -0,0 +1,212 @@ + + + MainWindow + + + + 0 + 0 + 833 + 538 + + + + + 0 + 0 + + + + 识图图像库管理 + + + + + + + + + ... + + + + + + + ... + + + + + + + ... + + + + + + + ... + + + + + + + Qt::Horizontal + + + + 40 + 20 + + + + + + + + + 400 + 16777215 + + + + 1 + + + 8 + + + 2 + + + Qt::Horizontal + + + + + + + + + + 0 + 0 + + + + Qt::Horizontal + + + + + + + + + + 0 + 0 + + + + true + + + + + + + ... + + + + + + + + + + 0 + 0 + + + + + 200 + 0 + + + + QAbstractItemView::NoEditTriggers + + + + + + + + + + + + + ... + + + + + + + ... 
+ + + + + + + Qt::Horizontal + + + + 40 + 20 + + + + + + + + + + + 0 + 0 + + + + + 200 + 0 + + + + QListWidget::Item:hover{background:skyblue;padding-top:0px; padding-bottom:0px;} +QListWidget::item:selected{background:rgb(245, 121, 0); color:red;} + + + + + + + + + + + + + + diff --git a/deploy/shitu_index_manager/ui/NewlibraryDialog.ui b/deploy/shitu_index_manager/ui/NewlibraryDialog.ui new file mode 100644 index 0000000000000000000000000000000000000000..0df94eae2f0fd660d65eb7ef93fba99950629fb6 --- /dev/null +++ b/deploy/shitu_index_manager/ui/NewlibraryDialog.ui @@ -0,0 +1,116 @@ + + + NewlibraryDialog + + + + 0 + 0 + 414 + 230 + + + + 新建/重建 索引 + + + + + + 索引方式 + + + + + + + true + + + 0 + + + + HNSW32 + + + + + FLAT + + + + + IVF + + + + + + + + 重建索引,警告:会覆盖原索引 + + + + + + + Qt::Vertical + + + + 20 + 80 + + + + + + + + Qt::Horizontal + + + QDialogButtonBox::Cancel|QDialogButtonBox::Ok + + + + + + + + + buttonBox + accepted() + NewlibraryDialog + accept() + + + 248 + 254 + + + 157 + 274 + + + + + buttonBox + rejected() + NewlibraryDialog + reject() + + + 316 + 260 + + + 286 + 274 + + + + + diff --git a/deploy/shitu_index_manager/ui/RenameClassifyDialog.ui b/deploy/shitu_index_manager/ui/RenameClassifyDialog.ui new file mode 100644 index 0000000000000000000000000000000000000000..53ba8606a42ff7e541317001719e89b753805633 --- /dev/null +++ b/deploy/shitu_index_manager/ui/RenameClassifyDialog.ui @@ -0,0 +1,101 @@ + + + RenameClassifyDialog + + + + 0 + 0 + 342 + 194 + + + + 重命名分类 + + + + + + 原名称 + + + + + + + false + + + + + + + 新名称 + + + + + + + + + + Qt::Vertical + + + + 20 + 14 + + + + + + + + Qt::Horizontal + + + QDialogButtonBox::Cancel|QDialogButtonBox::Ok + + + + + + + + + buttonBox + accepted() + RenameClassifyDialog + accept() + + + 248 + 254 + + + 157 + 274 + + + + + buttonBox + rejected() + RenameClassifyDialog + reject() + + + 316 + 260 + + + 286 + 274 + + + + + diff --git a/deploy/shitu_index_manager/ui/WaitDialog.ui b/deploy/shitu_index_manager/ui/WaitDialog.ui new file mode 100644 index 0000000000000000000000000000000000000000..eaf62fde565a92ae2f4ca3c06010f6d7c42d4848 --- /dev/null +++ b/deploy/shitu_index_manager/ui/WaitDialog.ui @@ -0,0 +1,54 @@ + + + WaitDialog + + + Qt::NonModal + + + + 0 + 0 + 324 + 78 + + + + 请等待 + + + + + + 正在更新索引库,请等待。。。 + + + + + + + 0 + + + -1 + + + + + + + Qt::Vertical + + + + 20 + 1 + + + + + + + + + diff --git a/deploy/utils/predictor.py b/deploy/utils/predictor.py index 9a38ccd18981c1ddd5dfc75152fa1d31f71d2b06..948b1859870d622ad370de1935775b0179a606b7 100644 --- a/deploy/utils/predictor.py +++ b/deploy/utils/predictor.py @@ -60,8 +60,12 @@ class Predictor(object): config = Config(model_file, params_file) - if args.use_gpu: + if args.get("use_gpu", False): config.enable_use_gpu(args.gpu_mem, 0) + elif args.get("use_npu", False): + config.enable_npu() + elif args.get("use_xpu", False): + config.enable_xpu() else: config.disable_gpu() if args.enable_mkldnn: diff --git a/docs/en/PPShiTu/PPShiTuV2_introduction.md b/docs/en/PPShiTu/PPShiTuV2_introduction.md new file mode 100644 index 0000000000000000000000000000000000000000..bae44aea1aa4b3ebaa3f35408786d94ee91dc9d8 --- /dev/null +++ b/docs/en/PPShiTu/PPShiTuV2_introduction.md @@ -0,0 +1,250 @@ +## PP-ShiTuV2 Image Recognition System + +## Table of contents + +- [1. Introduction of PP-ShiTuV2 model and application scenarios](#1-introduction-of-pp-shituv2-model-and-application-scenarios) +- [2. 
Quick experience](#2-quick-experience) + - [2.1 Quick experience of PP-ShiTu android demo](#21-quick-experience-of-pp-shitu-android-demo) + - [2.2 Quick experience of command line code](#22-quick-experience-of-command-line-code) +- [3 Module introduction and training](#3-module-introduction-and-training) + - [3.1 Mainbody detection](#31-mainbody-detection) + - [3.2 Feature Extraction](#32-feature-extraction) + - [3.3 Vector Search](#33-vector-search) +- [4. Inference Deployment](#4-inference-deployment) + - [4.1 Inference model preparation](#41-inference-model-preparation) + - [4.1.1 Export the inference model from pretrained model](#411-export-the-inference-model-from-pretrained-model) + - [4.1.2 Download the inference model directly](#412-download-the-inference-model-directly) + - [4.2 Test data preparation](#42-test-data-preparation) + - [4.3 Inference based on Python inference engine](#43-inference-based-on-python-inference-engine) + - [4.3.1 single image prediction](#431-single-image-prediction) + - [4.3.2 multi images prediction](#432-multi-images-prediction) + - [4.3 Inference based on C++ inference engine](#43-inference-based-on-c-inference-engine) + - [4.4 Serving deployment](#44-serving-deployment) + - [4.5 Lite deployment](#45-lite-deployment) + - [4.6 Paddle2ONNX](#46-paddle2onnx) +- [references](#references) + +## 1. Introduction of PP-ShiTuV2 model and application scenarios + +PP-shituv2 is a practical lightweight general image recognition system improved on PP-ShitUV1. It is composed of three modules: mainbody detection, feature extraction and vector search. Compared with PP-ShiTuV1, PP-ShiTuV2 has higher recognition accuracy, stronger generalization and similar inference speed *. This paper mainly optimize in training dataset, feature extraction with better backbone network, loss function and training strategy, which significantly improved the retrieval performance of PP-ShiTuV2 in multiple practical application scenarios. + +
+ +
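+The interaction of the three modules can be illustrated with the following minimal, self-contained sketch. It is only a schematic: `detect_mainbody` and `extract_feature` are hypothetical placeholders standing in for the PicoDet mainbody detector and the PPLCNetV2_base feature model described below, and the brute-force cosine-similarity search stands in for the Faiss index used by the real system.
+
+```python
+import numpy as np
+
+def detect_mainbody(image: np.ndarray) -> list:
+    # Placeholder detector: return candidate crops (here, just the whole image).
+    return [image]
+
+def extract_feature(crop: np.ndarray, dim: int = 512) -> np.ndarray:
+    # Placeholder feature model: return a deterministic, L2-normalized vector.
+    rng = np.random.default_rng(abs(hash(crop.tobytes())) % (2 ** 32))
+    feat = rng.standard_normal(dim)
+    return feat / np.linalg.norm(feat)
+
+def search(query_feat, gallery_feats, gallery_labels, top_k=1):
+    # Brute-force retrieval: cosine similarity equals the dot product
+    # because all features are L2-normalized.
+    scores = gallery_feats @ query_feat
+    order = np.argsort(-scores)[:top_k]
+    return [(gallery_labels[i], float(scores[i])) for i in order]
+
+# Build a toy gallery (index) of two reference images, then query it.
+gallery_labels = ["drink_A", "drink_B"]
+gallery_images = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (10, 200)]
+gallery_feats = np.stack([extract_feature(img) for img in gallery_images])
+
+query_image = np.full((8, 8, 3), 10, dtype=np.uint8)
+for crop in detect_mainbody(query_image):
+    print(search(extract_feature(crop), gallery_feats, gallery_labels))
+```
+
+In the released system the gallery features are built offline into a Faiss index, so at query time only detection, feature extraction and an index lookup are executed.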
+ +The following table lists the relevant metric obtained by PP-ShiTuV2 with comparison to PP-ShiTuV1. + +| model | storage (mainbody detection + feature extraction) | product | +| :--------- | :------------------------------------------------ | :------- | +| | | recall@1 | +| PP-ShiTuV1 | 64(30+34)MB | 66.8% | +| PP-ShiTuV2 | 49(30+19) | 73.8% | + +**Note:** +- For the introduction of recall and mAP metric, please refer to [Retrieval Metric](../algorithm_introduction/reid.md). +- Latency is based on Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz test, MKLDNN acceleration strategy is enabled, and the number of threads is 10. + +## 2. Quick experience + +### 2.1 Quick experience of PP-ShiTu android demo + +You can download and install the APP by scanning the QR code or [click the link](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk) + +
+ +Then save the following demo pictures to your phone: + +
+ +Open the installed APP, click the "**file recognition**" button below, select the above saved image, and you can get the following recognition results: + +
+ +### 2.2 Quick experience of command line code + +- First follow the commands below to install paddlepaddle and faiss + ```shell + # If your machine is installed with CUDA9 or CUDA10, please run the following command to install + python3.7 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + + # If your machine is CPU, please run the following command to install + python3.7 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + + # install faiss database + python3.7 -m pip install faiss-cpu==1.7.1post2 + ``` + +- Then follow the command below to install the paddleclas whl package + ```shell + # Go to the root directory of PaddleClas + cd PaddleClas + + # install paddleclas + python3.7 setup.py install + ``` + +- Then execute the following command to download and decompress the demo data, and finally execute command to quick start image recognition + + ```shell + # Download and unzip the demo data + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar + + # Execute the identification command + paddleclas \ + --model_name=PP-ShiTuV2 \ + --infer_imgs=./drink_dataset_v2.0/test_images/100.jpeg \ + --index_dir=./drink_dataset_v2.0/index/ \ + --data_file=./drink_dataset_v2.0/gallery/drink_label.txt + ``` + +## 3 Module introduction and training + +### 3.1 Mainbody detection + +Mainbody detection is a widely used detection technology. It refers to detecting the coordinate position of one or more objects in the image, and then cropping the corresponding area in the image for identification. Mainbody detection is the pre-procedure of the recognition task. The input image is recognized after mainbody detection, which can remove complex backgrounds and effectively improve the recognition accuracy. + +Taking into account the detection speed, model size, detection accuracy and other factors, the lightweight model `PicoDet-LCNet_x2_5` developed by PaddleDetection was finally selected as the mainbody detection model of PP-ShiTuV2 + +For details on the dataset, training, evaluation, inference, etc. of the mainbody detection model, please refer to the document: [picodet_lcnet_x2_5_640_mainbody](../../en/image_recognition_pipeline/mainbody_detection_en.md). + +### 3.2 Feature Extraction + +Feature extraction is a key part of image recognition. It is designed to convert the input image into a fixed-dimensional feature vector for subsequent [vector search](../../en/image_recognition_pipeline/vector_search_en.md) . Taking into account the speed of the feature extraction model, model size, feature extraction performance and other factors, the [`PPLCNetV2_base`](../../en/models/PP-LCNet_en.md) developed by PaddleClas was finally selected as the feature extraction network. Compared with `PPLCNet_x2_5` used by PP-ShiTuV1, `PPLCNetV2_base` basically maintains high classification accuracy and reduces inference time by 40%*. + +**Note:** *The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform. + +During the experiment, we found that we can make appropriate improvements to `PPLCNetV2_base` to achieve higher performance in recognition tasks while keeping the speed basically unchanged, including: removing `ReLU` and `FC` at the end of `PPLCNetV2_base`, change the stride of the last stage (RepDepthwiseSeparable) to 1. + +For details about the dataset, training, evaluation, inference, etc. 
of the feature extraction model, please refer to the document: [PPLCNetV2_base_ShiTu](../../en/image_recognition_pipeline/feature_extraction_en.md).
+
+### 3.3 Vector Search
+
+Vector search technology is widely used in image recognition. Its main goal is to calculate, for a given query vector, the similarity or distance to the feature vectors stored in an established vector database, and to return the candidate vectors ranked by similarity.
+
+In the PP-ShiTuV2 recognition system, we use the [Faiss](https://github.com/facebookresearch/faiss) open-source vector search library, which is easy to install, adapts well to different scenarios, provides a rich set of algorithms, and supports both CPU and GPU.
+
+For the installation and use of the Faiss vector search tool in the PP-ShiTuV2 system, please refer to the document: [vector search](../../en/image_recognition_pipeline/vector_search_en.md).
+
+## 4. Inference Deployment
+
+### 4.1 Inference model preparation
+Paddle Inference is the native inference library of Paddle, which runs on servers and in the cloud to provide high-performance inference capabilities. Compared with making predictions directly from the pre-trained model, Paddle Inference can use MKLDNN, CUDNN, and TensorRT for prediction acceleration to achieve better inference performance. For more details about the Paddle Inference engine, please refer to the [Paddle Inference official website tutorial](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html).
+
+When using Paddle Inference for model inference, the loaded model type is the inference model. This document provides two ways to obtain the inference model. If you want to reproduce the results in this document, please use [Download the inference model directly](#412-download-the-inference-model-directly).
+
+#### 4.1.1 Export the inference model from pretrained model
+- To prepare the mainbody detection inference model, please refer to the document [Mainbody Detection Inference Model Preparation](../../en/image_recognition_pipeline/mainbody_detection_en.md), or refer to [4.1.2](#412-download-the-inference-model-directly).
+
+- To export the weights of the feature extraction model, you can refer to the following commands:
+  ```shell
+  python3.7 tools/export_model.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
+  -o Global.pretrained_model="https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams" \
+  -o Global.save_inference_dir=deploy/models/GeneralRecognitionV2_PPLCNetV2_base
+  ```
+  After executing the script, the `GeneralRecognitionV2_PPLCNetV2_base` folder will be generated under `deploy/models/` with the following file structure:
+
+  ```log
+  deploy/models/
+  ├── GeneralRecognitionV2_PPLCNetV2_base
+  │   ├── inference.pdiparams
+  │   ├── inference.pdiparams.info
+  │   └── inference.pdmodel
+  ```
+
+#### 4.1.2 Download the inference model directly
+
+[Section 4.1.1](#411-export-the-inference-model-from-pretrained-model) provides a method to export the inference model; here we provide the exported inference models directly. You can download the models to the specified location and decompress them with the following commands to try them out.
+
+```shell
+cd deploy/models
+
+# Download the mainbody detection inference model and unzip it
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
+
+# Download the feature extraction inference model and unzip it
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar
+```
+
+### 4.2 Test data preparation
+
+After preparing the mainbody detection and feature extraction models, you also need to prepare the test data as input. You can run the following commands to download and decompress the test data.
+
+```shell
+# return to ./deploy
+cd ../
+
+# Download the test data drink_dataset_v2.0 and unzip it
+wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar
+```
+
+### 4.3 Inference based on Python inference engine
+
+#### 4.3.1 single image prediction
+
+Then execute the following command to recognize the single image `./drink_dataset_v2.0/test_images/100.jpeg`.
+
+```shell
+# Execute the following command to predict with GPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg"
+
+# Execute the following command to predict with CPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg" -o Global.use_gpu=False
+```
+
+The final output is as follows.
+
+```log
+[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}]
+```
+
+#### 4.3.2 multi images prediction
+
+If you want to predict all images in a folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can modify the corresponding configuration through the following `-o` parameter.
+
+```shell
+# Use the command below to predict with GPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images"
+# Use the following command to predict with CPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images" -o Global.use_gpu=False
+```
+
+The terminal will output the recognition results of all images in the folder, as shown below.
+
+```log
+...
+[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}] +Inference: 120.39852142333984 ms per batch image +[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}] +Inference: 32.045602798461914 ms per batch image +[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}] +Inference: 113.41428756713867 ms per batch image +[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}] +Inference: 122.04337120056152 ms per batch image +[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}] +Inference: 37.95266151428223 ms per batch image +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] +... +``` + +Where `bbox` represents the bounding box of the detected mainbody, `rec_docs` represents the most similar category to the detection object in the index database, and `rec_scores` represents the corresponding similarity. + +### 4.3 Inference based on C++ inference engine +PaddleClas provides an example of inference based on C++ prediction engine, you can refer to [Server-side C++ prediction](../../../deploy/cpp_shitu/readme_en.md) to complete the corresponding inference deployment. If you are using the Windows platform, you can refer to [Visual Studio 2019 Community CMake Compilation Guide](../inference_deployment/python_deploy_en.md) to complete the corresponding prediction database compilation and model prediction work. + +### 4.4 Serving deployment +Paddle Serving provides high-performance, flexible and easy-to-use industrial-grade online inference services. Paddle Serving supports RESTful, gRPC, bRPC and other protocols, and provides inference solutions in a variety of heterogeneous hardware and operating system environments. For more introduction to Paddle Serving, please refer to [Paddle Serving Code Repository](https://github.com/PaddlePaddle/Serving). + +PaddleClas provides an example of model serving deployment based on Paddle Serving. You can refer to [Model serving deployment](../inference_deployment/recognition_serving_deploy_en.md) to complete the corresponding deployment. + +### 4.5 Lite deployment +Paddle Lite is a high-performance, lightweight, flexible and easily extensible deep learning inference framework, positioned to support multiple hardware platforms including mobile, embedded and server. For more introduction to Paddle Lite, please refer to [Paddle Lite Code Repository](https://github.com/PaddlePaddle/Paddle-Lite). + +### 4.6 Paddle2ONNX +Paddle2ONNX supports converting PaddlePaddle model format to ONNX model format. The deployment of Paddle models to various inference engines can be completed through ONNX, including TensorRT/OpenVINO/MNN/TNN/NCNN, and other inference engines or hardware that support the ONNX open source format. For more introduction to Paddle2ONNX, please refer to [Paddle2ONNX Code Repository](https://github.com/PaddlePaddle/Paddle2ONNX). + +PaddleClas provides an example of converting an inference model to an ONNX model and making inference prediction based on Paddle2ONNX. You can refer to [Paddle2ONNX Model Conversion and Prediction](../../../deploy/paddle2onnx/readme_en.md) to complete the corresponding deployment work. + +## references +1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." 
International Conference on Multimedia Modeling. Springer, Cham, 2022. +2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609. diff --git a/docs/en/image_recognition_pipeline/feature_extraction_en.md b/docs/en/image_recognition_pipeline/feature_extraction_en.md index f86562a37416c406497cb3723d50dc02332e4e51..26216543733529f749fe1732fbb12585ffdcc874 100644 --- a/docs/en/image_recognition_pipeline/feature_extraction_en.md +++ b/docs/en/image_recognition_pipeline/feature_extraction_en.md @@ -12,12 +12,15 @@ - [4.4 Model Inference](#4.4) -## 1.Introduction + +## 1. Abstract Feature extraction plays a key role in image recognition, which serves to transform the input image into a fixed dimensional feature vector for subsequent [vector search](./vector_search_en.md). Good features boast great similarity preservation, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have less feature similarity (further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) is applied to explore how to obtain features with high representational power through deep learning. -## 2.Network Structure + +## 2. Introduction + In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure: @@ -31,152 +34,239 @@ Functions of the above modules : - **Loss**: Specifies the Loss function to be used. It is designed as a combined form to facilitate the combination of Classification Loss and Pair_wise Loss. -## 3.General Recognition Models - -In PP-Shitu, we have [PP_LCNet_x2_5](../models/PP-LCNet.md) as the backbone network, Linear Layer for Neck, [ArcMargin](../../../ppcls/arch/gears/arcmargin.py) for Head, and CELoss for Loss. See the details in [General Recognition_configuration files](../../../ppcls/configs/GeneralRecognition/). 
The involved training data covers the following seven public datasets: - -| Datasets | Data Size | Class Number | Scenarios | URL | -| ------------ | --------- | ------------ | ------------------ | ------------------------------------------------------------ | -| Aliproduct | 2498771 | 50030 | Commodities | [URL](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | -| GLDv2 | 1580470 | 81313 | Landmarks | [URL](https://github.com/cvdfoundation/google-landmark) | -| VeRI-Wild | 277797 | 30671 | Vehicle | [URL](https://github.com/PKU-IMRE/VERI-Wild) | -| LogoDet-3K | 155427 | 3000 | Logo | [URL](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | -| iCartoonFace | 389678 | 5013 | Cartoon Characters | [URL](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | -| SOP | 59551 | 11318 | Commodities | [URL](https://cvgl.stanford.edu/projects/lifted_struct/) | -| Inshop | 25882 | 3997 | Commodities | [URL](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | -| **Total** | **5M** | **185K** | ---- | ---- | - -The results are shown in the table below: - -| Model | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) | -| ------------- | ---------- | --------- | ---------- | ------------ | ----- | ------ | ----------- | -| PP-LCNet-2.5x | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 | - -- Evaluation metric: `Recall@1` -- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`. -- Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10 -- Address of the pre-training model: [General recognition pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) - - -## 4.Customized Feature Extraction - -Customized feature extraction refers to retraining the feature extraction model based on one's own task. It consists of four main steps: 1) data preparation, 2) model training, 3) model evaluation, and 4) model inference. - - -### 4.1 Data Preparation -To start with, customize your dataset based on the task (See [Format description](../data_preparation/recognition_dataset_en.md#1) for the dataset format). Before initiating the model training, modify the data-related content in the configuration files, including the address of the dataset and the class number. The corresponding locations in configuration files are shown below: +## 3. Methods -``` - Head: - name: ArcMargin - embedding_size: 512 - class_num: 185341 #Number of class -``` +#### 3.1 Backbone -``` -Train: - dataset: - name: ImageNetDataset - image_root: ./dataset/ #The directory where the train dataset is located - cls_label_path: ./dataset/train_reg_all_data.txt #The address of label file for train dataset -``` +The Backbone part adopts [PP-LCNetV2_base](../models/PP-LCNetV2.md), which is based on `PPLCNet_V1`, including Rep strategy, PW convolution, Shortcut, activation function improvement, SE module improvement After several optimization points, the final classification accuracy is similar to `PPLCNet_x2_5`, and the inference delay is reduced by 40%*. During the experiment, we made appropriate improvements to `PPLCNetV2_base`, so that it can achieve higher performance in recognition tasks while keeping the speed basically unchanged, including: removing `ReLU` and ` at the end of `PPLCNetV2_base` FC`, change the stride of the last stage (RepDepthwiseSeparable) to 1. 
-``` - Query: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/. #The directory where the query dataset is located - cls_label_path: ./dataset/Aliproduct/val_list.txt. #The address of label file for query dataset -``` +**Note:** *The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform. -``` - Gallery: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/ #The directory where the gallery dataset is located - cls_label_path: ./dataset/Aliproduct/val_list.txt. #The address of label file for gallery dataset -``` +#### 3.2 Neck - -### 4.2 Model Training +We use [BN Neck](../../../ppcls/arch/gears/bnneck.py) to standardize each dimension of the features extracted by Backbone, reducing difficulty of optimizing metric learning loss and identification loss simultaneously. -- Single machine single card training +#### 3.3 Head -``` -export CUDA_VISIBLE_DEVICES=0 -python tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -``` +We use [FC Layer](../../../ppcls/arch/gears/fc.py) as the classification head to convert features into logits for classification loss. -- Single machine multi card training +#### 3.4 Loss -``` -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -``` +We use [Cross entropy loss](../../../ppcls/loss/celoss.py) and [TripletAngularMarginLoss](../../../ppcls/loss/tripletangularmarginloss.py), and we improved the original TripletLoss(TriHard Loss), replacing the optimization objective from L2 Euclidean space to cosine space, adding a hard distance constraint between anchor and positive/negtive, so the generalization ability of the model is improved. For detailed configuration files, see [GeneralRecognitionV2_PPLCNetV2_base.yaml](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-77). -**Note:** The configuration file adopts `online evaluation` by default, if you want to speed up the training and remove `online evaluation`, just add `-o eval_during_train=False` after the above command. After training, the final model files `latest`, `best_model` and the training log file `train.log` will be generated under the directory output. Among them, `best_model` is utilized to store the best model under the current evaluation metrics while`latest` is adopted to store the latest generated model, making it convenient to resume the training from where it was interrupted. +#### 3.5 Data Augmentation -- Resumption of Training: +We consider that the object may rotate to a certain extent and can not maintain an upright state in real scenes, so we add an appropriate [random rotation](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) in the data augmentation to improve the retrieval performance in real scenes. -``` -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ - -o Global.checkpoint="output/RecModel/latest" -``` + - -### 4.3 Model Evaluation +## 4. 
Experimental + +We reasonably expanded and optimized the original training data, and finally used a summary of the following 17 public datasets: + +| Dataset | Data Amount | Number of Categories | Scenario | Dataset Address | +| :--------------------- | :---------: | :------------------: | :---------: | :-------------------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | Commodities | [Address](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | Landmark | [address](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | Vehicles | [Address](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logo | [Address](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| SOP | 59551 | 11318 | Commodities | [Address](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | Commodities | [Address](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| bird400 | 58388 | 400 | birds | [address](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) | +| 104flows | 12753 | 104 | Flowers | [Address](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | +| Cars | 58315 | 112 | Vehicles | [Address](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| Fashion Product Images | 44441 | 47 | Products | [Address](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) | +| flowerrecognition | 24123 | 59 | flower | [address](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) | +| food-101 | 101000 | 101 | food | [address](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) | +| fruits-262 | 225639 | 262 | fruits | [address](https://www.kaggle.com/datasets/aelchimminut/fruits262) | +| inaturalist | 265213 | 1010 | natural | [address](https://github.com/visipedia/inat_comp/tree/master/2017) | +| indoor-scenes | 15588 | 67 | indoor | [address](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) | +| Products-10k | 141931 | 9691 | Products | [Address](https://products-10k.github.io/) | +| CompCars | 16016 | 431 | Vehicles | [Address](http://​​​​​​http://ai.stanford.edu/~jkrause/cars/car_dataset.html​) | +| **Total** | **6M** | **192K** | - | - | + +The final model accuracy metrics are shown in the following table: + +| Model | Latency (ms) | Storage (MB) | product* | | Aliproduct | | VeRI-Wild | | LogoDet-3k | | iCartoonFace | | SOP | | Inshop | | gldv2 | | imdb_face | | iNat | | instre | | sketch | | sop | | +| :--------------------- | :----------- | :----------- | :------------------ | :--- | ---------- | ---- | --------- | ---- | ---------- | ---- | ------------ | ---- | -------- | --------- | ------ | -------- | ----- | -------- | --------- | -------- | ---- | -------- | ------ | -------- | ------ | -------- | --- | --- | +| | | | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mrecall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | +| PP-ShiTuV1_general_rec | 5.0 | 34 | 65.9 | 54.3 | 83.9 | 83.2 | 88.7 | 60.1 | 86.1 | 73.6 | | 50.4 | 27.9 | 9.5 | 97.6 | 90.3 | +| PP-ShiTuV2_general_rec | 6.1 | 19 | 73.7 | 61.0 | 84.2 | 83.3 | 87.8 | 68.8 | 88.0 | 63.2 | 53.6 | 27.5 | | 71.4 | 39.3 | 15.6 | 98.3 | 90.9 | + +*The product dataset is a dataset made to verify the generalization performance of PP-ShiTu, and all the data are not present in the training and 
testing sets. The data contains 7 major categories (cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 subcategories. When testing, use the labels of 250 small classes for testing; the sop dataset comes from [GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122), which can be regarded as " SOP" dataset. +* Pre-trained model address: [general_PPLCNetV2_base_pretrained_v1.0.pdparams](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams) +* The evaluation metrics used are: `Recall@1` and `mAP` +* The CPU specific information of the speed test machine is: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz` +* The evaluation conditions of the speed indicator are: MKLDNN is turned on, and the number of threads is set to 10 + + + +## 5. Custom Feature Extraction + +Custom feature extraction refers to retraining the feature extraction model according to your own task. + +Based on the `GeneralRecognitionV2_PPLCNetV2_base.yaml` configuration file, the following describes the main four steps: 1) data preparation; 2) model training; 3) model evaluation; 4) model inference + + + +### 5.1 Data Preparation + +First you need to customize your own dataset based on the task. Please refer to [Dataset Format Description](../data_preparation/recognition_dataset.md) for the dataset format and file structure. + +After the preparation is complete, it is necessary to modify the content related to the data configuration in the configuration file, mainly including the path of the dataset and the number of categories. As is as shown below: + +- Modify the number of classes: + ```yaml + Head: + name: FC + embedding_size: *feat_dim + class_num: 192612 # This is the number of classes + weight_attr: + initializer: + name: Normal + std: 0.001 + bias_attr: False + ``` +- Modify the training dataset configuration: + ```yaml + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ # Here is the directory where the train dataset is located + cls_label_path: ./dataset/train_reg_all_data_v2.txt # Here is the path of the label file corresponding to the train dataset + relabel: True + ``` +- Modify the query data configuration in the evaluation dataset: + ```yaml + Query: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # Here is the directory where the query dataset is located + cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the query dataset + ``` +- Modify the gallery data configuration in the evaluation dataset: + ```yaml + Gallery: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # This is the directory where the gallery dataset is located + cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the gallery dataset + ``` + + + +### 5.2 Model training + +Model training mainly includes the starting training and restoring training from checkpoint + +- Single machine and single card training + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml + ``` +- Single machine multi-card training + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml + ``` +**Notice:** +The online evaluation 
method is used by default in the configuration file. If you want to speed up the training, you can turn off the online evaluation function, just add `-o Global.eval_during_train=False` after the above scripts. + +After training, the final model files `latest.pdparams`, `best_model.pdarams` and the training log file `train.log` will be generated in the output directory. Among them, `best_model` saves the best model under the current evaluation index, and `latest` is used to save the latest generated model, which is convenient to resume training from the checkpoint when training task is interrupted. Training can be resumed from a checkpoint by adding `-o Global.checkpoint="path_to_resume_checkpoint"` to the end of the above training scripts, as shown below. + +- Single machine and single card checkpoint recovery training + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ + -o Global.checkpoint="output/RecModel/latest" + ``` +- Single-machine multi-card checkpoint recovery training + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ + -o Global.checkpoint="output/RecModel/latest" + ``` + + + +### 5.3 Model Evaluation + +In addition to the online evaluation of the model during training, the evaluation program can also be started manually to obtain the specified model's accuracy metrics. - Single Card Evaluation - -``` -export CUDA_VISIBLE_DEVICES=0 -python tools/eval.py \ --c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ --o Global.pretrained_model="output/RecModel/best_model" -``` + ```shell + export CUDA_VISIBLE_DEVICES=0 + python3.7 tools/eval.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ + -o Global.pretrained_model="output/RecModel/best_model" + ``` - Multi Card Evaluation + ```shell + export CUDA_VISIBLE_DEVICES=0,1,2,3 + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ + -o Global.pretrained_model="output/RecModel/best_model" + ``` +**Note:** Multi Card Evaluation is recommended. This method can quickly obtain the metric cross all the data by using multi-card parallel computing, which can speed up the evaluation. -``` -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/eval.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ - -o Global.pretrained_model="output/RecModel/best_model" -``` - -**Recommendation:** It is suggested to employ multi-card evaluation, which can quickly obtain the feature set of the overall dataset using multi-card parallel computing, accelerating the evaluation process. + - -### 4.4 Model Inference +### 5.4 Model Inference -Two steps are included in the inference: 1)exporting the inference model; 2)obtaining the feature vector. +The inference process consists of two steps: 1) Export the inference model; 2) Model inference to obtain feature vectors -#### 4.4.1 Export Inference Model +#### 5.4.1 Export inference model -``` -python tools/export_model.py \ --c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ +First, you need to convert the `*.pdparams` model file into inference format. The conversion script is as follows. 
+```shell +python3.7 tools/export_model.py \ +-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.pretrained_model="output/RecModel/best_model" ``` +The generated inference model is located in the `PaddleClas/inference` directory by default, which contains three files, `inference.pdmodel`, `inference.pdiparams`, `inference.pdiparams.info`. +Where `inference.pdmodel` is used to store the structure of the inference model, `inference.pdiparams` and `inference.pdiparams.info` are used to store parameter information related to the inference model. -The generated inference models are under the directory `inference`, which comprises three files, namely, `inference.pdmodel`、`inference.pdiparams`、`inference.pdiparams.info`. Among them, `inference.pdmodel` serves to store the structure of inference model while `inference.pdiparams` and `inference.pdiparams.info` are mobilized to store model-related parameters. +#### 5.4.2 Get feature vector -#### 4.4.2 Obtain Feature Vector +Use the inference model converted in the previous step to convert the input image into corresponding feature vector. The inference script is as follows. -``` +```shell cd deploy -python python/predict_rec.py \ +python3.7 python/predict_rec.py \ -c configs/inference_rec.yaml \ -o Global.rec_inference_model_dir="../inference" ``` +The resulting feature output format is as follows: + +```log +wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02 + 4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02 + 5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02 + -2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03 + ... + -3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02] +``` + +In most cases, just getting the features may not meet the users' requirements. If you want to go further on the image recognition task, you can refer to the document [Vector Search](./vector_search.md). + + + +## 6. Summary + +As a key part of image recognition, the feature extraction module has a lot of points for improvement in the network structure and the the loss function. Different datasets have their own characteristics, such as person re-identification, commodity recognition, face recognition. According to these characteristics, the academic community has proposed various methods, such as PCB, MGN, ArcFace, CircleLoss, TripletLoss, etc., which focus on the ultimate goal of increasing the gap between classes and reducing the gap within classes, so as to make a retrieval model robust enough in most scenes. + + -The output format of the obtained features is shown in the figure below:![img](../../images/feature_extraction_output.png) +## 7. References -In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](./vector_search_en.md). +1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf) +2. 
[Bag of Tricks and A Strong Baseline for Deep Person Re-identification](https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf) diff --git a/docs/en/inference_deployment/whl_deploy_en.md b/docs/en/inference_deployment/whl_deploy_en.md index e2666458a27f55bdb44f5fcb2646ba9107e80163..7c94f6ded4a02548012f536e222ffebb84254c21 100644 --- a/docs/en/inference_deployment/whl_deploy_en.md +++ b/docs/en/inference_deployment/whl_deploy_en.md @@ -25,17 +25,16 @@ PaddleClas supports Python wheel package for prediction. At present, PaddleClas ## 1. Installation -* installing from pypi +* **[Recommended]** Installing from PyPI: ```bash -pip3 install paddleclas==2.2.1 +pip3 install paddleclas ``` -* build own whl package and install +* Please build and install locally if you need to use the develop branch of PaddleClas to experience the latest functions, or need to redevelop based on PaddleClas. The command is as follows: ```bash -python3 setup.py bdist_wheel -pip3 install dist/* +python3 setup.py install ``` diff --git a/docs/en/installation/install_paddleclas_en.md b/docs/en/installation/install_paddleclas_en.md index 2bd7d8173f643a723845947301a76c72aeae4714..c71d3516e246bdf93a217b868e907369a7b478a9 100644 --- a/docs/en/installation/install_paddleclas_en.md +++ b/docs/en/installation/install_paddleclas_en.md @@ -25,14 +25,14 @@ git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop ## 2. Install PaddleClas and requirements -It is recommanded that installing from PyPI: +* **[Recommended]** Installing from PyPI: ```shell pip install paddleclas ``` -PaddleClas dependencies are listed in file `requirements.txt`, you can use the following command to install the dependencies. +* Please build and install locally if you need to use the develop branch of PaddleClas to experience the latest functions, or need to redevelop based on PaddleClas. The command is as follows: -``` -pip install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple +```shell +python setup.py install ``` diff --git a/docs/en/quick_start/quick_start_recognition_en.md b/docs/en/quick_start/quick_start_recognition_en.md index 61c6f2309770e1b888712cc7919d93c9fcdf26b8..670ad03e80d8dd69ae2f283704ba9e7bd04444a6 100644 --- a/docs/en/quick_start/quick_start_recognition_en.md +++ b/docs/en/quick_start/quick_start_recognition_en.md @@ -1,296 +1,395 @@ # Quick Start of Recognition -This tutorial contains 3 parts: Environment Preparation, Image Recognition Experience, and Unknown Category Image Recognition Experience. +This document contains 2 parts: PP-ShiTu android demo quick start and PP-ShiTu PC demo quick start. 
-If the image category already exists in the image index database, then you can take a reference to chapter [Image Recognition Experience](#2),to complete the progress of image recognition;If you wish to recognize unknow category image, which is not included in the index database,you can take a reference to chapter [Unknown Category Image Recognition Experience](#3),to complete the process of creating an index to recognize it。 +If the image category already exists in the image index library, you can directly refer to the [Image Recognition Experience](#image recognition experience) chapter to complete the image recognition process; if you want to recognize images of unknown classes, that is, the image category did not exist in the index library before , then you can refer to the [Unknown Category Image Recognition Experience](#Unknown Category Image Recognition Experience) chapter to complete the process of indexing and recognition. ## Catalogue -* [1. Enviroment Preparation](#1) -* [2. Image Recognition Experience](#2) - * [2.1 Download and Unzip the Inference Model and Demo Data](#2.1) - * [2.2 Product Recognition and Retrieval](#2.2) - * [2.2.1 Single Image Recognition](#2.2.1) - * [2.2.2 Folder-based Batch Recognition](#2.2.2) -* [3. Unknown Category Image Recognition Experience](#3) - * [3.1 Prepare for the new images and labels](#3.1) - * [3.2 Build a new Index Library](#3.2) - * [3.3 Recognize the Unknown Category Images](#3.3) +- [1. PP-ShiTu android demo for quick start](#1-pp-shitu-android-demo-for-quick-start) + - [1.1 Install PP-ShiTu android demo](#11-install-pp-shitu-android-demo) + - [1.2 Feature Experience](#12-feature-experience) + - [1.2.1 Image Retrieval](#121-image-retrieval) + - [1.2.2 Update Index](#122-update-index) + - [1.2.3 Save Index](#123-save-index) + - [1.2.4 Initialize Index](#124-initialize-index) + - [1.2.5 Preview Index](#125-preview-index) + - [1.3 Feature Details](#13-feature-details) + - [1.3.1 Image Retrieval](#131-image-retrieval) + - [1.3.2 Update Index](#132-update-index) + - [1.3.3 Save Index](#133-save-index) + - [1.3.4 Initialize Index](#134-initialize-index) + - [1.3.5 Preview Index](#135-preview-index) +- [2. PP-ShiTu PC demo for quick start](#2-pp-shitu-pc-demo-for-quick-start) + - [2.1 Environment configuration](#21-environment-configuration) + - [2.2 Image recognition experience](#22-image-recognition-experience) + - [2.2.1 Download and unzip the inference model and demo data](#221-download-and-unzip-the-inference-model-and-demo-data) + - [2.2.2 Drink recognition and retrieval](#222-drink-recognition-and-retrieval) + - [2.2.2.1 single image recognition](#2221-single-image-recognition) + - [2.2.2.2 Folder-based batch recognition](#2222-folder-based-batch-recognition) + - [2.3 Image of Unknown categories recognition experience](#23-image-of-unknown-categories-recognition-experience) + - [2.3.1 Prepare new data and labels](#231-prepare-new-data-and-labels) + - [2.3.2 Create a new index database](#232-create-a-new-index-database) + - [2.3.3 Image recognition based on the new index database](#233-image-recognition-based-on-the-new-index-database) + - [2.4 List of server recognition models](#24-list-of-server-recognition-models) + - -## 1. Enviroment Preparation +## 1. PP-ShiTu android demo for quick start -* Installation:Please take a reference to [Quick Installation ](../installation/)to configure the PaddleClas environment. + -* Using the following command to enter Folder `deploy`. 
All content and commands in this section need to be run in folder `deploy`. +### 1.1 Install PP-ShiTu android demo - ``` - cd deploy - ``` +You can download and install the APP by scanning the QR code or [click the link](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk) - -## 2. Image Recognition Experience +
-The detection model with the recognition inference model for the 4 directions (Logo, Cartoon Face, Vehicle, Product), the address for downloading the test data and the address of the corresponding configuration file are as follows. + -| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database | -| ------------ | ------------- | -------- | ------- | -------- | -| Generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - | -| Logo Recognition Model | Logo Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | [build_logo.yaml](../../../deploy/configs/build_logo.yaml) | -| Cartoon Face Recognition Model| Cartoon Face Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | [build_cartoon.yaml](../../../deploy/configs/build_cartoon.yaml) | -| Vehicle Fine-Grained Classfication Model | Vehicle Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | [build_vehicle.yaml](../../../deploy/configs/build_vehicle.yaml) | -| Product Recignition Model | Product Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) | -| Vehicle ReID Model | Vehicle ReID Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - | +### 1.2 Feature Experience +At present, the PP-ShiTu android demo has basic features such as image retrieval, add image to the index database, saving the index database, initializing the index database, and viewing the index database. Next, we will introduce how to experience these features. 
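As a rough, hypothetical illustration of what these buttons do to the underlying index database (the real demo uses the PaddleClas index format and an on-device feature extractor, so every name in this sketch is made up), a toy in-memory gallery might look as follows:

```python
# Illustrative sketch only -- a toy in-memory "index database" showing the kind of
# operations the demo buttons correspond to. Not the demo's actual implementation.
import pickle
import numpy as np


class ToyGallery:
    def __init__(self):
        self.labels, self.feats = [], []          # parallel lists: label + normalized feature

    def add(self, label, feat):                   # "upload": add an image's feature and label
        feat = np.asarray(feat, dtype="float32")
        self.labels.append(label)
        self.feats.append(feat / (np.linalg.norm(feat) + 1e-12))

    def search(self, feat, top_k=1):              # "photo/file recognition"
        feat = np.asarray(feat, dtype="float32")
        feat = feat / (np.linalg.norm(feat) + 1e-12)
        scores = np.stack(self.feats) @ feat      # cosine similarity against the gallery
        order = np.argsort(-scores)[:top_k]
        return [(self.labels[i], float(scores[i])) for i in order]

    def save(self, path="latest.pkl"):            # "save index"
        with open(path, "wb") as f:
            pickle.dump((self.labels, self.feats), f)

    def load(self, path="original.pkl"):          # "initialize index"
        with open(path, "rb") as f:
            self.labels, self.feats = pickle.load(f)


if __name__ == "__main__":
    gallery = ToyGallery()
    gallery.add("keyboard", np.random.rand(512))
    print(gallery.search(np.random.rand(512), top_k=1))
```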
-| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database | -| ------------ | ------------- | -------- | ------- | -------- | -| Lightweight generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - | -| Lightweight generic recognition model | General Scenarios | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) | +#### 1.2.1 Image Retrieval +Click the "photo recognition" button below or the "file recognition" button, you can take an image or select an image, then wait a few seconds, main object in the image will be marked and the predicted class and inference time will be shown below the image. +Take the following image as an example: -Demo data in this tutorial can be downloaded here: [download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar). + +The retrieval results obtained are visualized as follows: -**Attention** -1. If you do not have wget installed on Windows, you can download the model by copying the link into your browser and unzipping it in the appropriate folder; for Linux or macOS users, you can right-click and copy the download link to download it via the `wget` command. -2. If you want to install `wget` on macOS, you can run the following command. -3. The predict config file of the lightweight generic recognition model and the config file to build index database are used for the config of product recognition model of server-side. You can modify the path of the model to complete the index building and prediction. + -```shell -# install homebrew -ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"; -# install wget -brew install wget -``` +#### 1.2.2 Update Index +Click the "photo upload" button above or the "file upload" button , you can take an image or select an image and enter the class name of the uploaded image (such as `keyboard`), click the "OK" button, then the feature vector and classname corresponding to the image will be added to the index database. -3. If you want to isntall `wget` on Windows, you can refer to [link](https://www.cnblogs.com/jeshy/p/10518062.html). If you want to install `tar` on Windows, you can refer to [link](https://www.cnblogs.com/chooperman/p/14190107.html). +#### 1.2.3 Save Index +Click the "save index" button above , you can save the current index database as `latest`. +#### 1.2.4 Initialize Index +Click the "initialize index" button above to initialize the current library to `original`. -* You can download and unzip the data and models by following the command below +#### 1.2.5 Preview Index +Click the "class preview" button to view it in the pop-up window. -```shell -mkdir models -cd models -# Download and unzip the inference model -wget {Models download link} && tar -xf {Name of the tar archive} -cd .. 
+

-# Download the demo data and unzip
-wget {Data download link} && tar -xf {Name of the tar archive}
-```
+

+### 1.3 Feature Details
+
+#### 1.3.1 Image Retrieval
+After selecting the image to be retrieved, mainbody detection is first performed with the detection model to obtain the bounding box of the object in the image; the image is then cropped and fed into the feature extraction model to obtain the corresponding feature vector, which is searched in the index database, and the final retrieval result is returned and displayed.
+
+#### 1.3.2 Update Index
+After selecting the picture to be stored, mainbody detection is first performed with the detection model to obtain the bounding box of the object in the image; the image is then cropped and fed into the feature extraction model to obtain the corresponding feature vector, which is then added to the index database.
+
+#### 1.3.3 Save Index
+The current in-app index database is saved under the name `latest`, and the app automatically switches to `latest`. The saving logic is similar to "Save As" in common software: if the current index database is already `latest`, it is overwritten directly; otherwise the app switches to the `latest` index database.
+
+#### 1.3.4 Initialize Index
+When initializing the index database, the search index database is automatically switched back to `original.index` and `original.txt`, and `latest.index` and `latest.txt` are deleted (if they exist).
+
+#### 1.3.5 Preview Index
+You can preview the index database according to the instructions in [Feature Experience - Preview Index](#125-preview-index).
+
+
+## 2. PP-ShiTu PC demo for quick start
+
+
+
+### 2.1 Environment configuration
+
+* Installation: Please refer to the document [Environment Preparation](../installation/install_paddleclas.md) to configure the PaddleClas operating environment.
+
+* Go to the `deploy` directory. All the content and scripts in this section need to be run in the `deploy` directory; you can enter it with the following script.
+
+  ```shell
+  cd deploy
+  ```
+

-
-### 2.1 Download and Unzip the Inference Model and Demo Data
+### 2.2 Image recognition experience

-Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands.
+The lightweight general object detection model, the lightweight general recognition model and the corresponding configuration file are listed in the following table.
+
+
+| Model Introduction | Recommended Scenarios | Inference Model | Prediction Profile |
+| ------------------------------------------ | --------------------- | ------------------ | ------------------------------------------------------------------------ |
+| Lightweight General MainBody Detection Model | General Scene | [tar format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) \| [zip format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | - |
+| Lightweight General Recognition Model | General Scene | [tar format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar) \| [zip format download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.zip) | [inference_general.yaml](../../../deploy/configs/inference_general.yaml) |
+
+Note: Since some decompression software has problems decompressing the above `tar` files, it is recommended that users who are not comfortable with the command line download the `zip` files and decompress them manually; for the `tar` files, it is recommended to decompress them with the command `tar -xf xxx.tar`.
+
+The demo data for this chapter can be downloaded here: [drink_dataset_v2.0.tar (drink data)](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar).
+
+The following takes **drink_dataset_v2.0.tar** as an example to introduce the PP-ShiTu quick start process on the PC. Users can also download and decompress the data of other scenarios to experience: [22 scenarios data download](../../zh_CN/introduction/ppshitu_application_scenarios.md#22-下载解压场景库数据).
+
+If you want to experience the server-side object detection model and the recognition models for each scenario, you can refer to [2.4 List of server recognition models](#24-list-of-server-recognition-models).
+
+**Notice**
+
+- If `wget` is not installed in the Windows environment, you can install `wget` and `tar` according to the following steps, or copy the link into a browser to download the model, then decompress it and place it in the corresponding directory.
+- If `wget` is not installed in the macOS environment, you can run the following script to install it.
+  ```shell
+  # install homebrew
+  ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
+  # install wget
+  brew install wget
+  ```
+- If you want to install `wget` in the Windows environment, you can refer to: [link](https://www.cnblogs.com/jeshy/p/10518062.html); if you want to install `tar` in the Windows environment, you can refer to: [Link](https://www.cnblogs.com/chooperman/p/14190107.html).
+
+
+
+#### 2.2.1 Download and unzip the inference model and demo data
+
+Download the demo dataset and the lightweight mainbody detection and recognition models. The scripts are as follows.
```shell mkdir models cd models -# Download the generic detection inference model and unzip it -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar -# Download and unpack the inference model -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar -cd .. - -# Download the demo data and unzip it -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar && tar -xf recognition_demo_data_en_v1.1.tar +# Download the mainbody detection inference model and unzip it +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +# Download the feature extraction inference model and unzip it +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar + +cd ../ +# Download demo data and unzip it +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar ``` -Once unpacked, the `recognition_demo_data_v1.1` folder should have the following file structure. +After decompression, the `drink_dataset_v2.0/` folder be structured as follows: -``` -├── recognition_demo_data_v1.1 -│ ├── gallery_cartoon -│ ├── gallery_logo -│ ├── gallery_product -│ ├── gallery_vehicle -│ ├── test_cartoon -│ ├── test_logo -│ ├── test_product -│ └── test_vehicle +```log +├── drink_dataset_v2.0/ +│ ├── gallery/ +│ ├── index/ +│ ├── index_all/ +│ └── test_images/ ├── ... ``` -here, original images to build index are in folder `gallery_xxx`, test images are in folder `test_xxx`. You can also access specific folder for more details. +The `gallery` folder stores the original images used to build the index database, `index` represents the index database constructed based on the original images, and the `test_images` folder stores the list of images for query. -The `models` folder should have the following file structure. +The `models` folder should be structured as follows: -``` -├── product_ResNet50_vd_aliproduct_v1.0_infer +```log +├── general_PPLCNetV2_base_pretrained_v1.0_infer │ ├── inference.pdiparams │ ├── inference.pdiparams.info │ └── inference.pdmodel -├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer +├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer │ ├── inference.pdiparams │ ├── inference.pdiparams.info │ └── inference.pdmodel ``` -**Attention** -If you want to use the lightweight generic recognition model, you need to re-extract the features of the demo data and re-build the index. 
The way is as follows:
+**Notice**
+
+If the general feature extraction model is changed, the index for the demo data must be rebuilt, as follows:

```shell
python3.7 python/build_gallery.py \
-c configs/inference_general.yaml \
-o Global.rec_inference_model_dir=./models/general_PPLCNetV2_base_pretrained_v1.0_infer
```

+
+
-### 2.2 Product Recognition and Retrieval
-
-Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction).
-
-**Note:** `faiss` is used as search library. The installation method is as follows:
+#### 2.2.2 Drink recognition and retrieval

-```
-pip install faiss-cpu==1.7.1post2
-```
+Take the drink recognition demo as an example to show the recognition and retrieval process.

-If error happens when using `import faiss`, please uninstall `faiss` and reinstall it, especially on `Windows`.
+Note that this section uses `faiss` as the retrieval tool, and the installation script is as follows:

-
+```shell
+python3.7 -m pip install faiss-cpu==1.7.1post2
+```

-#### 2.2.1 Single Image Recognition
+If `faiss` cannot be imported, try reinstalling it; this is especially common on Windows.

-Run the following command to identify and retrieve the image `./recognition_demo_data_v1.1/test_product/daoxiangcunjinzhubing_6.jpg` for recognition and retrieval
+

-```shell
-# use the following command to predict using GPU.
-python3.7 python/predict_system.py -c configs/inference_product.yaml
-# use the following command to predict using CPU
-python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.use_gpu=False
-```
+##### 2.2.2.1 single image recognition

-The image to be retrieved is shown below.
+Run the following script to recognize the image `./drink_dataset_v2.0/test_images/100.jpeg`.

-![](../../images/recognition/product_demo/query/daoxiangcunjinzhubing_6.jpg)
+The image to be retrieved is shown below.

-The final output is shown below.
+![](../../images/recognition/drink_data_demo/test_images/100.jpeg)

-```
-[{'bbox': [287, 129, 497, 326], 'rec_docs': 'Daoxaingcun Golden Piggie Cake', 'rec_scores': 0.8309420347213745}, {'bbox': [99, 242, 313, 426], 'rec_docs': 'Daoxaingcun Golden Piggie Cake', 'rec_scores': 0.7245651483535767}]
+```shell
+# Use the script below to make predictions using the GPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml

-```
+# Use the following script to make predictions using the CPU
+python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False
+```
+The final output is as follows.

-where bbox indicates the location of the detected object, rec_docs indicates the labels corresponding to the label in the index dabase that are most similar to the detected object, and rec_scores indicates the corresponding confidence.
+```log +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs' : '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] +``` +Where `bbox` represents the location of the detected object, `rec_docs` represents the most similar category to the detection box in the index database, and `rec_scores` represents the corresponding similarity. -The detection result is also saved in the folder `output`, for this image, the visualization result is as follows. +The visualization results of the recognition are saved in the `output` folder by default. For this image, the visualization of the recognition results is shown below. -![](../../images/recognition/product_demo/result/daoxiangcunjinzhubing_6_en.jpg) +![](../../images/recognition/drink_data_demo/output/100.jpeg) + - -#### 2.2.2 Folder-based Batch Recognition +##### 2.2.2.2 Folder-based batch recognition -If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter. +If you want to use multi images in the folder for prediction, you can modify the `Global.infer_imgs` field in the configuration file, or you can modify the corresponding configuration through the `-o` parameter below. ```shell -# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU. -python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/" +# Use the following script to use GPU for prediction, if you want to use CPU prediction, you can add -o Global.use_gpu=False after the script +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/" ``` +The recognition results of all images in the folder will be output in the terminal, as shown below. -The results on the screen are shown as following. - -``` +```log ... 
-[{'bbox': [37, 29, 123, 89], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.6163763999938965}, {'bbox': [153, 96, 235, 175], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5279821157455444}] -[{'bbox': [735, 562, 1133, 851], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5588355660438538}] -[{'bbox': [124, 50, 230, 129], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.6980369687080383}] -[{'bbox': [0, 0, 275, 183], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5818190574645996}] -[{'bbox': [400, 1179, 905, 1537], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.9814301133155823}, {'bbox': [295, 713, 820, 1046], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.9496176242828369}, {'bbox': [153, 236, 694, 614], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.8395382761955261}] -[{'bbox': [544, 4, 1482, 932], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5143815279006958}] +[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}] +Inference: 120.39852142333984 ms per batch image +[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}] +Inference: 32.045602798461914 ms per batch image +[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}] +Inference: 113.41428756713867 ms per batch image +[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}] +Inference: 122.04337120056152 ms per batch image +[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}] +Inference: 37.95266151428223 ms per batch image +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] ... ``` -All the visualization results are also saved in folder `output`. +Visualizations of recognition results for all images are also saved in the `output` folder. + +Furthermore, you can change the path of the recognition inference model by modifying the `Global.rec_inference_model_dir` field, and change the path of the index database by modifying the `IndexProcess.index_dir` field. + -Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_dir` field. +### 2.3 Image of Unknown categories recognition experience +Now we try to recognize the unseen image `./drink_dataset_v2.0/test_images/mosilian.jpeg` - -## 3. Recognize Images of Unknown Category +The images to be retrieved are as follows -To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows: +![](../../images/recognition/drink_data_demo/test_images/mosilian.jpeg) + +Execute the following identification script ```shell -python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/anmuxi.jpg" +# Use the following script to use GPU for prediction, if you want to use CPU prediction, you can add -o Global.use_gpu=False after the script +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" ``` -The image to be retrieved is shown below. +It can be found that the output result is empty + +Since the default index database does not contain the unknown category's information, the recognition result here is wrong. 
At this time, we can achieve the image recognition of unknown classes by building a new index database.
+
+When the images in the index database cannot cover the scene we actually want to recognize, i.e. when recognizing an image of an unknown category, we need to add one or more similar images of that category to the index database. This process does not require re-training the model. Taking `mosilian.jpeg` as an example, just follow the steps below to rebuild a new index database.
+
+
+
+#### 2.3.1 Prepare new data and labels
+
+First, copy the image(s) belonging to the unknown category (except the query image) into the original image folder of the index database. Here we have already placed all the image data in the folder `drink_dataset_v2.0/gallery/`.
+
+Then we need to edit the text file that records the image paths and label information. Here we have already placed the updated label file at `drink_dataset_v2.0/gallery/drink_label_all.txt`. Comparing it with the original `drink_dataset_v2.0/gallery/drink_label.txt` label file, you can see that index images of the Bright (光明) and Sanyuan (三元) series of milk have been added.
+
+In each line of the text file, the first field is the relative path of the image and the second field is the label corresponding to the image, separated by the `\t` character (note: some editors automatically convert `tab` into spaces, which will cause a file parsing error).
+
+
+
+#### 2.3.2 Create a new index database
+
+Build a new index database `index_all` with the following script.
+
+```shell
+python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v2.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all"
+```
+
+The newly constructed index database is saved in the folder `./drink_dataset_v2.0/index_all`. For specific instructions on the `yaml` configuration, please refer to the [Vector Search Documentation](../image_recognition_pipeline/vector_search.md).
+
+
+
+#### 2.3.3 Image recognition based on the new index database
+
+To re-recognize the `mosilian.jpeg` image using the new index database, run the following script.
```shell -# copy the file -cp recognition_demo_data_v1.1/gallery_product/data_file.txt recognition_demo_data_v1.1/gallery_product/data_file_update.txt +# run the following script predict with GPU, if you want to use CPU, you can add -o Global.use_gpu=False after the script +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all" ``` -Then add some new lines into the new label file, which is shown as follows. +The output is as follows. -``` -gallery/anmuxi/001.jpg Anmuxi Ambrosial Yogurt -gallery/anmuxi/002.jpg Anmuxi Ambrosial Yogurt -gallery/anmuxi/003.jpg Anmuxi Ambrosial Yogurt -gallery/anmuxi/004.jpg Anmuxi Ambrosial Yogurt -gallery/anmuxi/005.jpg Anmuxi Ambrosial Yogurt -gallery/anmuxi/006.jpg Anmuxi Ambrosial Yogurt +```log +[{'bbox': [290, 297, 564, 919], 'rec_docs': 'Bright_Mosleyan', 'rec_scores': 0.59137374}] ``` -Each line can be splited into two fields. The first field denotes the relative image path, and the second field denotes its label. The `delimiter` is `tab` here. +The final recognition result is `光明_莫斯利安`, we can see the recognition result is correct now , and the visualization of the recognition result is shown below. +![](../../images/recognition/drink_data_demo/output/mosilian.jpeg) - -### 3.2 Build a new Index Base Library -Use the following command to build the index to accelerate the retrieval process after recognition. + -```shell -python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./recognition_demo_data_v1.1/gallery_product/data_file_update.txt" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update" -``` +### 2.4 List of server recognition models -Finally, the new index information is stored in the folder`./recognition_demo_data_v1.1/gallery_product/index_update`. Use the new index database for the above index. +At present, we recommend to use model in [Lightweight General Object Detection Model and Lightweight General Recognition Model](#22-image-recognition-experience) to get better test results. However, if you want to experience the general recognition model, general object detection model and other recognition model for server, the test data download path, and the corresponding configuration file path are as follows. +| Model Introduction | Recommended Scenarios | Inference Model | Prediction Profile | +| --------------------------------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | +| General Body Detection Model | General Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | +| Logo Recognition Model | Logo Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo. 
yaml](../../../deploy/configs/inference_logo.yaml) | +| Anime Character Recognition Model | Anime Character Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [ inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | +| Vehicle Subdivision Model | Vehicle Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle .yaml](../../../deploy/configs/inference_vehicle.yaml) | +| Product Recognition Model | Product Scene | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product. yaml](../../../deploy/configs/inference_product.yaml) | +| Vehicle ReID Model | Vehicle ReID Scene | [Model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle .yaml](../../../deploy/configs/inference_vehicle.yaml) | - -### 3.3 Recognize the Unknown Category Images +The above models can be downloaded to the `deploy/models` folder by the following script for use in recognition tasks +```shell +cd ./deploy +mkdir -p models -To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows. +cd ./models +# Download the generic object detection model for server and unzip it +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar +# Download the generic recognition model and unzip it +wget {recognize model download link path} && tar -xf {name of compressed package} +``` + +Then use the following scripts to download the test data for other recognition scenario: ```shell -# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU. -python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/anmuxi.jpg" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update" +# Go back to the deploy directory +cd.. +# Download test data and unzip +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar && tar -xf recognition_demo_data_en_v1.1.tar ``` -The output is as follows: +After decompression, the `recognition_demo_data_v1.1` folder should have the following file structure: -``` -[{'bbox': [243, 80, 523, 522], 'rec_docs': 'Anmuxi Ambrosial Yogurt', 'rec_scores': 0.5570770502090454}] +```log +├── recognition_demo_data_v1.1 +│ ├── gallery_cartoon +│ ├── gallery_logo +│ ├── gallery_product +│ ├── gallery_vehicle +│ ├── test_cartoon +│ ├── test_logo +│ ├── test_product +│ └── test_vehicle +├── ... ``` -The final recognition result is `Anmuxi Ambrosial Yogurt`, which is corrrect, the visualization result is as follows. +After downloading the model and test data according to the above steps, you can re-build the index database and test the relevant recognition model. 
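If you are curious what the index-building and retrieval steps boil down to, the toy example below uses the `faiss-cpu` package installed earlier with random stand-in features; the dimensions and data are assumptions made for illustration only, and this is not the actual index format produced by `build_gallery.py`.

```python
# Toy illustration of a build-then-search flow with faiss.
# Not the actual PaddleClas index format -- dimensions and data are made up.
import faiss
import numpy as np

dim = 512                                                      # assumed embedding dimension
gallery_feats = np.random.rand(1000, dim).astype("float32")    # stand-in for gallery embeddings
query_feats = np.random.rand(3, dim).astype("float32")         # stand-in for query embeddings

# Cosine similarity equals inner product on L2-normalized vectors.
faiss.normalize_L2(gallery_feats)
faiss.normalize_L2(query_feats)

index = faiss.IndexFlatIP(dim)               # exact inner-product index ("build the gallery")
index.add(gallery_feats)

scores, ids = index.search(query_feats, 5)   # top-5 gallery entries for each query
print(ids[0], scores[0])
```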
-![](../../images/recognition/product_demo/result/anmuxi_en.jpg) - +* For more introduction to object detection, please refer to: [Object Detection Tutorial Document](../image_recognition_pipeline/mainbody_detection.md); for the introduction of feature extraction, please refer to: [Feature Extraction Tutorial Document](../image_recognition_pipeline/feature_extraction.md); for the introduction to vector search, please refer to: [vector search tutorial document](../image_recognition_pipeline/vector_search.md). diff --git a/docs/images/deep_hash/DCH.png b/docs/images/deep_hash/DCH.png new file mode 100644 index 0000000000000000000000000000000000000000..63cf004dac575c4b1badeca8883d9371c6a45eb9 Binary files /dev/null and b/docs/images/deep_hash/DCH.png differ diff --git a/docs/images/deep_hash/DSHSD.png b/docs/images/deep_hash/DSHSD.png new file mode 100644 index 0000000000000000000000000000000000000000..b4d0406990f7280d191c4d2f23be12a968d7b350 Binary files /dev/null and b/docs/images/deep_hash/DSHSD.png differ diff --git a/docs/images/deep_hash/LCDSH.png b/docs/images/deep_hash/LCDSH.png new file mode 100644 index 0000000000000000000000000000000000000000..4717283981f518a7764e28fe7c66102ee4c47ca0 Binary files /dev/null and b/docs/images/deep_hash/LCDSH.png differ diff --git a/docs/images/det/PaddleDetection_config.png b/docs/images/det/PaddleDetection_config.png index d18932b66cc148b7796fe4b319ad9eb82c2a2868..53248227535c9089ca1f70fbba0d69f224fd8a54 100644 Binary files a/docs/images/det/PaddleDetection_config.png and b/docs/images/det/PaddleDetection_config.png differ diff --git a/docs/images/ppshitu_application_scenarios/100sports.jpg b/docs/images/ppshitu_application_scenarios/100sports.jpg new file mode 100644 index 0000000000000000000000000000000000000000..baca0eb2e2a57abfa3c90c89e95ffb59ebbde994 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/100sports.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/104flowers.jpeg b/docs/images/ppshitu_application_scenarios/104flowers.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..e420fcc655836e9bf3e214c7561ccd39b71333bf Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/104flowers.jpeg differ diff --git a/docs/images/ppshitu_application_scenarios/AID.jpg b/docs/images/ppshitu_application_scenarios/AID.jpg new file mode 100644 index 0000000000000000000000000000000000000000..569c4c426b37ae4947e376e8d084e31ff896a69f Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/AID.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/AnimalImageDataset.jpg b/docs/images/ppshitu_application_scenarios/AnimalImageDataset.jpg new file mode 100644 index 0000000000000000000000000000000000000000..da88a61b9c8201a136316ce719193d65cf99e918 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/AnimalImageDataset.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Ball.jpg b/docs/images/ppshitu_application_scenarios/Ball.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6d8e2ff079b9aaf6fd89eb273eed7522d47a4311 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Ball.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Bird400.jpg b/docs/images/ppshitu_application_scenarios/Bird400.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c86abd202132e63965492d05147c6a6d2acdccbf Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Bird400.jpg 
differ diff --git a/docs/images/ppshitu_application_scenarios/Boat.jpg b/docs/images/ppshitu_application_scenarios/Boat.jpg new file mode 100644 index 0000000000000000000000000000000000000000..f3510422648326766252a83ed29b36ecdd934f4d Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Boat.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Butterfly.jpg b/docs/images/ppshitu_application_scenarios/Butterfly.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b17373a74008ec5c40e57d060dce67b06620660c Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Butterfly.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/DogBreeds.jpg b/docs/images/ppshitu_application_scenarios/DogBreeds.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4d82a2ca2213b420b5cb372ffbf388d527bc065d Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/DogBreeds.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/FashionProductsImage.jpg b/docs/images/ppshitu_application_scenarios/FashionProductsImage.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4e6a7df8d359063cdec0e499e5c8a89f1630b287 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/FashionProductsImage.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Garbage12.jpg b/docs/images/ppshitu_application_scenarios/Garbage12.jpg new file mode 100644 index 0000000000000000000000000000000000000000..1543820443254f63a9a307c771e3479ba1b0a9f9 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Garbage12.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Gemstones.jpg b/docs/images/ppshitu_application_scenarios/Gemstones.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c025330aeaf27483d4f79cf852a6b8e6cc1e38c5 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Gemstones.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Logo3K.jpg b/docs/images/ppshitu_application_scenarios/Logo3K.jpg new file mode 100644 index 0000000000000000000000000000000000000000..465ac49a7ddded3a8f4ab66d0657aae1988eec1c Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Logo3K.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/MusicInstruments.jpg b/docs/images/ppshitu_application_scenarios/MusicInstruments.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0a6c02b1103d987e59c7fe8d7aa27b9ef6abf474 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/MusicInstruments.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Paris.jpg b/docs/images/ppshitu_application_scenarios/Paris.jpg new file mode 100644 index 0000000000000000000000000000000000000000..724ae3a85389adeaffbfc4014938c733383a5bb5 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Paris.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Pokemon.png b/docs/images/ppshitu_application_scenarios/Pokemon.png new file mode 100644 index 0000000000000000000000000000000000000000..ca7d438a9c86d305931a3a572cc06da7226bd80f Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Pokemon.png differ diff --git a/docs/images/ppshitu_application_scenarios/Shoes.jpeg b/docs/images/ppshitu_application_scenarios/Shoes.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..d788e679cf3a46e5f836d9c5cb84d165d3623c53 Binary files 
/dev/null and b/docs/images/ppshitu_application_scenarios/Shoes.jpeg differ diff --git a/docs/images/ppshitu_application_scenarios/TreeNuts.jpg b/docs/images/ppshitu_application_scenarios/TreeNuts.jpg new file mode 100644 index 0000000000000000000000000000000000000000..3211a8d0c76da60b7aeb340a62f0804c6cc43513 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/TreeNuts.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Vechicles.jpg b/docs/images/ppshitu_application_scenarios/Vechicles.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c22686864fa8128067ed6acb24c234afe97a5271 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Vechicles.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/Veg200.jpg b/docs/images/ppshitu_application_scenarios/Veg200.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d64331217966f354b0de2befaa3be575a03dd8d7 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/Veg200.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/WeatherImageRecognition.jpg b/docs/images/ppshitu_application_scenarios/WeatherImageRecognition.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c0636d7f9e2e3dc08c56050008a010e6a4ce4b54 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/WeatherImageRecognition.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/WildEdiblePlants.jpg b/docs/images/ppshitu_application_scenarios/WildEdiblePlants.jpg new file mode 100644 index 0000000000000000000000000000000000000000..feb7cbb09750dec537ecb40dc5faaa9576afca40 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/WildEdiblePlants.jpg differ diff --git a/docs/images/ppshitu_application_scenarios/systerm_result.jpg b/docs/images/ppshitu_application_scenarios/systerm_result.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ab9bc1cb523d62dcd36864ef3487bbcff42e2ae3 Binary files /dev/null and b/docs/images/ppshitu_application_scenarios/systerm_result.jpg differ diff --git a/docs/images/quick_start/android_demo/PPShiTu_qrcode.png b/docs/images/quick_start/android_demo/PPShiTu_qrcode.png new file mode 100644 index 0000000000000000000000000000000000000000..6b75ce4f3d460b6bb49e8c3eb13af963002eac83 Binary files /dev/null and b/docs/images/quick_start/android_demo/PPShiTu_qrcode.png differ diff --git a/docs/images/quick_start/android_demo/android_nongfu_spring.JPG b/docs/images/quick_start/android_demo/android_nongfu_spring.JPG new file mode 100644 index 0000000000000000000000000000000000000000..9c1d4ffc87082dfb73c5634f3d5454f72f8c1d98 Binary files /dev/null and b/docs/images/quick_start/android_demo/android_nongfu_spring.JPG differ diff --git a/docs/images/quick_start/android_demo/baocunxiugai_100.png b/docs/images/quick_start/android_demo/baocunxiugai_100.png new file mode 100644 index 0000000000000000000000000000000000000000..94f7ba7f16b418a5a461ab3d903a976b4fd86120 Binary files /dev/null and b/docs/images/quick_start/android_demo/baocunxiugai_100.png differ diff --git a/docs/images/quick_start/android_demo/bendishangchuan_100.png b/docs/images/quick_start/android_demo/bendishangchuan_100.png new file mode 100644 index 0000000000000000000000000000000000000000..a9263a32e88ef3d9b29268874ac5beb87f836e63 Binary files /dev/null and b/docs/images/quick_start/android_demo/bendishangchuan_100.png differ diff --git a/docs/images/quick_start/android_demo/bendishibie_100.png 
b/docs/images/quick_start/android_demo/bendishibie_100.png new file mode 100644 index 0000000000000000000000000000000000000000..ee1715d8aed47eeb36f90848416353ce6624a22e Binary files /dev/null and b/docs/images/quick_start/android_demo/bendishibie_100.png differ diff --git a/docs/images/quick_start/android_demo/leibiechaxun_100.png b/docs/images/quick_start/android_demo/leibiechaxun_100.png new file mode 100644 index 0000000000000000000000000000000000000000..f8e71b864cd21896795ffdaf3869e1b098e0a5bb Binary files /dev/null and b/docs/images/quick_start/android_demo/leibiechaxun_100.png differ diff --git a/docs/images/quick_start/android_demo/paizhaoshangchuan_100.png b/docs/images/quick_start/android_demo/paizhaoshangchuan_100.png new file mode 100644 index 0000000000000000000000000000000000000000..3f3b5544b303353b1ccf486bba207ad4bfe027e5 Binary files /dev/null and b/docs/images/quick_start/android_demo/paizhaoshangchuan_100.png differ diff --git a/docs/images/quick_start/android_demo/paizhaoshibie_100.png b/docs/images/quick_start/android_demo/paizhaoshibie_100.png new file mode 100644 index 0000000000000000000000000000000000000000..13df35310c415273abd3577df2c38d2ff82fedb9 Binary files /dev/null and b/docs/images/quick_start/android_demo/paizhaoshibie_100.png differ diff --git a/docs/images/quick_start/android_demo/reset_100.png b/docs/images/quick_start/android_demo/reset_100.png new file mode 100644 index 0000000000000000000000000000000000000000..93f6f6223a890bd442bf9a6dd5bbb5ba2514138c Binary files /dev/null and b/docs/images/quick_start/android_demo/reset_100.png differ diff --git a/docs/images/recognition/drink_data_demo/output/mosilian.jpeg b/docs/images/recognition/drink_data_demo/output/mosilian.jpeg index ca7f2dbb0575001ff79b81a9fad2827cbd5261cb..909ddd91fb36e2fddb794ef96e62059dc331370d 100644 Binary files a/docs/images/recognition/drink_data_demo/output/mosilian.jpeg and b/docs/images/recognition/drink_data_demo/output/mosilian.jpeg differ diff --git a/docs/images/recognition/drink_data_demo/test_images/100.jpeg b/docs/images/recognition/drink_data_demo/test_images/100.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..e5f845ed2842f67352ffda26380b59a35d0449f9 Binary files /dev/null and b/docs/images/recognition/drink_data_demo/test_images/100.jpeg differ diff --git a/docs/images/shitu_index_manager/all_menu.png b/docs/images/shitu_index_manager/all_menu.png new file mode 100644 index 0000000000000000000000000000000000000000..768041a77d89e76c77fdc445bfb1bafd7e64d1fa Binary files /dev/null and b/docs/images/shitu_index_manager/all_menu.png differ diff --git a/docs/images/shitu_index_manager/creat_images.png b/docs/images/shitu_index_manager/creat_images.png new file mode 100644 index 0000000000000000000000000000000000000000..8a2d7914c754b4b9d810ed3ac094bbae57dc0758 Binary files /dev/null and b/docs/images/shitu_index_manager/creat_images.png differ diff --git a/docs/images/shitu_index_manager/image_operation.png b/docs/images/shitu_index_manager/image_operation.png new file mode 100644 index 0000000000000000000000000000000000000000..86457fe5e439d793cd7920a746e9500f758f9bf5 Binary files /dev/null and b/docs/images/shitu_index_manager/image_operation.png differ diff --git a/docs/images/shitu_index_manager/main_page.png b/docs/images/shitu_index_manager/main_page.png new file mode 100644 index 0000000000000000000000000000000000000000..2b69ca7f4efb12ff1724a2a2598a8b7f8b0ca241 Binary files /dev/null and b/docs/images/shitu_index_manager/main_page.png differ diff 
--git a/docs/images/shitu_index_manager/menu.png b/docs/images/shitu_index_manager/menu.png new file mode 100644 index 0000000000000000000000000000000000000000..90dd4c709a85b76f3f284069c04ff74d1ed25720 Binary files /dev/null and b/docs/images/shitu_index_manager/menu.png differ diff --git a/docs/images/shituv2.gif b/docs/images/shituv2.gif new file mode 100644 index 0000000000000000000000000000000000000000..5a9a4b84232f813ea866e5c0f283c6768823374b Binary files /dev/null and b/docs/images/shituv2.gif differ diff --git a/docs/images/structure.jpg b/docs/images/structure.jpg index 1d7f5a17b5377606a0e69a6fb45fdab1652af5d0..14948bd4214f4cb321e9a6b9819f2a42ba3ac385 100644 Binary files a/docs/images/structure.jpg and b/docs/images/structure.jpg differ diff --git a/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md b/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md new file mode 100644 index 0000000000000000000000000000000000000000..bac4f8acbaf6da54e1bb15a44d151a9a37a20769 --- /dev/null +++ b/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md @@ -0,0 +1,254 @@ +## PP-ShiTu V2图像识别系统 + +## 目录 + +- [1. PP-ShiTu V2模型和应用场景介绍](#1-pp-shituv2模型和应用场景介绍) +- [2. 模型快速体验](#2-模型快速体验) + - [2.1 PP-ShiTu android demo 快速体验](#21-pp-shitu-android-demo-快速体验) + - [2.2 命令行代码快速体验](#22-命令行代码快速体验) +- [3 模块介绍与训练](#3-模块介绍与训练) + - [3.1 主体检测](#31-主体检测) + - [3.2 特征提取](#32-特征提取) + - [3.3 向量检索](#33-向量检索) +- [4. 推理部署](#4-推理部署) + - [4.1 推理模型准备](#41-推理模型准备) + - [4.1.1 基于训练得到的权重导出 inference 模型](#411-基于训练得到的权重导出-inference-模型) + - [4.1.2 直接下载 inference 模型](#412-直接下载-inference-模型) + - [4.2 测试数据准备](#42-测试数据准备) + - [4.3 基于 Python 预测引擎推理](#43-基于-python-预测引擎推理) + - [4.3.1 预测单张图像](#431-预测单张图像) + - [4.3.2 基于文件夹的批量预测](#432-基于文件夹的批量预测) + - [4.4 基于 C++ 预测引擎推理](#44-基于-c-预测引擎推理) + - [4.5 服务化部署](#45-服务化部署) + - [4.6 端侧部署](#46-端侧部署) + - [4.7 Paddle2ONNX 模型转换与预测](#47-paddle2onnx-模型转换与预测) +- [参考文献](#参考文献) + +## 1. PP-ShiTuV2模型和应用场景介绍 + +PP-ShiTuV2 是基于 PP-ShiTuV1 改进的一个实用轻量级通用图像识别系统,由主体检测、特征提取、向量检索三个模块构成,相比 PP-ShiTuV1 具有更高的识别精度、更强的泛化能力以及相近的推理速度*。主要针对训练数据集、特征提取两个部分进行优化,使用了更优的骨干网络、损失函数与训练策略,使得 PP-ShiTuV2 在多个实际应用场景上的检索性能有显著提升。 + +**本文档提供了用户使用 PaddleClas 的 PP-ShiTuV2 图像识别方案进行快速构建轻量级、高精度、可落地的图像识别pipeline。该pipeline可以广泛应用于商场商品识别场景、安防人脸或行人识别场景、海量图像检索过滤等场景中。** + +
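+在进入各模块的详细介绍之前,可以先通过下面一段极简的 Python 流程示意理解三个模块的协作方式。注意这只是概念性草图:其中 `detect`、`extract`、`search` 与底库数据均为本文假设的占位实现,并非 PaddleClas 的实际接口,实际用法请以后文各模块文档和推理部署章节为准。
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# 假设的底库:每行是一条 L2 归一化后的特征向量,对应一个类别标签(均为演示用的随机数据)
+gallery_feats = rng.normal(size=(100, 512)).astype("float32")
+gallery_feats /= np.linalg.norm(gallery_feats, axis=1, keepdims=True)
+gallery_labels = [f"class_{i % 10}" for i in range(100)]
+
+def detect(image):
+    # 1. 主体检测(占位实现):返回主体候选框,这里简化为整图一个框
+    h, w = image.shape[:2]
+    return [(0, 0, w, h)]
+
+def extract(patch):
+    # 2. 特征提取(占位实现):真实系统中由特征提取模型(如 PP-LCNetV2)完成
+    feat = patch.mean(axis=(0, 1)) @ rng.normal(size=(3, 512))
+    return (feat / np.linalg.norm(feat)).astype("float32")
+
+def search(feat):
+    # 3. 向量检索(占位实现):与底库特征做内积排序,返回最相似的标签与相似度
+    scores = gallery_feats @ feat
+    idx = int(np.argmax(scores))
+    return gallery_labels[idx], float(scores[idx])
+
+image = rng.random((224, 224, 3), dtype=np.float32)
+for x1, y1, x2, y2 in detect(image):
+    label, score = search(extract(image[y1:y2, x1:x2]))
+    print({"bbox": [x1, y1, x2, y2], "rec_docs": label, "rec_scores": score})
+```
+
+实际系统中,这三步分别由主体检测模型、特征提取模型与 Faiss 向量检索库完成,输出格式与后文推理部署章节中的识别结果一致。
+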
+ +
+
+下表列出了 PP-ShiTuV2 用不同的模型结构与训练策略所得到的相关指标:
+
+| 模型 | 存储(主体检测+特征提取) | product recall@1 |
+| :--------- | :---------------------- | :--------------- |
+| PP-ShiTuV1 | 64(30+34)MB | 66.8% |
+| PP-ShiTuV2 | 49(30+19)MB | 73.8% |
+
+**注:**
+- recall及mAP指标的介绍可以参考 [常用指标](../algorithm_introduction/reid.md#22-常用指标)。
+- 延时是基于 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 测试得到,开启 MKLDNN 加速策略,线程数为10。
+
+## 2. 模型快速体验
+
+### 2.1 PP-ShiTu android demo 快速体验
+
+可以通过扫描二维码或者 [点击链接](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk) 下载并安装APP。
+
+ +然后将以下体验图片保存到手机上: + +
+ +打开安装好的APP,点击下方“**本地识别**”按钮,选择上面这张保存的图片,再点击确定,就能得到如下识别结果: + +
+ +更详细的说明参考[PP-ShiTu android demo功能说明](https://github.com/weisy11/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_recognition.md) + +### 2.2 命令行代码快速体验 + +- 首先按照以下命令,安装paddlepaddle和faiss + ```shell + # 如果您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 + python3.7 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple + + # 如果您的机器是CPU,请运行以下命令安装 + python3.7 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple + + # 安装 faiss 库 + python3.7 -m pip install faiss-cpu==1.7.1post2 + ``` + +- 然后按照以下命令,安装paddleclas whl包 + ```shell + # 进入到PaddleClas根目录下 + cd PaddleClas + + # 安装paddleclas + python3.7 setup.py install + ``` + +- 然后执行以下命令下载并解压好demo数据,最后执行一行命令体验图像识别 + + ```shell + # 下载并解压demo数据 + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar + + # 执行识别命令 + paddleclas \ + --model_name=PP-ShiTuV2 \ + --infer_imgs=./drink_dataset_v2.0/test_images/100.jpeg \ + --index_dir=./drink_dataset_v2.0/index/ \ + --data_file=./drink_dataset_v2.0/gallery/drink_label.txt + ``` + +## 3 模块介绍与训练 + +### 3.1 主体检测 + +主体检测是目前应用非常广泛的一种检测技术,它指的是检测出图片中一个或者多个主体的坐标位置,然后将图像中的对应区域裁剪下来进行识别。主体检测是识别任务的前序步骤,输入图像经过主体检测后再进行识别,可以过滤复杂背景,有效提升识别精度。 + +考虑到检测速度、模型大小、检测精度等因素,最终选择 PaddleDetection 自研的轻量级模型 `PicoDet-LCNet_x2_5` 作为 PP-ShiTuV2 的主体检测模型 + +主体检测模型的数据集、训练、评估、推理等详细信息可以参考文档:[picodet_lcnet_x2_5_640_mainbody](../image_recognition_pipeline/mainbody_detection.md)。 + +### 3.2 特征提取 + +特征提取是图像识别中的关键一环,它的作用是将输入的图片转化为固定维度的特征向量,用于后续的 [向量检索](./vector_search.md) 。考虑到特征提取模型的速度、模型大小、特征提取性能等因素,最终选择 PaddleClas 自研的 [`PPLCNetV2_base`](../models/PP-LCNetV2.md) 作为特征提取网络。相比 PP-ShiTuV1 所使用的 `PPLCNet_x2_5`, `PPLCNetV2_base` 基本保持了较高的分类精度,并减少了40%的推理时间*。 + +**注:** *推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。 + +在实验过程中我们也发现可以对 `PPLCNetV2_base` 进行适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 `PPLCNetV2_base` 末尾的 `ReLU` 和 `FC`、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1。 + +特征提取模型的数据集、训练、评估、推理等详细信息可以参考文档:[PPLCNetV2_base_ShiTu](../image_recognition_pipeline/feature_extraction.md)。 + +### 3.3 向量检索 + +向量检索技术在图像识别、图像检索中应用比较广泛。其主要目标是对于给定的查询向量,在已经建立好的向量库中进行特征向量的相似度或距离计算,返回候选向量的相似度排序结果。 + +在 PP-ShiTuV2 识别系统中,我们使用了 [Faiss](https://github.com/facebookresearch/faiss) 向量检索开源库对此部分进行支持,其具有适配性好、安装方便、算法丰富、同时支持CPU与GPU的优点。 + +PP-ShiTuV2 系统中关于 Faiss 向量检索库的安装及使用可以参考文档:[vector search](../image_recognition_pipeline/vector_search.md)。 + +## 4. 
推理部署 + +### 4.1 推理模型准备 +Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于 Paddle Inference 推理引擎的介绍,可以参考 [Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 + +当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择 [直接下载 inference 模型](#412-直接下载-inference-模型) 的方式。 + +#### 4.1.1 基于训练得到的权重导出 inference 模型 +- 主体检测模型权重导出请参考文档 [主体检测推理模型准备](../image_recognition_pipeline/mainbody_detection.md#41-推理模型准备),或者参照 [4.1.2](#412-直接下载-inference-模型) 直接下载解压即可。 + +- 特征提取模型权重导出可以参考以下命令: + ```shell + python3.7 tools/export_model.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ + -o Global.pretrained_model="https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams" \ + -o Global.save_inference_dir=deploy/models/GeneralRecognitionV2_PPLCNetV2_base` + ``` + 执行完该脚本后会在 `deploy/models/` 下生成 `GeneralRecognitionV2_PPLCNetV2_base` 文件夹,具有如下文件结构: + + ```log + deploy/models/ + ├── GeneralRecognitionV2_PPLCNetV2_base + │ ├── inference.pdiparams + │ ├── inference.pdiparams.info + │ └── inference.pdmodel + ``` + +#### 4.1.2 直接下载 inference 模型 + +[4.1.1 小节](#411-基于训练得到的权重导出-inference-模型) 提供了导出 inference 模型的方法,此处提供我们导出好的 inference 模型,可以按以下命令,下载模型到指定位置解压进行体验。 + +```shell +cd deploy/models + +# 下载主体检测inference模型并解压 +wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar + +# 下载特征提取inference模型并解压 +wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1. +``` + +### 4.2 测试数据准备 + +准备好主体检测、特征提取模型之后,还需要准备作为输入的测试数据,可以执行以下命令下载并解压测试数据。 + +```shell +# 返回deploy +cd ../ + +# 下载测试数据drink_dataset_v2.0,并解压 +wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar +``` + +### 4.3 基于 Python 预测引擎推理 + +#### 4.3.1 预测单张图像 + +然后执行以下命令对单张图像 `./drink_dataset_v2.0/test_images/100.jpeg` 进行识别。 + +```shell +# 执行下面的命令使用 GPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg" + +# 执行下面的命令使用 CPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/100.jpeg" -o Global.use_gpu=False +``` + +最终输出结果如下。 + +```log +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] +``` + +#### 4.3.2 基于文件夹的批量预测 + +如果希望预测文件夹内的图像,可以直接修改配置文件中的 Global.infer_imgs 字段,也可以通过下面的 -o 参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images" +# 使用下面的命令使用 CPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images" -o Global.use_gpu=False +``` + +终端中会输出该文件夹内所有图像的分类结果,如下所示。 + +```log +... 
+[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}] +Inference: 120.39852142333984 ms per batch image +[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}] +Inference: 32.045602798461914 ms per batch image +[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}] +Inference: 113.41428756713867 ms per batch image +[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}] +Inference: 122.04337120056152 ms per batch image +[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}] +Inference: 37.95266151428223 ms per batch image +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] +... +``` + +其中 `bbox` 表示检测出的主体所在位置,`rec_docs` 表示索引库中与检测框最为相似的类别,`rec_scores` 表示对应的相似度。 + +### 4.4 基于 C++ 预测引擎推理 +PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考 [服务器端 C++ 预测](../../../deploy/cpp_shitu/readme.md) 来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考 [基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md) 完成相应的预测库编译和模型预测工作。 + +### 4.5 服务化部署 +Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考 [Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 + +PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考 [模型服务化部署](../inference_deployment/recognition_serving_deploy.md) 来完成相应的部署工作。 + +### 4.6 端侧部署 +Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考 [Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 + +### 4.7 Paddle2ONNX 模型转换与预测 +Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考 [Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 + +PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考 [Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md) 来完成相应的部署工作。 + +## 参考文献 +1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." International Conference on Multimedia Modeling. Springer, Cham, 2022. +2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609. diff --git a/docs/zh_CN/advanced_tutorials/knowledge_distillation.md b/docs/zh_CN/advanced_tutorials/knowledge_distillation.md index 5a35843767136a367e03b28ef080daace77648d6..43fa60623a84c2db9726f9ef880f2c70cd08064e 100644 --- a/docs/zh_CN/advanced_tutorials/knowledge_distillation.md +++ b/docs/zh_CN/advanced_tutorials/knowledge_distillation.md @@ -16,6 +16,7 @@ - [1.2.5 DKD](#1.2.5) - [1.2.6 DIST](#1.2.6) - [1.2.7 MGD](#1.2.7) + - [1.2.8 WSL](#1.2.8) - [2. 
使用方法](#2) - [2.1 环境配置](#2.1) - [2.2 数据准备](#2.2) @@ -399,7 +400,7 @@ DKD将蒸馏中常用的 KD Loss 进行了解耦成为Target Class Knowledge Dis | 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | | --- | --- | --- | --- | --- | | baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | -| AFD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - | +| DKD | ResNet18 | [resnet34_distill_resnet18_dkd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_dkd.yaml) | 72.59%(**+1.79%**) | - | ##### 1.2.5.2 DKD 配置 @@ -533,7 +534,7 @@ Loss: | 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | | --- | --- | --- | --- | --- | | baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | -| MGD | ResNet18 | [resnet34_distill_resnet18_dist.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - | +| MGD | ResNet18 | [resnet34_distill_resnet18_mgd.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_mgd.yaml) | 71.86%(**+1.06%**) | - | ##### 1.2.7.2 MGD 配置 @@ -583,6 +584,73 @@ Loss: weight: 1.0 ``` + + +#### 1.2.8 WSL + +##### 1.2.8.1 WSL 算法介绍 + +论文信息: + + +> [Rethinking Soft Labels For Knowledge Distillation: A Bias-variance Tradeoff Perspective](https://arxiv.org/abs/2102.0650) +> +> Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang +> +> ICLR, 2021 + +WSL (Weighted Soft Labels) 损失函数根据教师模型与学生模型关于真值标签的 CE Loss 比值,对每个样本的 KD Loss 分别赋予权重。若学生模型相对教师模型在某个样本上预测结果更好,则对该样本赋予较小的权重。该方法简单、有效,使各个样本的权重可自适应调节,提升了蒸馏精度。 + +在ImageNet1k公开数据集上,效果如下所示。 + +| 策略 | 骨干网络 | 配置文件 | Top-1 acc | 下载链接 | +| --- | --- | --- | --- | --- | +| baseline | ResNet18 | [ResNet18.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet18.yaml) | 70.8% | - | +| WSL | ResNet18 | [resnet34_distill_resnet18_wsl.yaml](../../../ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml) | 72.23%(**+1.43%**) | - | + + +##### 1.2.8.2 WSL 配置 + +WSL 配置如下所示。在模型构建Arch字段中,需要同时定义学生模型与教师模型,教师模型固定参数,且需要加载预训练模型。在损失函数Loss字段中,需要定义`DistillationGTCELoss`(学生与真值标签之间的CE loss)以及`DistillationWSLLoss`(学生与教师之间的WSL loss),作为训练的损失函数。 + + +```yaml +# model architecture +Arch: + name: "DistillationModel" + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + models: + - Teacher: + name: ResNet34 + pretrained: True + + - Student: + name: ResNet18 + pretrained: False + + infer_model_name: "Student" + + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + model_names: ["Student"] + - DistillationWSLLoss: + weight: 2.5 + model_name_pairs: [["Student", "Teacher"]] + temperature: 2 + Eval: + - CELoss: + weight: 1.0 +``` + ## 2. 模型训练、评估和预测 diff --git a/docs/zh_CN/advanced_tutorials/theseus_layer.md b/docs/zh_CN/advanced_tutorials/theseus_layer.md index 56f2c9717ef2ee61fffbe2e7e46780c488f357b2..b0006ed15fc0e007d1061bff686644b516ff624c 100644 --- a/docs/zh_CN/advanced_tutorials/theseus_layer.md +++ b/docs/zh_CN/advanced_tutorials/theseus_layer.md @@ -69,14 +69,14 @@ MobileNetV1 │   . │   . 
│   └── blocks12 (DepthwiseSeparable).............("blocks[12]") -│      ├── depthwise_conv (ConvBNLayer)..........("blocks[0].depthwise_conv") -│      │   ├── conv (nn.Conv2D)..................("blocks[0].depthwise_conv.conv") -│      │   ├── bn (nn.BatchNorm).................("blocks[0].depthwise_conv.bn") -│      │   └── relu (nn.ReLU)....................("blocks[0].depthwise_conv.relu") -│      └── pointwise_conv (ConvBNLayer)..........("blocks[0].pointwise_conv") -│      ├── conv (nn.Conv2D)..................("blocks[0].pointwise_conv.conv") -│      ├── bn (nn.BatchNorm).................("blocks[0].pointwise_conv.bn") -│      └── relu (nn.ReLU)....................("blocks[0].pointwise_conv.relu") +│      ├── depthwise_conv (ConvBNLayer)..........("blocks[12].depthwise_conv") +│      │   ├── conv (nn.Conv2D)..................("blocks[12].depthwise_conv.conv") +│      │   ├── bn (nn.BatchNorm).................("blocks[12].depthwise_conv.bn") +│      │   └── relu (nn.ReLU)....................("blocks[12].depthwise_conv.relu") +│      └── pointwise_conv (ConvBNLayer)..........("blocks[12].pointwise_conv") +│      ├── conv (nn.Conv2D)..................("blocks[12].pointwise_conv.conv") +│      ├── bn (nn.BatchNorm).................("blocks[12].pointwise_conv.bn") +│      └── relu (nn.ReLU)....................("blocks[12].pointwise_conv.relu") │ ├── avg_pool (nn.AdaptiveAvgPool2D)...............("avg_pool") │ @@ -94,7 +94,7 @@ MobileNetV1 ## 3. 方法说明 -PaddleClas 提供的 backbone 网络均基于图像分类数据集训练得到,因此网络的尾部带有用于分类的全连接层,而在特定任务场景下,需要去掉分类的全连接层。在部分下游任务中,例如目标检测场景,需要获取到网络中间层的输出结果,也可能需要对网络的中间层进行修改,因此 `TheseusLayer` 提供了 3 个接口函数用于实现不同的修改功能。 +PaddleClas 提供的 backbone 网络均基于图像分类数据集训练得到,因此网络的尾部带有用于分类的全连接层,而在特定任务场景下,需要去掉分类的全连接层。在部分下游任务中,例如目标检测场景,需要获取到网络中间层的输出结果,也可能需要对网络的中间层进行修改,因此 `TheseusLayer` 提供了 3 个接口函数用于实现不同的修改功能。下面基于 PaddleClas whl 进行说明,首先需要安装 PaddleClas:`pip install paddleclas`。 @@ -122,7 +122,6 @@ def stop_after(self, stop_layer_name: str) -> bool: 以 `MobileNetV1` 网络为例,参数 `stop_layer_name` 为 `"blocks[0].depthwise_conv.conv"`,具体效果可以参考下方代码案例进行尝试。 ```python -# cd or pip install paddleclas to import paddleclas import paddleclas net = paddleclas.MobileNetV1() @@ -168,7 +167,6 @@ def update_res( import numpy as np import paddle -# cd or pip install paddleclas to import paddleclas import paddleclas np_input = np.zeros((1, 3, 224, 224)) @@ -186,8 +184,8 @@ print("The result returned by update_res(): ", res) output = net(pd_input) print("The output's keys of processed net: ", output.keys()) -# The output's keys of net: dict_keys(['output', 'blocks[0]', 'blocks[2]', 'blocks[4]', 'blocks[10]']) -# 网络前向输出 output 为 dict 类型对象,其中,output["output"] 为网络最终输出,output["blocks[0]"] 等为网络中间层输出结果 +# The output's keys of net: dict_keys(['logits', 'blocks[0]', 'blocks[2]', 'blocks[4]', 'blocks[10]']) +# 网络前向输出 output 为 dict 类型对象,其中,output["logits"] 为网络最终输出,output["blocks[0]"] 等为网络中间层输出结果 ``` 除了通过调用方法 `update_res()` 的方式之外,也同样可以在实例化网络对象时,通过指定参数 `return_patterns` 实现相同效果: @@ -241,7 +239,6 @@ def upgrade_sublayer(self, ```python from paddle import nn -# cd or pip install paddleclas to import paddleclas import paddleclas # 该函数必须有两个形参 diff --git a/docs/zh_CN/algorithm_introduction/ImageNet_models.md b/docs/zh_CN/algorithm_introduction/ImageNet_models.md index 61a4b9822bde2d2d5d71c9e4ed51401c44647819..dfc55c35f51a572931d697a797dfbff093c6ae56 100644 --- a/docs/zh_CN/algorithm_introduction/ImageNet_models.md +++ b/docs/zh_CN/algorithm_introduction/ImageNet_models.md @@ -354,24 +354,24 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image 
Transformers)系列模 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | |------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------| -| ViT_small_
patch16_224 | 0.7769 | 0.9342 | 3.71 | 9.05 | 16.72 | 9.41 | 48.60 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_small_patch16_224_infer.tar) | -| ViT_base_
patch16_224 | 0.8195 | 0.9617 | 6.12 | 14.84 | 28.51 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) | +| ViT_small_
patch16_224 | 0.7553 | 0.9211 | 3.71 | 9.05 | 16.72 | 9.41 | 48.60 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_small_patch16_224_infer.tar) | +| ViT_base_
patch16_224 | 0.8187 | 0.9618 | 6.12 | 14.84 | 28.51 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) | | ViT_base_
patch16_384 | 0.8414 | 0.9717 | 14.15 | 48.38 | 95.06 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_384_infer.tar) | | ViT_base_
patch32_384 | 0.8176 | 0.9613 | 4.94 | 13.43 | 24.08 | 12.66 | 88.19 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch32_384_infer.tar) | -| ViT_large_
patch16_224 | 0.8323 | 0.9650 | 15.53 | 49.50 | 94.09 | 59.65 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_224_infer.tar) | +| ViT_large_
patch16_224 | 0.8303 | 0.9655 | 15.53 | 49.50 | 94.09 | 59.65 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_224_infer.tar) | |ViT_large_
patch16_384| 0.8513 | 0.9736 | 39.51 | 152.46 | 304.06 | 174.70 | 304.12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_384_infer.tar) | |ViT_large_
patch32_384| 0.8153 | 0.9608 | 11.44 | 36.09 | 70.63 | 44.24 | 306.48 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch32_384_infer.tar) | | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | |------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------| -| DeiT_tiny_
patch16_224 | 0.718 | 0.910 | 3.61 | 3.94 | 6.10 | 1.07 | 5.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_patch16_224_infer.tar) | -| DeiT_small_
patch16_224 | 0.796 | 0.949 | 3.61 | 6.24 | 10.49 | 4.24 | 21.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_patch16_224_infer.tar) | -| DeiT_base_
patch16_224 | 0.817 | 0.957 | 6.13 | 14.87 | 28.50 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_224_infer.tar) | -| DeiT_base_
patch16_384 | 0.830 | 0.962 | 14.12 | 48.80 | 97.60 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_384_infer.tar) | -| DeiT_tiny_
distilled_patch16_224 | 0.741 | 0.918 | 3.51 | 4.05 | 6.03 | 1.08 | 5.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_distilled_patch16_224_infer.tar) | -| DeiT_small_
distilled_patch16_224 | 0.809 | 0.953 | 3.70 | 6.20 | 10.53 | 4.26 | 22.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_distilled_patch16_224_infer.tar) | -| DeiT_base_
distilled_patch16_224 | 0.831 | 0.964 | 6.17 | 14.94 | 28.58 | 16.93 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_224_infer.tar) | -| DeiT_base_
distilled_patch16_384 | 0.851 | 0.973 | 14.12 | 48.76 | 97.09 | 49.43 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_384_infer.tar) | +| DeiT_tiny_
patch16_224 | 0.7208 | 0.9112 | 3.61 | 3.94 | 6.10 | 1.07 | 5.68 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_patch16_224_infer.tar) | +| DeiT_small_
patch16_224 | 0.7982 | 0.9495 | 3.61 | 6.24 | 10.49 | 4.24 | 21.97 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_patch16_224_infer.tar) | +| DeiT_base_
patch16_224 | 0.8180 | 0.9558 | 6.13 | 14.87 | 28.50 | 16.85 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_224_infer.tar) | +| DeiT_base_
patch16_384 | 0.8289 | 0.9624 | 14.12 | 48.80 | 97.60 | 49.35 | 86.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_384_infer.tar) | +| DeiT_tiny_
distilled_patch16_224 | 0.7449 | 0.9192 | 3.51 | 4.05 | 6.03 | 1.08 | 5.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_distilled_patch16_224_infer.tar) | +| DeiT_small_
distilled_patch16_224 | 0.8117 | 0.9538 | 3.70 | 6.20 | 10.53 | 4.26 | 22.36 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_distilled_patch16_224_infer.tar) | +| DeiT_base_
distilled_patch16_224 | 0.8330 | 0.9647 | 6.17 | 14.94 | 28.58 | 16.93 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_224_infer.tar) | +| DeiT_base_
distilled_patch16_384 | 0.8520 | 0.9720 | 14.12 | 48.76 | 97.09 | 49.43 | 87.18 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_384_infer.tar) | @@ -426,14 +426,14 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 6.59 | 9.68 | 16.32 | 4.35 | 28.26 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar) | -| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 12.54 | 17.07 | 28.08 | 8.51 | 49.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_small_patch4_window7_224_infer.tar) | -| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 13.37 | 23.53 | 39.11 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | -| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 19.52 | 64.56 | 123.30 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | -| SwinTransformer_base_patch4_window7_224[1] | 0.8487 | 0.9746 | 13.53 | 23.46 | 39.13 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | -| SwinTransformer_base_patch4_window12_384[1] | 0.8642 | 0.9807 | 19.65 | 64.72 | 123.42 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | -| SwinTransformer_large_patch4_window7_224[1] | 0.8596 | 0.9783 | 15.74 | 38.57 | 71.49 | 34.02 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window7_224_22kto1k_infer.tar) | -| SwinTransformer_large_patch4_window12_384[1] | 0.8719 | 0.9823 | 32.61 | 116.59 | 223.23 | 99.97 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window12_384_22kto1k_infer.tar) | +| SwinTransformer_tiny_patch4_window7_224 | 0.8110 | 0.9549 | 6.59 | 9.68 | 16.32 | 4.35 | 28.26 | 
[下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar) | +| SwinTransformer_small_patch4_window7_224 | 0.8321 | 0.9622 | 12.54 | 17.07 | 28.08 | 8.51 | 49.56 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_small_patch4_window7_224_infer.tar) | +| SwinTransformer_base_patch4_window7_224 | 0.8337 | 0.9643 | 13.37 | 23.53 | 39.11 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | +| SwinTransformer_base_patch4_window12_384 | 0.8417 | 0.9674 | 19.52 | 64.56 | 123.30 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | +| SwinTransformer_base_patch4_window7_224[1] | 0.8516 | 0.9748 | 13.53 | 23.46 | 39.13 | 15.13 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) | +| SwinTransformer_base_patch4_window12_384[1] | 0.8634 | 0.9798 | 19.65 | 64.72 | 123.42 | 44.45 | 87.70 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) | +| SwinTransformer_large_patch4_window7_224[1] | 0.8619 | 0.9788 | 15.74 | 38.57 | 71.49 | 34.02 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window7_224_22kto1k_infer.tar) | +| SwinTransformer_large_patch4_window12_384[1] | 0.8706 | 0.9814 | 32.61 | 116.59 | 223.23 | 99.97 | 196.43 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window12_384_22kto1k_infer.tar) | [1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。 @@ -446,7 +446,7 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(M) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | | LeViT_128S | 0.7598 | 0.9269 | | | | 281 | 7.42 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128S_infer.tar) | -| LeViT_128 | 0.7810 | 0.9371 | | | | 365 | 8.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128_infer.tar) | +| LeViT_128 | 0.7810 | 0.9372 | | | | 365 | 8.87 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128_infer.tar) | | LeViT_192 | 0.7934 | 0.9446 | | | | 597 | 10.61 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_192_infer.tar) | | LeViT_256 | 0.8085 | 0.9497 | | | | 1049 | 18.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_256_infer.tar) | | LeViT_384 | 0.8191 | 0.9551 | | | | 2234 | 38.45 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_384_infer.tar) | @@ -461,12 +461,12 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| pcpvt_small | 0.8082 | 0.9552 | 7.32 | 10.51 | 15.27 |3.67 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_small_infer.tar) | -| pcpvt_base | 0.8242 | 0.9619 | 12.20 | 16.22 | 23.16 | 6.44 | 43.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_base_infer.tar) | -| pcpvt_large | 0.8273 | 0.9650 | 16.47 | 22.90 | 32.73 | 9.50 | 60.99 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_large_infer.tar) | -| alt_gvt_small | 0.8140 | 0.9546 | 6.94 | 9.01 | 12.27 |2.81 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_small_infer.tar) | -| alt_gvt_base | 0.8294 | 0.9621 | 9.37 | 15.02 | 24.54 | 8.34 | 56.07 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_base_infer.tar) | -| alt_gvt_large | 0.8331 | 0.9642 | 11.76 | 22.08 | 35.12 | 14.81 | 99.27 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_large_infer.tar) | +| pcpvt_small | 0.8115 | 0.9567 | 7.32 | 10.51 | 15.27 |3.67 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_small_infer.tar) | +| pcpvt_base | 0.8268 | 0.9627 | 12.20 | 16.22 | 23.16 | 6.44 | 43.83 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_base_infer.tar) | +| pcpvt_large | 0.8306 | 0.9659 | 16.47 | 22.90 | 32.73 | 9.50 | 60.99 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_large_infer.tar) | +| alt_gvt_small | 0.8177 | 0.9557 | 6.94 | 9.01 | 12.27 |2.81 | 24.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_small_infer.tar) | +| alt_gvt_base | 0.8315 | 0.9629 | 9.37 | 15.02 | 24.54 | 8.34 | 56.07 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_base_infer.tar) | +| alt_gvt_large | 0.8364 | 0.9651 | 11.76 | 22.08 | 35.12 | 14.81 | 99.27 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) | 
[下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_large_infer.tar) | **注**:与 Reference 的精度差异源于数据预处理不同。 @@ -551,13 +551,13 @@ ViT(Vision Transformer) 与 DeiT(Data-efficient Image Transformers)系列模 | 模型 | Top-1 Acc | Top-5 Acc | time(ms)
bs=1 | time(ms)
bs=4 | time(ms)
bs=8 | FLOPs(G) | Params(M) | 预训练模型下载地址 | inference模型下载地址 | | ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| PVT_V2_B0 | 0.705 | 0.902 | - | - | - | 0.53 | 3.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B0_infer.tar) | -| PVT_V2_B1 | 0.787 | 0.945 | - | - | - | 2.0 | 14.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B1_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B1_infer.tar) | -| PVT_V2_B2 | 0.821 | 0.960 | - | - | - | 3.9 | 25.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_infer.tar) | -| PVT_V2_B2_Linear | 0.821 | 0.961 | - | - | - | 3.8 | 22.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_Linear_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_Linear_infer.tar) | -| PVT_V2_B3 | 0.831 | 0.965 | - | - |- | 6.7 | 45.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B3_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B3_infer.tar) | -| PVT_V2_B4 | 0.836 | 0.967 | - | - | - | 9.8 | 62.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B4_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B4_infer.tar) | -| PVT_V2_B5 | 0.837 | 0.966 | - | - | - | 11.4 | 82.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B5_infer.tar) | +| PVT_V2_B0 | 0.7052 | 0.9016 | - | - | - | 0.53 | 3.7 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B0_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B0_infer.tar) | +| PVT_V2_B1 | 0.7869 | 0.9450 | - | - | - | 2.0 | 14.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B1_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B1_infer.tar) | +| PVT_V2_B2 | 0.8206 | 0.9599 | - | - | - | 3.9 | 25.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_infer.tar) | +| PVT_V2_B2_Linear | 0.8205 | 0.9605 | - | - | - | 3.8 | 22.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B2_Linear_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B2_Linear_infer.tar) | +| PVT_V2_B3 | 0.8310 | 0.9648 | - | - |- | 6.7 | 45.2 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B3_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B3_infer.tar) | +| PVT_V2_B4 | 0.8361 | 0.9666 | - | - | - | 9.8 | 62.6 | 
[下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B4_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B4_infer.tar) | +| PVT_V2_B5 | 0.8374 | 0.9662 | - | - | - | 11.4 | 82.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/PVT_V2_B5_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PVT_V2_B5_infer.tar) | diff --git a/docs/zh_CN/algorithm_introduction/deep_hashing_introduction.md b/docs/zh_CN/algorithm_introduction/deep_hashing_introduction.md new file mode 100644 index 0000000000000000000000000000000000000000..71b2e38155954634930467bc04bdb6bf031ed189 --- /dev/null +++ b/docs/zh_CN/algorithm_introduction/deep_hashing_introduction.md @@ -0,0 +1,63 @@ +# Deep Hashing算法介绍 +---- +## 目录 + +* [1. 简介](#1) +* [2. 算法介绍](#2) + * [2.1 DCH](#2.1) + * [2.2 DSHSD](#2.2) + * [2.3 LCDSH](#2.3) +* [3. 快速体验](#3) +* [4. 总结及建议](#4) + + +## 1. 简介 + +最近邻搜索是指在数据库中查找与查询数据距离最近的点,在计算机视觉、推荐系统、机器学习等领域中广泛使用。在PP-ShiTu中,输入图像经过主体检测模型去掉背景后,再经过特征提取模型提取特征,之后经过检索得到检索图像等类别。在这个过程中,一般来说,提取的特征是float32数据类型。当离线特征库中存储的feature比较多时,就占用较大的存储空间,同时检索过程也会变慢。如果利用哈希编码将特征由float32转成0或者1表示的二值特征,那么不仅降低存储空间,同时也能大大加快检索速度。 + + +## 2. 算法介绍 + +目前PaddleClas中,主要复现了三种DeepHash的方法,分别是:[DCH](http://ise.thss.tsinghua.edu.cn/~mlong/doc/deep-cauchy-hashing-cvpr18.pdf),[DSHSD](https://ieeexplore.ieee.org/document/8648432/), [LCDSH](https://www.ijcai.org/Proceedings/2017/0499.pdf)。以下做简要介绍。 + + +## 2.1 DCH + +此方法基于柯西分布,提出一种成对的交叉熵损失函数,能够较好的得到紧密的hamming特征。在多个数据集上取得较好的结果。详见[论文](http://ise.thss.tsinghua.edu.cn/~mlong/doc/deep-cauchy-hashing-cvpr18.pdf)。方法示意图如下: + +
+ +
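+其核心思想可以概括为:将一对哈希码之间的(近似)汉明距离 $d(h_i, h_j)$ 通过柯西分布转化为相似概率,再对成对的相似标签做加权交叉熵优化。下式仅为示意,具体记号与完整形式(包括量化损失项)请以原论文为准:
+
+$$
+\sigma\big(d(h_i, h_j)\big) = \frac{\gamma}{\gamma + d(h_i, h_j)}, \qquad
+L = -\sum_{s_{ij} \in \mathcal{S}} w_{ij} \Big[ s_{ij} \log \sigma\big(d(h_i, h_j)\big) + (1 - s_{ij}) \log \big(1 - \sigma(d(h_i, h_j))\big) \Big]
+$$
+
+其中 $\gamma$ 为柯西分布的尺度参数,$s_{ij}$ 为图像对的相似标签,$w_{ij}$ 为用于缓解正负样本对不均衡的权重;距离越小,相似概率越接近 1,从而促使相似图像的哈希码在汉明空间内聚集。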
+ + +## 2.2 DSHSD + +DSHSD主要创新点在于,在保证分布一致性的情况下消除差异。首先,作者利用平滑投影函数来放松离散约束,而不是使用任何量化正则化器,其中平滑量是可调整的。其次,在平滑投影和特征分布之间建立数学联系,以保持分布的一致性。进而提出了一种多语义信息融合方法,使hash码学习后能够保留更多的语义信息,从而加快训练收敛速度。其方法在在CIFAR-10、NUS-WIDE和ImageNet数据集上的大量实验表现良好。具体可查看[论文](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8648432)。 + +
+ +
+ + +## 2.3 LCDSH + +LCDSH是一种局部约束深度监督哈希算法。该方案通过学习图像对之间的相似特征使得,哈希码保持了DCNN特征的分布,从而有利于准确的图像检索。具体可查看[论文](https://www.ijcai.org/Proceedings/2017/0499.pdf)。 + +
+ +
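+上述三种方法最终都将图像特征压缩为 0/1 的二值哈希码,检索阶段用汉明距离代替浮点特征的余弦/欧氏距离。下面给出一段基于 numpy 的简化示意(仅用于说明二值特征为何能显著降低存储并加速检索,并非 PaddleClas 的实际检索实现,其中 48 bit 的特征长度为演示假设):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+# 假设底库中有 10000 条 48 bit 的二值哈希特征,以及 1 条查询特征(均为演示用的随机数据)
+gallery = rng.integers(0, 2, size=(10000, 48), dtype=np.uint8)
+query = rng.integers(0, 2, size=(48,), dtype=np.uint8)
+
+# 按 bit 打包存储:每 8 个 0/1 压成 1 个字节,存储量约为同维度 float32 特征的 1/32
+gallery_packed = np.packbits(gallery, axis=1)   # (10000, 6)
+query_packed = np.packbits(query)               # (6,)
+
+# 汉明距离 = 按位异或后统计 1 的个数,整个检索过程只涉及位运算
+xor = np.bitwise_xor(gallery_packed, query_packed)
+hamming_dist = np.unpackbits(xor, axis=1).sum(axis=1)
+
+# 取汉明距离最小的 top-5 作为检索结果
+topk = np.argsort(hamming_dist)[:5]
+print(topk, hamming_dist[topk])
+```
+
+在 PP-ShiTu 中使用二值特征检索时的具体配置(如 `dist_type: "hamming"`、`hamming_radius` 等),可参考[哈希编码](../image_recognition_pipeline/deep_hashing.md)文档。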
+ + +## 3. 快速体验 + +这个三个哈希算法的配置文件具体位置: +`DCH`: ppcls/configs/DeepHash/DCH.yaml +`DSHSD`: ppcls/configs/DeepHash/DSHSD.yaml +`LCDSH`: ppcls/configs/DeepHash/LCDSH.yaml + +具体训练方法,请参考[分类模型训练文档](../models_training/classification.md) + + +## 4. 总结及建议 + +不同的DeepHash方法,具有不同特性。可以分别对不同的哈希方法进行尝试,选取最合适自己数据集的方法。 diff --git a/docs/zh_CN/algorithm_introduction/reid.md b/docs/zh_CN/algorithm_introduction/reid.md index f8c8705ac59e9950b14587730c971b81e81f48b3..1affe41824e772606f3231b19bac0d20210ba570 100644 --- a/docs/zh_CN/algorithm_introduction/reid.md +++ b/docs/zh_CN/algorithm_introduction/reid.md @@ -344,7 +344,7 @@ PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模 #### 5.1 方法总结与对比 -上述算法能快速地迁移至多数的ReID模型中,能进一步提升ReID模型的性能。 +上述算法能快速地迁移至多数的ReID模型中(参考 [PP-ShiTuV2](../PPShiTu/PPShiTuV2_introduction.md) ),能进一步提升ReID模型的性能, #### 5.2 使用建议/FAQ diff --git a/docs/zh_CN/image_recognition_pipeline/deep_hashing.md b/docs/zh_CN/image_recognition_pipeline/deep_hashing.md new file mode 100644 index 0000000000000000000000000000000000000000..03413dac84e4b9974d03ad230a04e13171d8c01f --- /dev/null +++ b/docs/zh_CN/image_recognition_pipeline/deep_hashing.md @@ -0,0 +1,70 @@ +# 哈希编码 + +最近邻搜索是指在数据库中查找与查询数据距离最近的点,在计算机视觉、推荐系统、机器学习等领域中广泛使用。在`PP-ShiTu`中,输入图像经过主体检测模型去掉背景后,再经过特征提取模型提取特征,之后经过检索得到输入图像的类别。在这个过程中,一般来说,提取的特征是`float32`数据类型。当离线特征库中存储的`feature`比较多时,就占用较大的存储空间,同时检索过程也会变慢。如果利用`哈希编码`将特征由`float32`转成`0`或者`1`表示的二值特征,那么不仅降低存储空间,同时也能大大加快检索速度。 + +哈希编码,主要用在`PP-ShiTu`的**特征提取模型**部分,将模型输出特征直接二值化。即训练特征提取模型时,将模型的输出映射到二值空间。 + +注意,由于使用二值特征表示图像特征,精度可能会下降,请根据实际情况,酌情使用。 + + +## 目录 + +- [1. 特征模型二值特征训练](#1) + - [1.1 PP-ShiTu特征提取模型二值训练](#1.1) + - [1.2 其他特征模型二值训练](#1.2) +- [2. 检索算法配置](#2) + + + +## 1. 特征模型二值特征训练 + + + +注意,此模块目前只支持`PP-ShiTuV1`,`PP-ShiTuV2`暂未适配。 + +### 1.1 PP-ShiTu特征提取模型二值训练 + +PP-ShiTu特征提取模型二值特征模型,配置文件位于`ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml`,相关训练方法如下。 + +```shell +# 单卡 GPU +python3.7 tools/train.py \ +-c ./ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml \ +-o Arch.Backbone.pretrained=True \ +-o Global.device=gpu + +# 多卡 GPU +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3.7 -m paddle.distributed.launch tools/train.py \ +-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_binary.yaml \ +-o Arch.Backbone.pretrained=True \ +-o Global.device=gpu +``` + +其中`数据准备`、`模型评估`等,请参考[此文档](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/models_training/recognition.md)。 + + + +### 1.2 其他特征模型二值训练 + +其他二值特征训练模型的配置文件位于`ppcls/configs/DeepHash/`文件夹下,此文件夹下的相关配置文件主要是复现相关`deep hashing`相关算法。包括:`DCH, DSHSD, LCDSH`三种算法。这三种算法相关介绍,详见[Deep Hashing相关算法介绍](../algorithm_introduction/deep_hashing_introduction.md)。 + +相关训练方法,请参考[分类模型训练文档](../models_training/classification.md)。 + + + +## 2. 检索算法配置 + +在PP-ShiTu中使用二值特征,部署及离线推理配置请参考`deploy/configs/inference_general_binary.yaml`。配置文件中相关参数介绍请参考[向量检索文档](./vector_search.md). 
+ +其中需值得注意的是,二值检索相关配置应设置如下: + +```yaml +IndexProcess: + index_method: "FLAT" # supported: HNSW32, IVF, Flat + delimiter: "\t" + dist_type: "hamming" + hamming_radius: 100 +``` + +其中`hamming_radius`可以根据自己实际精度要求,适当调节。 diff --git a/docs/zh_CN/image_recognition_pipeline/feature_extraction.md b/docs/zh_CN/image_recognition_pipeline/feature_extraction.md index 368abc3da9856c8d9232819aef3b43f0ef66735d..f037dc25a72f2d0f452194fe15879b8c7edced90 100644 --- a/docs/zh_CN/image_recognition_pipeline/feature_extraction.md +++ b/docs/zh_CN/image_recognition_pipeline/feature_extraction.md @@ -1,4 +1,4 @@ -简体中文|[English](../../en/image_recognition_pipeline/feature_extraction_en.md) +简体中文 | [English](../../en/image_recognition_pipeline/feature_extraction_en.md) # 特征提取 ## 目录 @@ -6,10 +6,11 @@ - [1. 摘要](#1-摘要) - [2. 介绍](#2-介绍) - [3. 方法](#3-方法) - - [3.1 Backbone](#31-backbone) - - [3.2 Neck](#32-neck) - - [3.3 Head](#33-head) - - [3.4 Loss](#34-loss) + - [3.1 Backbone](#31-backbone) + - [3.2 Neck](#32-neck) + - [3.3 Head](#33-head) + - [3.4 Loss](#34-loss) + - [3.5 Data Augmentation](#35-data-augmentation) - [4. 实验部分](#4-实验部分) - [5. 自定义特征提取](#5-自定义特征提取) - [5.1 数据准备](#51-数据准备) @@ -35,56 +36,76 @@ ![](../../images/feature_extraction_framework.png) 图中各个模块的功能为: -- **Backbone**: 用于提取输入图像初步特征的骨干网络,一般由配置文件中的 [`Backbone`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L26-L29) 以及 [`BackboneStopLayer`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L30-L31) 字段共同指定。 -- **Neck**: 用以特征增强及特征维度变换。可以是一个简单的 FC Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强,一般由配置文件中的 [`Neck`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L32-L35)字段指定。 -- **Head**: 用来将 feature 转化为 logits,让模型在训练阶段能以分类任务的形式进行训练。除了常用的 FC Layer 外,还可以替换为 cosmargin, arcmargin, circlemargin 等模块,一般由配置文件中的 [`Head`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L36-L41)字段指定。 -- **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Metric learning Loss 组合在一起,一般由配置文件中的 [`Loss`](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml#L44-L50)字段指定。 +- **Backbone**: 用于提取输入图像初步特征的骨干网络,一般由配置文件中的 [Backbone](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L33-L37) 以及 [BackboneStopLayer](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L38-L39) 字段共同指定。 +- **Neck**: 用以特征增强及特征维度变换。可以是一个简单的 FC Layer,用来做特征维度变换;也可以是较复杂的 FPN 结构,用以做特征增强,一般由配置文件中的 [Neck](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L40-L51) 字段指定。 +- **Head**: 用来将 `Neck` 的输出 feature 转化为 logits,让模型在训练阶段能以分类任务的形式进行训练。除了常用的 FC Layer 外,还可以替换为 [CosMargin](../../../ppcls/arch/gears/cosmargin.py), [ArcMargin](../../../ppcls/arch/gears/arcmargin.py), [CircleMargin](../../../ppcls/arch/gears/circlemargin.py) 等模块,一般由配置文件中的 [Head](`../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L52-L60) 字段指定。 +- **Loss**: 指定所使用的 Loss 函数。我们将 Loss 设计为组合 loss 的形式,可以方便地将 Classification Loss 和 Metric learning Loss 组合在一起,一般由配置文件中的 [Loss](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77) 字段指定。 ## 3. 
方法 -### 3.1 Backbone +#### 3.1 Backbone -Backbone 部分采用了 [PP_LCNet_x2_5](../models/PP-LCNet.md),其针对Intel CPU端的性能优化探索了多个有效的结构设计方案,最终实现了在不增加推理时间的情况下,进一步提升模型的性能,最终大幅度超越现有的 SOTA 模型。 +Backbone 部分采用了 [PP-LCNetV2_base](../models/PP-LCNetV2.md),其在 `PPLCNet_V1` 的基础上,加入了包括Rep 策略、PW 卷积、Shortcut、激活函数改进、SE 模块改进等多个优化点,使得最终分类精度与 `PPLCNet_x2_5` 相近,且推理延时减少了40%*。在实验过程中我们对 `PPLCNetV2_base` 进行了适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 `PPLCNetV2_base` 末尾的 `ReLU` 和 `FC`、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1。 -### 3.2 Neck -Neck 部分采用了 [FC Layer](../../../ppcls/arch/gears/fc.py),对 Backbone 抽取得到的特征进行降维,减少了特征存储的成本与计算量。 +**注:** *推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。 -### 3.3 Head +#### 3.2 Neck -Head 部分选用 [ArcMargin](../../../ppcls/arch/gears/arcmargin.py),在训练时通过指定margin,增大同类特征之间的角度差异再进行分类,进一步提升抽取特征的表征能力。 +Neck 部分采用了 [BN Neck](../../../ppcls/arch/gears/bnneck.py),对 Backbone 抽取得到的特征的每个维度进行标准化操作,减少了同时优化度量学习损失函数和分类损失函数的难度,加快收敛速度。 -### 3.4 Loss +#### 3.3 Head -Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训练时以分类任务的损失函数来指导网络进行优化。详细的配置文件见[通用识别配置文件](../../../ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml)。 +Head 部分选用 [FC Layer](../../../ppcls/arch/gears/fc.py),使用分类头将 feature 转换成 logits 供后续计算分类损失。 - +#### 3.4 Loss -## 4. 实验部分 +Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py) 和 [TripletAngularMarginLoss](../../../ppcls/loss/tripletangularmarginloss.py),在训练时以分类损失和基于角度的三元组损失来指导网络进行优化。我们基于原始的 TripletLoss (困难三元组损失)进行了改进,将优化目标从 L2 欧几里得空间更换成余弦空间,并加入了 anchor 与 positive/negtive 之间的硬性距离约束,让训练与测试的目标更加接近,提升模型的泛化能力。详细的配置文件见 [GeneralRecognitionV2_PPLCNetV2_base.yaml](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-77)。 -训练数据为如下 7 个公开数据集的汇总: +#### 3.5 Data Augmentation -| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | -| :----------: | :-----: | :------: | :------: | :--------------------------------------------------------------------------: | -| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | -| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | -| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) | -| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | -| iCartoonFace | 389678 | 5013 | 动漫人物 | [地址](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | -| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | -| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | -| **Total** | **5M** | **185K** | ---- | ---- | +我们考虑到实际相机拍摄时目标主体可能出现一定的旋转而不一定能保持正立状态,因此我们在数据增强中加入了适当的 [随机旋转增强](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117),以提升模型在真实场景中的检索能力。 -最终的模型效果如下表所示: + -| 模型 | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) | -| :-----------------------------: | :--------: | :-------: | :--------: | :----------: | :---: | :----: | :---------: | -| GeneralRecognition_PPLCNet_x2_5 | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 | +## 4. 
实验部分 -* 预训练模型地址:[通用识别预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) -* 采用的评测指标为:`Recall@1` +我们对原有的训练数据进行了合理扩充与优化,最终使用如下 17 个公开数据集的汇总: + +| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | +| :--------------------- | :-----: | :------: | :---: | :----------------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| bird400 | 58388 | 400 | 鸟类 | [地址](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) | +| 104flows | 12753 | 104 | 花类 | [地址](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | +| Cars | 58315 | 112 | 车辆 | [地址](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| Fashion Product Images | 44441 | 47 | 商品 | [地址](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) | +| flowerrecognition | 24123 | 59 | 花类 | [地址](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) | +| food-101 | 101000 | 101 | 食物 | [地址](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) | +| fruits-262 | 225639 | 262 | 水果 | [地址](https://www.kaggle.com/datasets/aelchimminut/fruits262) | +| inaturalist | 265213 | 1010 | 自然 | [地址](https://github.com/visipedia/inat_comp/tree/master/2017) | +| indoor-scenes | 15588 | 67 | 室内 | [地址](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) | +| Products-10k | 141931 | 9691 | 商品 | [地址](https://products-10k.github.io/) | +| CompCars | 16016 | 431 | 车辆 | [地址](http://​​​​​​http://ai.stanford.edu/~jkrause/cars/car_dataset.html​) | +| **Total** | **6M** | **192K** | - | - | + +最终的模型精度指标如下表所示: + +| 模型 | 延时(ms) | 存储(MB) | product* | | Aliproduct | | VeRI-Wild | | LogoDet-3k | | iCartoonFace | | SOP | | Inshop | | gldv2 | | imdb_face | | iNat | | instre | | sketch | | sop | | +| :--------------------- | :------- | :------- | :------------------ | :--- | ---------- | ---- | --------- | ---- | ---------- | ---- | ------------ | ---- | -------- | ---- | -------- | ---- | -------- | ---- | --------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- | +| | | | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | recall@1 | mAP | +| PP-ShiTuV1_general_rec | 5.0 | 34 | 65.9 | 54.3 | 83.9 | 83.2 | 88.7 | 60.1 | 86.1 | 73.6 | 84.1 | 72.3 | 79.7 | 58.6 | 89.1 | 69.4 | 98.2 | 91.6 | 28.8 | 8.42 | 12.6 | 6.1 | 72.0 | 50.4 | 27.9 | 9.5 | 97.6 | 90.3 | +| PP-ShiTuV2_general_rec | 6.1 | 19 | 73.7 | 61.0 | 84.2 | 83.3 | 87.8 | 68.8 | 88.0 | 63.2 | 53.6 | 27.5 | 77.6 | 55.3 | 90.8 | 74.3 | 98.1 | 90.5 | 35.9 | 11.2 | 38.6 | 23.9 | 87.7 | 71.4 | 39.3 | 15.6 | 98.3 | 90.9 | + +* product数据集是为了验证PP-ShiTu的泛化性能而制作的数据集,所有的数据都没有在训练和测试集中出现。该数据包含7个大类(化妆品、地标、红酒、手表、车、运动鞋、饮料),250个小类。测试时,使用250个小类的标签进行测试;sop数据集来自[GPR1200: A Benchmark for General-Purpose Content-Based Image 
Retrieval](https://arxiv.org/abs/2111.13122),可视为“SOP”数据集的子集。 +* 预训练模型地址:[general_PPLCNetV2_base_pretrained_v1.0.pdparams](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams) +* 采用的评测指标为:`Recall@1` 与 `mAP` * 速度评测机器的 CPU 具体信息为:`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz` * 速度指标的评测条件为: 开启 MKLDNN, 线程数设置为 10 @@ -94,47 +115,52 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 自定义特征提取,是指依据自己的任务,重新训练特征提取模型。 -下面基于`GeneralRecognition_PPLCNet_x2_5.yaml`配置文件,介绍主要的四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理 - +下面基于 `GeneralRecognitionV2_PPLCNetV2_base.yaml` 配置文件,介绍主要的四个步骤:1)数据准备;2)模型训练;3)模型评估;4)模型推理 ### 5.1 数据准备 -首先需要基于任务定制自己的数据集。数据集格式与文件结构详见[数据集格式说明](../data_preparation/recognition_dataset.md)。 +首先需要基于任务定制自己的数据集。数据集格式与文件结构详见 [数据集格式说明](../data_preparation/recognition_dataset.md)。 准备完毕之后还需要在配置文件中修改数据配置相关的内容, 主要包括数据集的地址以及类别数量。对应到配置文件中的位置如下所示: - 修改类别数: ```yaml - Head: - name: ArcMargin - embedding_size: 512 - class_num: 185341 # 此处表示类别数 + Head: + name: FC + embedding_size: *feat_dim + class_num: 192612 # 此处表示类别数 + weight_attr: + initializer: + name: Normal + std: 0.001 + bias_attr: False ``` - 修改训练数据集配置: ```yaml - Train: - dataset: - name: ImageNetDataset - image_root: ./dataset/ # 此处表示train数据所在的目录 - cls_label_path: ./dataset/train_reg_all_data.txt # 此处表示train数据集label文件的地址 + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ # 此处表示train数据集所在的目录 + cls_label_path: ./dataset/train_reg_all_data_v2.txt # 此处表示train数据集对应标注文件的地址 + relabel: True ``` - 修改评估数据集中query数据配置: ```yaml - Query: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/ # 此处表示query数据集所在的目录 - cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示query数据集label文件的地址 + Query: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # 此处表示query数据集所在的目录 + cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示query数据集对应标注文件的地址 ``` - 修改评估数据集中gallery数据配置: ```yaml - Gallery: - dataset: - name: VeriWild - image_root: ./dataset/Aliproduct/ # 此处表示gallery数据集所在的目录 - cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示gallery数据集label文件的地址 + Gallery: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ # 此处表示gallery数据集所在的目录 + cls_label_path: ./dataset/Aliproduct/val_list.txt # 此处表示gallery数据集对应标注文件的地址 ``` @@ -147,14 +173,14 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ```shell export CUDA_VISIBLE_DEVICES=0 python3.7 tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml ``` - 单机多卡训练 ```shell export CUDA_VISIBLE_DEVICES=0,1,2,3 - python3.7 -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml ``` **注意:** 配置文件中默认采用`在线评估`的方式,如果你想加快训练速度,可以关闭`在线评估`功能,只需要在上述命令的后面,增加 `-o Global.eval_during_train=False`。 @@ -165,15 +191,15 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ```shell export CUDA_VISIBLE_DEVICES=0 python3.7 tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.checkpoint="output/RecModel/latest" ``` - 单机多卡断点恢复训练 ```shell export CUDA_VISIBLE_DEVICES=0,1,2,3 - python3.7 -m 
paddle.distributed.launch \ - --gpus="0,1,2,3" tools/train.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.checkpoint="output/RecModel/latest" ``` @@ -187,16 +213,16 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 ```shell export CUDA_VISIBLE_DEVICES=0 python3.7 tools/eval.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.pretrained_model="output/RecModel/best_model" ``` - 多卡评估 ```shell export CUDA_VISIBLE_DEVICES=0,1,2,3 - python3.7 -m paddle.distributed.launch \ - --gpus="0,1,2,3" tools/eval.py \ - -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ + python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \ + tools/eval.py \ + -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.pretrained_model="output/RecModel/best_model" ``` **注:** 建议使用多卡评估。该方式可以利用多卡并行计算快速得到全部数据的特征,能够加速评估的过程。 @@ -212,7 +238,7 @@ Loss 部分选用 [Cross entropy loss](../../../ppcls/loss/celoss.py),在训 首先需要将 `*.pdparams` 模型文件转换成 inference 格式,转换命令如下。 ```shell python3.7 tools/export_model.py \ --c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \ +-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \ -o Global.pretrained_model="output/RecModel/best_model" ``` 生成的推理模型默认位于 `PaddleClas/inference` 目录,里面包含三个文件,分别为 `inference.pdmodel`、`inference.pdiparams`、`inference.pdiparams.info`。 @@ -228,10 +254,18 @@ python3.7 python/predict_rec.py \ -c configs/inference_rec.yaml \ -o Global.rec_inference_model_dir="../inference" ``` -得到的特征输出格式如下图所示: -![](../../images/feature_extraction_output.png) +得到的特征输出格式如下所示: + +```log +wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02 + 4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02 + 5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02 + -2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03 + ... + -3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02] +``` -在实际使用过程中,仅仅得到特征可能并不能满足业务需求。如果想进一步通过特征检索来进行图像识别,可以参照文档[向量检索](./vector_search.md)。 +在实际使用过程中,仅仅得到特征可能并不能满足业务需求。如果想进一步通过特征检索来进行图像识别,可以参照文档 [向量检索](./vector_search.md)。 @@ -244,4 +278,4 @@ python3.7 python/predict_rec.py \ ## 7. 参考文献 1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf) -2. [ArcFace: Additive Angular Margin Loss for Deep Face Recognition](https://arxiv.org/abs/1801.07698) +2. [Bag of Tricks and A Strong Baseline for Deep Person Re-identification](https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf) diff --git a/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md b/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md index 828fdf4f1f017d524aa9ebea1f1a409dee0eaf43..5434a464c56cb20bf203a98261ac834101263efd 100644 --- a/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md +++ b/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md @@ -12,7 +12,6 @@ - [1. 数据集](#1) - [2. 模型选择](#2) - [2.1 轻量级主体检测模型](#2.1) - - [2.2 服务端主体检测模型](#2.2) - [3. 模型训练](#3) - [3.1 环境准备](#3.1) - [3.2 数据准备](#3.2) @@ -45,14 +44,13 @@ ## 2. 
模型选择 -目标检测方法种类繁多,比较常用的有两阶段检测器(如 FasterRCNN 系列等);单阶段检测器(如 YOLO、SSD 等);anchor-free 检测器(如 PicoDet、FCOS 等)。PaddleDetection 中针对服务端使用场景,自研了 PP-YOLO 系列模型;针对端侧(CPU 和移动端等)使用场景,自研了 PicoDet 系列模型,在服务端和端侧均处于业界较为领先的水平。 +目标检测方法种类繁多,比较常用的有两阶段检测器(如 FasterRCNN 系列等);单阶段检测器(如 YOLO、SSD 等);anchor-free 检测器(如 PicoDet、FCOS 等)。在主体检测中,我们使用[PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/picodet)系列模型,其在CPU端与移动端,速度较快、精度较好,处于较为领先的业界水平。 -基于上述研究,PaddleClas 中提供了 2 个通用主体检测模型,为轻量级与服务端主体检测模型,分别适用于端侧场景以及服务端场景。下面的表格中给出了在上述 5 个数据集上的平均 mAP 以及它们的模型大小、预测速度对比信息。 +基于上述研究,PaddleClas 中提供了 1 个通用主体检测模型,既轻量级主体检测模型,分别适用于端侧场景以及服务端场景。下面的表格中给出了在上述 5 个数据集上的平均 mAP 以及它们的模型大小、预测速度对比信息。 -| 模型 | 模型结构 | 预训练模型下载地址 | inference 模型下载地址 | mAP | inference 模型大小(MB) | 单张图片预测耗时(不包含预处理)(ms) | -| ------------------ | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------- | ---------------------------------- | -| 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | 40.1% | 30.1 | 29.8 | -| 服务端主体检测模型 | PP-YOLOv2 | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.zip) | 42.5% | 210.5 | 466.6 | +| 模型 | 模型结构 | 预训练模型下载地址 | inference 模型下载地址 | mAP | inference 模型大小(MB) | +| ------------------ | -------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------- | +| 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [tar 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) [zip 格式文件地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | 41.5% | 30.1 | * 注意 * 由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 格式文件并解压。`tar` 格式文件建议使用命令 `tar xf xxx.tar` 解压。 @@ -65,37 +63,16 @@ PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,是一个适用于 CPU 或者移动端场景的目标检测算法。具体地,它融合了下面一系列优化算法。 -- [ATSS](https://arxiv.org/abs/1912.02424) -- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388) +- [VFL](https://arxiv.org/abs/2008.13367) + [GFL](https://arxiv.org/abs/2006.04388) +- 新的PAN Neck结构 - 余弦学习率策略 - Cycle-EMA -- 轻量级检测 head +- [ATSS](https://arxiv.org/abs/1912.02424)及[SimOTA](https://arxiv.org/abs/2107.08430) 标签分配策略 -更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md)。 +更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet)。 
-在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640,其余配置与 [picodet_lcnet_1_5x_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/more_config/picodet_lcnet_1_5x_416_coco.yml) 完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 - - - -### 2.2 服务端主体检测模型 - -PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,从骨干网络、数据增广、正则化策略、损失函数、后处理等多个角度对 yolov3 模型进行深度优化,最终在“速度-精度”方面达到了业界领先的水平。具体地,优化的策略如下。 - -- 更优的骨干网络: ResNet50vd-DCN -- 更大的训练 batch size: 8 GPUs,每 GPU batch_size=24,对应调整学习率和迭代轮数 -- [Drop Block](https://arxiv.org/abs/1810.12890) -- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp) -- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf) -- [Grid Sensitive](https://arxiv.org/abs/2004.10934) -- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf) -- [CoordConv](https://arxiv.org/abs/1807.03247) -- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729) -- 更优的预训练模型 - -更多关于 PP-YOLO 的详细介绍可以参考:[PP-YOLO 模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md)。 - -在服务端主体检测任务中,为了保证检测效果,我们使用 ResNet50vd-DCN 作为检测模型的骨干网络,使用配置文件 [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml),更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 +在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640,其余配置与 [picodet_l_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/picodet/picodet_l_416_coco.yml) 完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。 @@ -112,19 +89,20 @@ PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) ```shell cd git clone https://github.com/PaddlePaddle/PaddleDetection.git - cd PaddleDetection +# 切换到2.3分支 +git checkout release/2.3 # 安装其他依赖 pip install -r requirements.txt ``` -更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md) +更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/docs/tutorials/INSTALL_cn.md) ### 3.2 数据准备 -对于自定义数据集,首先需要将自己的数据集修改为 COCO 格式,可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md)制作 COCO 格式的数据集。 +对于自定义数据集,首先需要将自己的数据集修改为 COCO 格式,可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/docs/tutorials/PrepareDataSet.md)制作 COCO 格式的数据集。 主体检测任务中,所有的检测框均属于前景,在这里需要将标注文件中,检测框的 `category_id` 修改为 1,同时将整个标注文件中的 `categories` 映射表修改为下面的格式,即整个类别映射表中只包含`前景`类别。 @@ -136,22 +114,20 @@ pip install -r requirements.txt ### 3.3 配置文件改动和说明 -我们使用 `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` 配置进行训练,配置文件摘要如下: +我们使用 [mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml) 配置进行训练,配置文件摘要如下: ![](../../images/det/PaddleDetection_config.png) -从上图看到 `ppyolov2_r50vd_dcn_365e_coco.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下: +从上图看到 `mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下: ``` -coco_detection.yml:主要说明了训练数据和验证数据的路径 - runtime.yml:主要说明了公共的运行参数,比如是否使用 GPU、每多少个 epoch 存储 checkpoint 等 -optimizer_365e.yml:主要说明了学习率和优化器的配置 +optimizer_100e.yml:主要说明了学习率和优化器的配置 -ppyolov2_r50vd_dcn.yml:主要说明模型和主干网络的情况 +picodet_esnet.yml:主要说明模型和主干网络的情况 -ppyolov2_reader.yml:主要说明数据读取器配置,如 batch size,并发加载子进程数等,同时包含读取后预处理操作,如 resize、数据增强等等 +picodet_640_reader.yml:主要说明数据读取器配置,如 batch 
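如果不打算自行训练主体检测模型,也可以直接使用上表提供的轻量级主体检测 inference 模型。下面给出一个下载并解压该模型的参考命令(示例假设将模型存放在 `./models/` 目录下,该路径仅为示意,可按实际部署目录调整):

```shell
# 创建模型存放目录(示例路径,可自行修改)
mkdir -p models
# 下载轻量级主体检测 inference 模型(tar 格式,对应上表中的下载链接)
wget -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
# 按上文注意事项,tar 格式文件使用 tar 命令解压
tar -xf ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar -C ./models/
```

解压得到的 inference 模型可直接用于后文介绍的推理部署流程;如需在自定义数据集上进一步提升效果,再按照下文的模型训练部分进行训练即可。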
size,并发加载子进程数等,同时包含读取后预处理操作,如 resize、数据增强等等 ``` 在主体检测任务中,需要将 `datasets/coco_detection.yml` 中的 `num_classes` 参数修改为 1(只有 1 个前景类别),同时将训练集和测试集的路径修改为自定义数据集的路径。 @@ -169,14 +145,14 @@ PaddleDetection 提供了单卡/多卡训练模式,满足用户多种训练需 ```bash # windows 和 Mac 下不需要执行该命令 export CUDA_VISIBLE_DEVICES=0 -python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml +python tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml ``` * GPU 多卡训练 ```bash export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/legacy_model/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --eval ``` --eval:表示边训练边验证。 @@ -188,7 +164,7 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy ```bash export CUDA_VISIBLE_DEVICES=0 # 指定 pretrain_weights 参数,加载通用的主体检测预训练模型 -python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams +python tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams ``` * 模型恢复训练 @@ -197,10 +173,14 @@ python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pret ```bash export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000 +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --eval -r output/picodet_lcnet_x2_5_640_mainbody/20 ``` -注意:如果遇到 "`Out of memory error`" 问题, 尝试在 `ppyolov2_reader.yml` 文件中调小 `batch_size`,同时等比例调小学习率。 +注意: + +- `-r`命令中最后`20`表示从第20个epoch保存的权重开始训练,使用时确保`20.pdparams 20.pdopt`文件存在。请根据实际自行修改 + +- 如果遇到 "`Out of memory error`" 问题, 尝试在 `picodet_640_reader.yml` 文件中调小 `batch_size`,同时等比例调小学习率。 @@ -210,12 +190,13 @@ python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppy ```bash export CUDA_VISIBLE_DEVICES=0 -python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final +python tools/infer.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/picodet_lcnet_x2_5_640_mainbody/model_final ``` `--draw_threshold` 是个可选参数. 根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果 `keep_top_k` 表示设置输出目标的最大数量,默认值为 100,用户可以根据自己的实际情况进行设定。 + ## 4. 
模型推理部署 @@ -224,16 +205,16 @@ python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer 执行导出模型脚本: ```bash -python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams +python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml --output_dir=./inference -o weights=output/picodet_lcnet_x2_5_640_mainbody/model_final.pdparams ``` -预测模型会导出到 `inference/ppyolov2_r50vd_dcn_365e_coco` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel` 。 +预测模型会导出到 `inference/picodet_lcnet_x2_5_640_mainbody` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel` 。 注意: `PaddleDetection` 导出的 inference 模型的文件格式为 `model.xxx`,这里如果希望与 PaddleClas 的 inference 模型文件格式保持一致,需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。 -更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/EXPORT_MODEL.md) +更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/EXPORT_MODEL.md) -最终,目录 `inference/ppyolov2_r50vd_dcn_365e_coco` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件,`inference.pdmodel` 为保存的 inference 模型结构文件。 +最终,目录 `inference/picodet_lcnet_x2_5_640_mainbody` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件,`inference.pdmodel` 为保存的 inference 模型结构文件。 ### 4.2 基于python预测引擎推理 @@ -244,7 +225,7 @@ python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml ### 4.3 其他推理方式 -其他推理方法,如C++推理部署、PaddleServing部署等请参考[检测模型推理部署](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/README.md)。 +其他推理方法,如C++推理部署、PaddleServing部署等请参考[检测模型推理部署](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/deploy/README.md)。 ### FAQ diff --git a/docs/zh_CN/inference_deployment/export_model.md b/docs/zh_CN/inference_deployment/export_model.md index 4e2d98e9310602b4df7c0bedee32be88b7cf8fef..e7c204112d742bc8426c9e212b7b47ba232944c8 100644 --- a/docs/zh_CN/inference_deployment/export_model.md +++ b/docs/zh_CN/inference_deployment/export_model.md @@ -46,7 +46,7 @@ python tools/export_model.py \ ## 3. 主体检测模型导出 -主体检测模型的导出,可以参考[主题检测介绍](../image_recognition_pipeline/mainbody_detection.md)。 +主体检测模型的导出,可以参考[主体检测介绍](../image_recognition_pipeline/mainbody_detection.md)。 ## 4. 识别模型导出 diff --git a/docs/zh_CN/inference_deployment/lite_shitu.md b/docs/zh_CN/inference_deployment/lite_shitu.md new file mode 100644 index 0000000000000000000000000000000000000000..24effa306da769b65f988221b67422b73d0d3fca --- /dev/null +++ b/docs/zh_CN/inference_deployment/lite_shitu.md @@ -0,0 +1,414 @@ +# PP-ShiTu在Paddle-Lite端侧部署 + +本教程将介绍基于[Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 在移动端部署PaddleClas PP-ShiTu模型的详细步骤。 + +Paddle Lite是飞桨轻量化推理引擎,为手机、IoT端提供高效推理能力,并广泛整合跨平台硬件,为端侧部署及应用落地问题提供轻量化的部署方案。 + +## 目录 + +- [1. 环境准备](#1) + - [1.1 准备交叉编译环境](#1.1) + - [1.2 准备预测库](#1.2) + +- [2. 
编译流程](#2) + - [2.1 模型准备](#2.1) + - [2.1.1 使用PaddleClase提供的推理模型](#2.1.1) + - [2.1.2 使用其他模型](#2.1.2) + - [2.1.2.1 安装paddle_lite_opt工具](#2.1.2.1) + - [2.1.2.2 转换示例](#2.1.2.2) + - [2.2 生成新的索引库](#2.2) + - [2.2.1 数据集环境配置](#2.2.1) + - [2.2.2 生成新的index文件](#2.2.2) + - [2.3 将yaml文件转换成json文件](#2.3) + - [2.4 index字典转换](#2.4) + - [2.5 与手机联调](#2.5) +- [FAQ](#FAQ) + + + +## 1. 环境准备 + +### 运行准备 +- 电脑(编译Paddle Lite) +- 安卓手机(armv7或armv8) + + + +### 1.1 准备交叉编译环境 +交叉编译环境用于编译 Paddle Lite 和 PaddleClas 的PP-ShiTu Lite demo。 +支持多种开发环境,不同开发环境的编译流程请参考对应文档,请确保安装完成Java jdk、Android NDK(R17以上)。 + +1. [Docker](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#docker) +2. [Linux](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#linux) +3. [MAC OS](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html#mac-os) + +```shell +# 配置完成交叉编译环境后,更新环境变量 +# for docker、Linux +source ~/.bashrc +# for Mac OS +source ~/.bash_profile +``` + + + +### 1.2 准备预测库 + +预测库有两种获取方式: +1. [**建议**]直接下载,预测库下载链接如下: + |平台| 架构 | 预测库下载链接| + |-|-|-| + |Android| arm7 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv7.clang.c++_static.with_extra.with_cv.tar.gz) | + | Android | arm8 | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv.tar.gz) | + | Android | arm8(FP16) | [inference_lite_lib](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip) | + +**注意**:1. 如果是从 Paddle-Lite [官方文档](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc)下载的预测库,注意选择`with_extra=ON,with_cv=ON`的下载链接。2. 目前只提供Android端demo,IOS端demo可以参考[Paddle-Lite IOS demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/master/PaddleLite-ios-demo) + + +2. 
编译Paddle-Lite得到预测库,Paddle-Lite的编译方式如下: +```shell +git clone https://github.com/PaddlePaddle/Paddle-Lite.git +cd Paddle-Lite +# 如果使用编译方式,建议使用develop分支编译预测库 +git checkout develop +# FP32 +./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON +# FP16 +./lite/tools/build_android.sh --arch=armv8 --toolchain=clang --with_cv=ON --with_extra=ON --with_arm82_fp16=ON +``` + +**注意**:编译Paddle-Lite获得预测库时,需要打开`--with_cv=ON --with_extra=ON`两个选项,`--arch`表示`arm`版本,这里指定为armv8,更多编译命令介绍请参考[链接](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html)。 + +直接下载预测库并解压后,可以得到`inference_lite_lib.android.armv8.clang.c++_static.with_extra.with_cv/`文件夹,通过编译Paddle-Lite得到的预测库位于`Paddle-Lite/build.lite.android.armv8.gcc/inference_lite_lib.android.armv8/`文件夹下。 +预测库的文件目录如下: + +``` +inference_lite_lib.android.armv8/ +|-- cxx C++ 预测库和头文件 +| |-- include C++ 头文件 +| | |-- paddle_api.h +| | |-- paddle_image_preprocess.h +| | |-- paddle_lite_factory_helper.h +| | |-- paddle_place.h +| | |-- paddle_use_kernels.h +| | |-- paddle_use_ops.h +| | `-- paddle_use_passes.h +| `-- lib C++预测库 +| |-- libpaddle_api_light_bundled.a C++静态库 +| `-- libpaddle_light_api_shared.so C++动态库 +|-- java Java预测库 +| |-- jar +| | `-- PaddlePredictor.jar +| |-- so +| | `-- libpaddle_lite_jni.so +| `-- src +|-- demo C++和Java示例代码 +| |-- cxx C++ 预测库demo +| `-- java Java 预测库demo +``` + + + +## 2 编译流程 + + + +### 2.1 模型准备 + +PaddleClas 提供了转换并优化后的推理模型,可以直接参考下方 2.1.1 小节进行下载。如果需要使用其他模型,请参考后续 2.1.2 小节自行转换并优化模型。 + + + +#### 2.1.1 使用PaddleClas提供的推理模型 + +```shell +# 进入lite_ppshitu目录 +cd $PaddleClas/deploy/lite_shitu +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/lite/ppshitu_lite_models_v1.2.tar +tar -xf ppshitu_lite_models_v1.2.tar +rm -f ppshitu_lite_models_v1.2.tar +``` + + + +#### 2.1.2 使用其他模型 + +Paddle-Lite 提供了多种策略来自动优化原始的模型,其中包括量化、子图融合、混合调度、Kernel优选等方法,使用Paddle-Lite的`opt`工具可以自动对inference模型进行优化,目前支持两种优化方式,优化后的模型更轻量,模型运行速度更快。 + +**注意**:如果已经准备好了 `.nb` 结尾的模型文件,可以跳过此步骤。 + + + +##### 2.1.2.1 安装paddle_lite_opt工具 + +安装`paddle_lite_opt`工具有如下两种方法: + +1. [**建议**]pip安装paddlelite并进行转换 + ```shell + pip install paddlelite==2.10rc + ``` + +2. 
源码编译Paddle-Lite生成`paddle_lite_opt`工具 + + 模型优化需要Paddle-Lite的`opt`可执行文件,可以通过编译Paddle-Lite源码获得,编译步骤如下: + ```shell + # 如果准备环境时已经clone了Paddle-Lite,则不用重新clone Paddle-Lite + git clone https://github.com/PaddlePaddle/Paddle-Lite.git + cd Paddle-Lite + git checkout develop + # 启动编译 + ./lite/tools/build.sh build_optimize_tool + ``` + + 编译完成后,`opt`文件位于`build.opt/lite/api/`下,可通过如下方式查看`opt`的运行选项和使用方式; + ```shell + cd build.opt/lite/api/ + ./opt + ``` + + `opt`的使用方式与参数与上面的`paddle_lite_opt`完全一致。 + +之后使用`paddle_lite_opt`工具可以进行inference模型的转换。`paddle_lite_opt`的部分参数如下: + +|选项|说明| +|-|-| +|--model_file|待优化的PaddlePaddle模型(combined形式)的网络结构文件路径| +|--param_file|待优化的PaddlePaddle模型(combined形式)的权重文件路径| +|--optimize_out_type|输出模型类型,目前支持两种类型:protobuf和naive_buffer,其中naive_buffer是一种更轻量级的序列化/反序列化实现,默认为naive_buffer| +|--optimize_out|优化模型的输出路径| +|--valid_targets|指定模型可执行的backend,默认为arm。目前可支持x86、arm、opencl、npu、xpu,可以同时指定多个backend(以空格分隔),Model Optimize Tool将会自动选择最佳方式。如果需要支持华为NPU(Kirin 810/990 Soc搭载的达芬奇架构NPU),应当设置为npu, arm| + +更详细的`paddle_lite_opt`工具使用说明请参考[使用opt转化模型文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/opt/opt_bin.html) + +`--model_file`表示inference模型的model文件地址,`--param_file`表示inference模型的param文件地址;`optimize_out`用于指定输出文件的名称(不需要添加`.nb`的后缀)。直接在命令行中运行`paddle_lite_opt`,也可以查看所有参数及其说明。 + + + +##### 2.1.2.2 转换示例 + +下面介绍使用`paddle_lite_opt`完成主体检测模型和识别模型的预训练模型,转成inference模型,最终转换成Paddle-Lite的优化模型的过程。 + +1. 转换主体检测模型 + +```shell +# 当前目录为 $PaddleClas/deploy/lite_shitu +# $code_path需替换成相应的运行目录,可以根据需要,将$code_path设置成需要的目录 +export code_path=~ +cd $code_path +git clone https://github.com/PaddlePaddle/PaddleDetection.git +# 进入PaddleDetection根目录 +cd PaddleDetection +# 切换到2.3分支 +git checkout release/2.3 +# 将预训练模型导出为inference模型 +python tools/export_model.py -c configs/picodet/application/mainbody_detection/picodet_lcnet_x2_5_640_mainbody.yml -o weights=https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody.pdparams export_post_process=False --output_dir=inference +# 将inference模型转化为Paddle-Lite优化模型 +paddle_lite_opt --model_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdmodel --param_file=inference/picodet_lcnet_x2_5_640_mainbody/model.pdiparams --optimize_out=inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det +# 将转好的模型复制到lite_shitu目录下 +cd $PaddleClas/deploy/lite_shitu +mkdir models +cp $code_path/PaddleDetection/inference/picodet_lcnet_x2_5_640_mainbody/mainbody_det.nb $PaddleClas/deploy/lite_shitu/models +``` + +2. 
转换识别模型 + +```shell +# 识别模型下载 +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar +# 解压模型 +tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar +# 转换为Paddle-Lite模型 +paddle_lite_opt --model_file=general_PPLCNet_x2_5_lite_v1.0_infer/inference.pdmodel --param_file=general_PPLCNet_x2_5_lite_v1.0_infer/inference.pdiparams --optimize_out=general_PPLCNet_x2_5_lite_v1.0_infer/rec +# 将模型文件拷贝到lite_shitu下 +cp general_PPLCNet_x2_5_lite_v1.0_infer/rec.nb deploy/lite_shitu/models/ +``` + +**注意**:`--optimize_out` 参数为优化后模型的保存路径,无需加后缀`.nb`;`--model_file` 参数为模型结构信息文件的路径,`--param_file` 参数为模型权重信息文件的路径,请注意文件名。 + + + +### 2.2 生成新的检索库 + +由于lite 版本的检索库用的是`faiss1.5.3`版本,与新版本不兼容,因此需要重新生成index库 + + + +#### 2.2.1 数据及环境配置 + +```shell +# 进入PaddleClas根目录 +cd $PaddleClas +# 安装PaddleClas +python setup.py install +cd deploy +# 下载瓶装饮料数据集 +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar +rm -rf drink_dataset_v1.0.tar +rm -rf drink_dataset_v1.0/index + +# 安装1.5.3版本的faiss +pip install faiss-cpu==1.5.3 + +# 下载通用识别模型,可替换成自己的inference model +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar +tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar +rm -rf general_PPLCNet_x2_5_lite_v1.0_infer.tar +``` + + + +#### 2.2.2 生成新的index文件 + +```shell +# 生成新的index库,注意指定好识别模型的路径,同时将index_mothod修改成Flat,HNSW32和IVF在此版本中可能存在bug,请慎重使用。 +# 如果使用自己的识别模型,对应的修改inference model的目录 +python python/build_gallery.py -c configs/inference_drink.yaml -o Global.rec_inference_model_dir=general_PPLCNet_x2_5_lite_v1.0_infer -o IndexProcess.index_method=Flat + +# 进入到lite_shitu目录 +cd lite_shitu +mv ../drink_dataset_v1.0 . +``` + + + +### 2.3 将yaml文件转换成json文件 + +```shell +# 如果测试单张图像,路径使用相对路径 +python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_path images/demo.jpeg +# or +# 如果测试多张图像 +python generate_json_config.py --det_model_path ppshitu_lite_models_v1.2/mainbody_PPLCNet_x2_5_640_v1.2_lite.nb --rec_model_path ppshitu_lite_models_v1.2/general_PPLCNet_x2_5_lite_v1.2_infer.nb --img_dir images +# 执行完成后,会在lit_shitu下生成shitu_config.json配置文件 +``` + + + +### 2.4 index字典转换 + +由于python的检索库字典,使用`pickle`进行的序列化存储,导致C++不方便读取,因此需要进行转换 + +```shell + +# 转化id_map.pkl为id_map.txt +python transform_id_map.py -c ../configs/inference_drink.yaml +``` +转换成功后,会在`IndexProcess.index_dir`目录下生成`id_map.txt`。 + + + +### 2.5 与手机联调 + +首先需要进行一些准备工作。 +1. 准备一台arm8的安卓手机,如果编译的预测库是armv7,则需要arm7的手机,并修改Makefile中`ARM_ABI=arm7`。 +2. 电脑上安装ADB工具,用于调试。 ADB安装方式如下: + + 2.1. MAC电脑安装ADB: + + ```shell + brew cask install android-platform-tools + ``` + 2.2. Linux安装ADB + ```shell + sudo apt update + sudo apt install -y wget adb + ``` + 2.3. Window安装ADB + + win上安装需要去谷歌的安卓平台下载ADB软件包进行安装:[链接](https://developer.android.com/studio) + +3. 手机连接电脑后,开启手机`USB调试`选项,选择`文件传输`模式,在电脑终端中输入: + +```shell +adb devices +``` +如果有device输出,则表示安装成功,如下所示: +``` +List of devices attached +744be294 device +``` + +4. 
编译lite部署代码生成移动端可执行文件 + +```shell +cd $PaddleClas/deploy/lite_shitu +# ${lite prediction library path}下载的Paddle-Lite库路径 +inference_lite_path=${lite prediction library path}/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv/ +mkdir $inference_lite_path/demo/cxx/ppshitu_lite + +cp -r * $inference_lite_path/demo/cxx/ppshitu_lite +cd $inference_lite_path/demo/cxx/ppshitu_lite + +# 执行编译,等待完成后得到可执行文件main +make ARM_ABI=arm8 +#如果是arm7,则执行 make ARM_ABI = arm7 (或者在Makefile中修改该项) +``` + +5. 准备优化后的模型、预测库文件、测试图像。 + +```shell +mkdir deploy +# 移动的模型路径要和之前生成的json文件中模型路径一致 +mv ppshitu_lite_models_v1.2 deploy/ +mv drink_dataset_v1.0 deploy/ +mv images deploy/ +mv shitu_config.json deploy/ +cp pp_shitu deploy/ + +# 将C++预测动态库so文件复制到deploy文件夹中 +cp ../../../cxx/lib/libpaddle_light_api_shared.so deploy/ +``` + +执行完成后,deploy文件夹下将有如下文件格式: + +```shell +deploy/ +|-- ppshitu_lite_models_v1.1/ +| |--mainbody_PPLCNet_x2_5_640_quant_v1.1_lite.nb 优化后的主体检测模型文件 +| |--general_PPLCNet_x2_5_lite_v1.1_infer.nb 优化后的识别模型文件 +|-- images/ +| |--demo.jpg 图片文件 +|-- drink_dataset_v1.0/ 瓶装饮料demo数据 +| |--index 检索index目录 +|-- pp_shitu 生成的移动端执行文件 +|-- shitu_config.json 执行时参数配置文件 +|-- libpaddle_light_api_shared.so Paddle-Lite库文件 +``` + +**注意:** +* `shitu_config.json` 包含了目标检测的超参数,请按需进行修改 + +6. 启动调试,上述步骤完成后就可以使用ADB将文件夹 `deploy/` push到手机上运行,步骤如下: + +```shell +# 将上述deploy文件夹push到手机上 +adb push deploy /data/local/tmp/ + +adb shell +cd /data/local/tmp/deploy +export LD_LIBRARY_PATH=/data/local/tmp/deploy:$LD_LIBRARY_PATH + +# 修改权限为可执行 +chmod 777 pp_shitu +# 执行程序 +./pp_shitu shitu_config.json +``` + +如果对代码做了修改,则需要重新编译并push到手机上。 + +运行效果如下: +``` +images/demo.jpeg: + result0: bbox[344, 98, 527, 593], score: 0.811656, label: 红牛-强化型 + result1: bbox[0, 0, 600, 600], score: 0.729664, label: 红牛-强化型 +``` + + + +## FAQ + +Q1:如果想更换模型怎么办,需要重新按照流程走一遍吗? +A1:如果已经走通了上述步骤,更换模型只需要替换 `.nb` 模型文件即可,同时要注意修改下配置文件中的 `.nb` 文件路径以及类别映射文件(如有必要)。 + +Q2:换一个图测试怎么做? +A2:替换 deploy 下的测试图像为你想要测试的图像,并重新生成json配置文件(或者直接修改图像路径),使用 ADB 再次 push 到手机上即可。 diff --git a/docs/zh_CN/inference_deployment/python_deploy.md b/docs/zh_CN/inference_deployment/python_deploy.md index 22b871344b782098ef9ded562cc7f2ce4277f790..06b3b67061b114590ae13290d07bef4569ac13d0 100644 --- a/docs/zh_CN/inference_deployment/python_deploy.md +++ b/docs/zh_CN/inference_deployment/python_deploy.md @@ -1,7 +1,5 @@ # Python 预测推理 ---- - 首先请参考文档[环境准备](../installation/install_paddleclas.md)配置运行环境。 ## 目录 @@ -13,47 +11,50 @@ - [2.3 PP-ShiTu PipeLine推理](#2.3) + ## 1. 
图像分类推理 首先请参考文档[模型导出](./export_model.md)准备 inference 模型,然后进入 PaddleClas 的 `deploy` 目录下: ```shell -cd /path/to/PaddleClas/deploy +cd PaddleClas/deploy ``` 使用以下命令进行预测: ```shell -python python/predict_cls.py -c configs/inference_cls.yaml +python3.7 python/predict_cls.py -c configs/inference_cls.yaml ``` 在配置文件 `configs/inference_cls.yaml` 中有以下字段用于配置预测参数: -* `Global.infer_imgs`:待预测的图片文件路径; -* `Global.inference_model_dir`:inference 模型文件所在目录,该目录下需要有文件 `inference.pdmodel` 和 `inference.pdiparams` 两个文件; -* `Global.use_tensorrt`:是否使用 TesorRT 预测引擎,默认为 `False`; +* `Global.infer_imgs`:待预测的图片文件(夹)路径; +* `Global.inference_model_dir`:inference 模型文件所在文件夹的路径,该文件夹下需要有文件 `inference.pdmodel` 和 `inference.pdiparams` 两个文件; * `Global.use_gpu`:是否使用 GPU 预测,默认为 `True`; * `Global.enable_mkldnn`:是否启用 `MKL-DNN` 加速库,默认为 `False`。注意 `enable_mkldnn` 与 `use_gpu` 同时为 `True` 时,将忽略 `enable_mkldnn`,而使用 GPU 预测; * `Global.use_fp16`:是否启用 `FP16`,默认为 `False`; +* `Global.use_tensorrt`:是否使用 TesorRT 预测引擎,默认为 `False`; * `PreProcess`:用于数据预处理配置; * `PostProcess`:由于后处理配置; -* `PostProcess.Topk.class_id_map_file`:数据集 label 的映射文件,默认为 `./utils/imagenet1k_label_list.txt`,该文件为 PaddleClas 所使用的 ImageNet 数据集 label 映射文件。 +* `PostProcess.Topk.class_id_map_file`:数据集 label 的映射文件,默认为 `../ppcls/utils/imagenet1k_label_list.txt`,该文件为 PaddleClas 所使用的 ImageNet 数据集 label 映射文件。 **注意**: -* 如果使用 VisionTransformer 系列模型,如 `DeiT_***_384`, `ViT_***_384` 等,请注意模型的输入数据尺寸,部分模型需要修改参数: `PreProcess.resize_short=384`, `PreProcess.resize=384`。 +* 如果使用 VisionTransformer 系列模型,如 `DeiT_***_384`, `ViT_***_384` 等,请注意模型的输入数据尺寸,该类模型需要修改参数: `PreProcess.resize_short=384`, `PreProcess.resize=384`。 * 如果你希望提升评测模型速度,使用 GPU 评测时,建议开启 TensorRT 加速预测,使用 CPU 评测时,建议开启 MKL-DNN 加速预测。 + ## 2. PP-ShiTu模型推理 -PP-ShiTu整个Pipeline包含三部分:主体检测、特提取模型、特征检索。其中主体检测、特征模型可以单独推理使用。单独主体检测详见[2.1](#2.1),特征提取模型单独推理详见[2.2](#2.2), PP-ShiTu整体推理详见[2.3](#2.3)。 +PP-ShiTu整个Pipeline包含三部分:主体检测、特征提取模型、特征检索。其中主体检测模型、特征提取模型可以单独推理使用。单独使用主体检测详见[主体检测模型推理](#2.1),特征提取模型单独推理详见[特征提取模型推理](#2.2), PP-ShiTu整体推理详见[PP-ShiTu PipeLine推理](#2.3)。 + ### 2.1 主体检测模型推理 进入 PaddleClas 的 `deploy` 目录下: ```shell -cd /path/to/PaddleClas/deploy +cd PaddleClas/deploy ``` 准备 PaddleClas 提供的主体检测 inference 模型: @@ -61,28 +62,28 @@ cd /path/to/PaddleClas/deploy ```shell mkdir -p models # 下载通用检测 inference 模型并解压 -wget -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar -tar -xf ./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar -C ./models/ +wget -nc -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +tar -xf ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar -C ./models/ ``` 使用以下命令进行预测: ```shell -python python/predict_det.py -c configs/inference_det.yaml +python3.7 python/predict_det.py -c configs/inference_det.yaml ``` 在配置文件 `configs/inference_det.yaml` 中有以下字段用于配置预测参数: * `Global.infer_imgs`:待预测的图片文件路径; * `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True`。 - + ### 2.2 特征提取模型推理 -下面以商品特征提取为例,介绍特征提取模型推理。首先进入 PaddleClas 的 `deploy` 目录下: +下面以商品图片的特征提取为例,介绍特征提取模型推理。首先进入 PaddleClas 的 `deploy` 目录下: ```shell -cd /path/to/PaddleClas/deploy +cd PaddleClas/deploy ``` 准备 PaddleClas 提供的商品特征提取 inference 模型: @@ -90,13 +91,24 @@ cd /path/to/PaddleClas/deploy ```shell mkdir -p models # 下载商品特征提取 inference 模型并解压 -wget -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar -tar -xf ./models/product_ResNet50_vd_aliproduct_v1.0_infer.tar -C 
./models/ +wget -nc -P ./models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +tar -xf ./models/general_PPLCNetV2_base_pretrained_v1.0_infer.tar -C ./models/ +``` + +使用以下命令进行预测: + +```shell +python3.7 python/predict_rec.py -c configs/inference_rec.yaml ``` 上述预测命令可以得到一个 512 维的特征向量,直接输出在在命令行中。 +在配置文件 `configs/inference_det.yaml` 中有以下字段用于配置预测参数: +* `Global.infer_imgs`:待预测的图片文件路径; +* `Global.use_gpu`: 是否使用 GPU 预测,默认为 `True`。 + + ### 2.3. PP-ShiTu PipeLine推理 -主体检测、特征提取和向量检索的串联预测,可以参考图像识别[快速体验](../quick_start/quick_start_recognition.md)。 +主体检测、特征提取和向量检索的串联预测,可以参考[图像识别快速开始](../quick_start/quick_start_recognition.md)。 diff --git a/docs/zh_CN/inference_deployment/recognition_serving_deploy.md b/docs/zh_CN/inference_deployment/recognition_serving_deploy.md index f823f7f284a6179a78d8bb61c027d17259674acb..b5554cf5ece467e4c2508ef88e1ca5a6df7640cd 100644 --- a/docs/zh_CN/inference_deployment/recognition_serving_deploy.md +++ b/docs/zh_CN/inference_deployment/recognition_serving_deploy.md @@ -14,6 +14,7 @@ - [4. FAQ](#4-faq) + ## 1. 简介 [Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻松部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端。 @@ -21,6 +22,7 @@ 该部分以 HTTP 预测服务部署为例,介绍怎样在 PaddleClas 中使用 PaddleServing 部署模型服务。目前只支持 Linux 平台部署,暂不支持 Windows 平台。 + ## 2. Serving 安装 Serving 官网推荐使用 docker 安装并部署 Serving 环境。首先需要拉取 docker 环境并创建基于 Serving 的 docker。 @@ -59,12 +61,12 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD * 如果安装速度太慢,可以通过 `-i https://pypi.tuna.tsinghua.edu.cn/simple` 更换源,加速安装过程。 * 其他环境配置安装请参考:[使用Docker安装Paddle Serving](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md) - - + ## 3. 图像识别服务部署 使用 PaddleServing 做图像识别服务化部署时,**需要将保存的多个 inference 模型都转换为 Serving 模型**。 下面以 PP-ShiTu 中的超轻量图像识别模型为例,介绍图像识别服务的部署。 + ### 3.1 模型转换 @@ -79,8 +81,8 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD mkdir models cd models # 下载并解压通用识别模型 - wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar - tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar + wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar + tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar # 下载并解压通用检测模型 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar @@ -89,37 +91,26 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ```shell # 转换通用识别模型 python3.7 -m paddle_serving_client.convert \ - --dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \ + --dirname ./general_PPLCNetV2_base_pretrained_v1.0_infer/ \ --model_filename inference.pdmodel \ --params_filename inference.pdiparams \ - --serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \ - --serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/ + --serving_server ./general_PPLCNetV2_base_pretrained_v1.0_serving/ \ + --serving_client ./general_PPLCNetV2_base_pretrained_v1.0_client/ ``` 上述命令的参数含义与[#3.1 模型转换](#3.1)相同 - 通用识别 inference 模型转换完成后,会在当前文件夹多出 `general_PPLCNet_x2_5_lite_v1.0_serving/` 和 `general_PPLCNet_x2_5_lite_v1.0_client/` 的文件夹,具备如下结构: + 通用识别 inference 模型转换完成后,会在当前文件夹多出 `general_PPLCNetV2_base_pretrained_v1.0_serving/` 和 `general_PPLCNetV2_base_pretrained_v1.0_client/` 
的文件夹,具备如下结构: ```shell - ├── general_PPLCNet_x2_5_lite_v1.0_serving/ + ├── general_PPLCNetV2_base_pretrained_v1.0_serving/ │ ├── inference.pdiparams │ ├── inference.pdmodel │ ├── serving_server_conf.prototxt │ └── serving_server_conf.stream.prototxt │ - └── general_PPLCNet_x2_5_lite_v1.0_client/ + └── general_PPLCNetV2_base_pretrained_v1.0_client/ ├── serving_client_conf.prototxt └── serving_client_conf.stream.prototxt ``` -- 转换通用检测 inference 模型为 Serving 模型: - ```shell - # 转换通用检测模型 - python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \ - --model_filename inference.pdmodel \ - --params_filename inference.pdiparams \ - --serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \ - --serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ - ``` - 上述命令的参数含义与[#3.1 模型转换](#3.1)相同 - - 识别推理模型转换完成后,会在当前文件夹多出 `general_PPLCNet_x2_5_lite_v1.0_serving/` 和 `general_PPLCNet_x2_5_lite_v1.0_client/` 的文件夹。分别修改 `general_PPLCNet_x2_5_lite_v1.0_serving/` 和 `general_PPLCNet_x2_5_lite_v1.0_client/` 目录下的 `serving_server_conf.prototxt` 中的 `alias` 名字: 将 `fetch_var` 中的 `alias_name` 改为 `features`。 修改后的 `serving_server_conf.prototxt` 内容如下 + 接下来分别修改 `general_PPLCNetV2_base_pretrained_v1.0_serving/` 和 `general_PPLCNetV2_base_pretrained_v1.0_client/` 目录下的 `serving_server_conf.prototxt` 中的 `alias` 名字: 将 `fetch_var` 中的 `alias_name` 改为 `features`。修改后的 `serving_server_conf.prototxt` 内容如下 ```log feed_var { @@ -132,13 +123,24 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD shape: 224 } fetch_var { - name: "save_infer_model/scale_0.tmp_1" + name: "batch_norm_25.tmp_2" alias_name: "features" is_lod_tensor: false fetch_type: 1 shape: 512 } ``` +- 转换通用检测 inference 模型为 Serving 模型: + ```shell + # 转换通用检测模型 + python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \ + --model_filename inference.pdmodel \ + --params_filename inference.pdiparams \ + --serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \ + --serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ + ``` + 上述命令的参数含义与[#3.1 模型转换](#3.1)相同 + 通用检测 inference 模型转换完成后,会在当前文件夹多出 `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` 和 `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` 的文件夹,具备如下结构: ```shell ├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ @@ -151,25 +153,27 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ├── serving_client_conf.prototxt └── serving_client_conf.stream.prototxt ``` - 上述命令中参数具体含义如下表所示 - | 参数 | 类型 | 默认值 | 描述 | - | ----------------- | ---- | ------------------ | ------------------------------------------------------------ | - | `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。 | - | `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 | - | `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保>存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None | - | `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server | - | `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client | + 上述转换命令的参数具体含义如下表所示 + | 参数 | 类型 | 默认值 | 描述 | + | ----------------- | ---- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | + | `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。 | + | `model_filename` | str | None | 
存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 | + | `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保>存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None | + | `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server | + | `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client | - 下载并解压已经构建后完成的检索库 index ```shell # 回到deploy目录 cd ../ # 下载构建完成的检索库 index - wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar + wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar # 解压构建完成的检索库 index - tar -xf drink_dataset_v1.0.tar + tar -xf drink_dataset_v2.0.tar ``` + + ### 3.2 服务部署和请求 **注意:** 识别服务涉及到多个模型,出于性能考虑采用 PipeLine 部署方式。Pipeline 部署方式当前不支持 windows 平台。 @@ -190,6 +194,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ``` + #### 3.2.1 Python Serving - 启动服务: @@ -204,30 +209,32 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD ``` 成功运行后,模型预测的结果会打印在客户端中,如下所示: ```log - {'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': '红牛-强化型', 'rec_scores': 0.79903316}]"], 'tensors': []} + {'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [438, 71, 660, 712], 'rec_docs': '元气森林', 'rec_scores': 0.7581642}, {'bbox': [220, 72, 449, 689], 'rec_docs': '元气森林', 'rec_scores': 0.68961805}, {'bbox': [794, 104, 978, 652], 'rec_docs': '元气森林', 'rec_scores': 0.63075215}]"], 'tensors': []} ``` + #### 3.2.2 C++ Serving 与Python Serving不同,C++ Serving客户端调用 C++ OP来预测,因此在启动服务之前,需要编译并安装 serving server包,并设置 `SERVING_BIN`。 - 编译并安装Serving server包 ```shell # 进入工作目录 - cd PaddleClas/deploy/paddleserving + cd ./deploy/paddleserving + # 一键编译安装Serving server、设置 SERVING_BIN source ./build_server.sh python3.7 ``` - **注:**[build_server.sh](../build_server.sh#L55-L62)所设定的路径可能需要根据实际机器上的环境如CUDA、python版本等作一定修改,然后再编译;如果执行`build_server.sh`过程中遇到非网络原因的报错,则可以手动将脚本中的命令逐条复制到终端执行。 + **注:** [build_server.sh](../build_server.sh#L55-L62) 所设定的路径可能需要根据实际机器上的环境如CUDA、python版本等作一定修改,然后再编译;如果执行 `build_server.sh` 过程中遇到非网络原因的报错,则可以手动将脚本中的命令逐条复制到终端执行。 - C++ Serving使用的输入输出格式与Python不同,因此需要执行以下命令,将4个文件复制到下的文件覆盖掉[3.1](#31-模型转换)得到文件夹中的对应4个prototxt文件。 ```shell - # 进入PaddleClas/deploy目录 - cd PaddleClas/deploy/ + # 回到deploy目录 + cd ../ # 覆盖prototxt文件 - \cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_serving/ - \cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_client/ + \cp ./paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_serving/*.prototxt ./models/general_PPLCNetV2_base_pretrained_v1.0_serving/ + \cp ./paddleserving/recognition/preprocess/general_PPLCNetV2_base_pretrained_v1.0_client/*.prototxt ./models/general_PPLCNetV2_base_pretrained_v1.0_client/ \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ ``` @@ -235,7 +242,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD - 启动服务: ```shell # 进入工作目录 - cd PaddleClas/deploy/paddleserving/recognition + cd ./paddleserving/recognition # 
端口号默认为9400;运行日志默认保存在 log_PPShiTu.txt 中 # CPU部署 @@ -252,9 +259,9 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD 成功运行后,模型预测的结果会打印在客户端中,如下所示: ```log WARNING: Logging before InitGoogleLogging() is written to STDERR - I0614 03:01:36.273097 6084 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9400"): added 1 - I0614 03:01:37.393564 6084 general_model.cpp:490] [client]logid=0,client_cost=1107.82ms,server_cost=1101.75ms. - [{'bbox': [345, 95, 524, 585], 'rec_docs': '红牛-强化型', 'rec_scores': 0.8073724}] + I0903 16:03:20.020586 35600 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9400"): added 1 + I0903 16:03:21.346057 35600 general_model.cpp:490] [client]logid=0,client_cost=1306.26ms,server_cost=1293.65ms. + [{'bbox': [437, 71, 660, 727], 'rec_docs': '元气森林', 'rec_scores': 0.76902336}, {'bbox': [222, 72, 449, 700], 'rec_docs': '元气森林', 'rec_scores': 0.69347066}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305151}] ``` - 关闭服务 @@ -265,6 +272,7 @@ python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUD 执行完毕后出现`Process stopped`信息表示成功关闭服务。 + ## 4. FAQ **Q1**: 发送请求后没有结果返回或者提示输出解码报错 @@ -276,6 +284,6 @@ unset http_proxy ``` **Q2**: 启动服务后没有任何反应 -**A2**: 可以检查`config.yml`中`model_config`对应的路径是否存在,文件夹命名是否正确 +**A2**: 可以检查 `config.yml` 中 `model_config` 对应的路径是否存在,文件夹命名是否正确 更多的服务部署类型,如 `RPC 预测服务` 等,可以参考 Serving 的[github 官网](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples) diff --git a/docs/zh_CN/inference_deployment/shitu_gallery_manager.md b/docs/zh_CN/inference_deployment/shitu_gallery_manager.md new file mode 100644 index 0000000000000000000000000000000000000000..4023ff9c4e87a79c33d30fdbc16d4f479a88f51b --- /dev/null +++ b/docs/zh_CN/inference_deployment/shitu_gallery_manager.md @@ -0,0 +1,195 @@ +# PP-ShiTu 库管理工具 + +本工具是PP-ShiTu的离线库管理工具,主要功能包括:新建图像库、更改图像库、建立索引库、更新索引库等功能。此工具是为了用户能够可视化的管理图像及对应的index库,用户可根据实际情况,灵活的增删改查相应的gallery图像库及索引文件,在提升用户体验的同时,辅助PP-ShiTu在实际应用的过程中达到更好的效果。 + +目前此工具支持平台包括: + +- Mac +- Windows +- Linux(注意,由于linux输入法问题,可能无法支持中文) + +## 目录 + +- [1. 功能介绍](#1) + + - [1.1 新建图像库](#1.1) + - [1.2 打开图像库](#1.2) + - [1.3 导入图像](#1.3) + - [1.4 图像操作](#1.3) + + - [1.5 其他功能](#1.5) + +- [2. 使用说明](#2) + + - [2.1 环境安装](#2.1) + - [2.2 模型准备](#2.2) + - [2.3运行使用](#2.3) + +- [3.生成文件介绍](#3) + +- [致谢](#4) + +- [FAQ](#FAQ) + + + +## 1. 功能介绍 + +此工具主要功能包括: + +- 构建`PP-ShiTu`中索引库对应的`gallery`图像库 +- 根据构建的`gallery`图像库,生成索引库 +- 对`gallery`图像库进行操作,如增删改查等操作,并更新对应的索引库 + +其中主界面的按钮如下图所示 + +
+(图:界面按钮展示)
+ +上图中第一行包括:`主要功能按钮`、`保存按钮`、`新增类别按钮`、`删减类别按钮`。 + +第二行包括:`搜索框`、`搜索确定键`、`新加图像按钮`、`删除图像按钮`。 + +下面将进行具体功能介绍,其操作入口,可以点击`主要功能按钮`下拉菜单查看,如下图所示: + +
+(图:主要功能展示)
+ + + +### 1.1 新建图像库 + +点击新建库功能后,会选择一个**空的存储目录**或者**新建目录**,此时所有的图片及对应的索引库都会存放在此目录下。完成操作后,如下图所示 + +
+(图:新建库)
+ +此时,用户可以新建类别,具体可以点击`新增类别按钮`、`删减类别按钮`。选中类别后,可以进行添加图像及相关操作,具体可以点击`新加图像按钮`、`删除图像按钮`。完成操作后,**注意保存**。 + + + +### 1.2 打开图像库 + +此功能用于打开并编辑之前用此工具存储好的库。注意,**打开库时,请选择新建库时所使用的文件夹路径**。打开库后,示例如下 + +
+(图:打开库)
+ + + +### 1.3 导入图像 + +在打开图像库或者新建图像库完成后,可以使用导入图像功能,即导入用户自己生成好的图像库。具体支持两种导入格式: + +- image_list格式:打开具体的`.txt`文件。`.txt`文件中每一行格式: `image_path label`。根据文件路径及label导入 +- 多文件夹格式:打开`具体文件夹`,此文件夹下存储多个子文件夹,每个子文件夹名字为`label_name`,每个子文件夹中存储对应的图像数据。 + + + +### 1.4 图像操作 + +选择图像后,鼠标右击可以进行如下操作,可以根据需求选择具体的操作。**注意:修改完成后,请点击保存按钮进行保存** + +
+(图:图像操作)
+ + + +### 1.5 生成、更新index库 + +在用户完成图像库的新建、打开或者修改,并完成保存操作后。可以点击`主要功能按钮`中`新建/重建索引库`、`更新索引库`等功能,进行索引库的新建或者更新,生成`PP-ShiTu`使用的Index库 + + + +## 2. 使用说明 + + + +### 2.1 环境安装 + +安装好`PaddleClas`后 + +```shell +pip install fastapi +pip install uvicorn +pip install pyqt5 +``` + + + +### 2.2 模型准备 + +请按照[PP-ShiTu快速体验](../quick_start/quick_start_recognition.md#2.2.1)中下载及准备inference model,并修改好`${PaddleClas}/deploy/configs/inference_drink.yaml`的相关参数。 + + + +### 2.3 运行使用 + +运行方式如下 + +```shell +cd ${PaddleClas}/deploy/shitu_index_manager +python index_manager.py -c ../configs/inference_drink.yaml +``` + + + +## 3. 生成文件介绍 + +使用此工具后,会生成如下格式的文件 + +```shell +index_root/ # 库存储目录 +|-- image_list.txt # 图像列表,每行:image_path label。由前端生成及修改,后端只读 +|-- images # 图像存储目录,由前端生成及增删查等操作。后端只读 +| |-- md5.jpg +| |-- md5.jpg +| |-- …… +|-- features.pkl # 建库之后,保存的embedding向量,后端生成,前端无需操作 +|-- index # 真正的生成的index库存储目录,后端生成及操作,前端无需操作。 +| |-- vector.index # faiss生成的索引库 +| |-- id_map.pkl # 索引文件 +``` + +其中`index_root`是使用此工具时,用户选择的存储目录,库的索引文件存储在`index`文件夹中。 + +使用`PP-ShiTu`时,索引文件目录需换成`index`文件夹的地址。 + + + +## 致谢 + +此工具的前端主要由[国内qt论坛](http://www.qtcn.org/)总版主[小熊宝宝](https://github.com/cnhemiya)完成,感谢**小熊宝宝**的大力支持~~ + +此工具前端原项目地址:https://github.com/cnhemiya/shitu-manager + + + +## FAQ + +- 问题1: 点击新建索引库后,程序假死 + + 答:生成索引库比较耗时,耐心等待一段时间就好 + +- 问题2: 导入图像是什么格式? + + 答: 目前支持两种格式 1)image_list 格式,list中每行格式:path label。2)文件夹格式:类似`ImageNet`存储方式 + +- 问题3: 生成 index库报错 + + 答:在修改图像后,必须点击保存按钮,保存完成后,再继续生成index库。 + +- 问题4: 报错 图像与index库不一致 + + 答:可能用户自己修改了image_list.txt,修改完成后,请及时更新index库,保证其一致。 + diff --git a/docs/zh_CN/inference_deployment/whl_deploy.md b/docs/zh_CN/inference_deployment/whl_deploy.md index 0b84c83e5c846dea76f26334a652ec2a9e819cba..1f8b8995b93f393ed91c774b03ed9bc9101a3961 100644 --- a/docs/zh_CN/inference_deployment/whl_deploy.md +++ b/docs/zh_CN/inference_deployment/whl_deploy.md @@ -23,17 +23,16 @@ PaddleClas 支持 Python Whl 包方式进行预测,目前 Whl 包方式仅支 ## 1. 安装 paddleclas -* pip 安装 +* **[推荐]** 直接 pip 安装: ```bash -pip3 install paddleclas==2.2.1 +pip3 install paddleclas ``` -* 本地构建并安装 +* 如需使用 PaddleClas develop 分支体验最新功能,或是需要基于 PaddleClas 进行二次开发,请本地构建安装: ```bash -python3 setup.py bdist_wheel -pip3 install dist/* +python3 setup.py install ``` diff --git a/docs/zh_CN/installation/install_paddleclas.md b/docs/zh_CN/installation/install_paddleclas.md index 108033ed092ec7130145724d40f8498dca453127..752b81ad69296c1cdd3f4c5aa387f35e1a7108b7 100644 --- a/docs/zh_CN/installation/install_paddleclas.md +++ b/docs/zh_CN/installation/install_paddleclas.md @@ -98,16 +98,16 @@ git clone https://gitee.com/paddlepaddle/PaddleClas.git -b release/2.4 ### 1.3 安装 PaddleClas 及其 Python 依赖库 -建议直接从 PyPI 安装 PaddleClas: +* **[建议]** 直接安装 PaddleClas: ```shell pip install paddleclas ``` -PaddleClas 的 Python 依赖库在 `requirements.txt` 中给出,可通过如下命令安装: +* 如需使用 PaddleClas develop 分支体验最新功能,或是需要基于 PaddleClas 进行二次开发,请本地构建安装,命令如下: ```shell -pip install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple +python setup.py install ``` diff --git a/docs/zh_CN/introduction/ppshitu_application_scenarios.md b/docs/zh_CN/introduction/ppshitu_application_scenarios.md new file mode 100644 index 0000000000000000000000000000000000000000..29c80cdf950565d505af3f0656bff52141a2382d --- /dev/null +++ b/docs/zh_CN/introduction/ppshitu_application_scenarios.md @@ -0,0 +1,201 @@ +# PP-ShiTu应用场景介绍 + +该文档介绍了PP-ShiTu提供的各种应用场景库简介、下载链接以及使用简介。 + +------ + +## 目录 + +- [1. 应用场景介绍](#1-应用场景介绍) +- [2. 
使用说明](#2-使用说明) + - [2.1 环境配置](#21-环境配置) + - [2.2 下载、解压场景库数据](#22-下载解压场景库数据) + - [2.3 准备模型](#23-准备模型) + - [2.4 场景库识别与检索](#24-场景库识别与检索) + - [2.4.1 识别单张图像](#241-识别单张图像) + - [2.4.2 基于文件夹的批量识别](#242-基于文件夹的批量识别) + + + + +## 1. 应用场景介绍 + +PP-ShiTu对原数据集进行了`Gallery`库和`Query`库划分,并生成了对应的`Index`索引库,具体应用场景介绍和下载地址如下表所示。 + +| 场景 |示例图|场景简介|Recall@1|场景库下载地址|原数据集下载地址| +|:---:|:---:|:---:|:---:|:---:|:---:| +| 球类 | |各种球类识别 | 0.9769 | [Balls](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Balls.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/balls-image-classification) | +| 狗识别 | | 狗细分类识别,包括69种狗的图像 | 0.9606 | [DogBreeds](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/DogBreeds.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/70-dog-breedsimage-data-set) | +| 宝石 | | 宝石种类识别 | 0.9653 | [Gemstones](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Gemstones.tar) | [原数据下载地址](https://www.kaggle.com/datasets/lsind18/gemstones-images) | +| 动物 | |各种动物识别 | 0.9078 | [AnimalImageDataset](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/AnimalImageDataset.tar) | [原数据下载地址](https://www.kaggle.com/datasets/iamsouravbanerjee/animal-image-dataset-90-different-animals) | +| 鸟类 | |鸟细分类识别,包括400种各种姿态的鸟类图像 | 0.9673 | [Bird400](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Bird400.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) | +| 交通工具 | |车、船等交通工具粗分类识别 | 0.9307 | [Vechicles](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Vechicles.tar) | [原数据下载地址](https://www.kaggle.com/datasets/rishabkoul1/vechicle-dataset) | +| 花 | |104种花细分类识别 | 0.9788 | [104flowers](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/104flowrs.tar) | [原数据下载地址](https://www.kaggle.com/datasets/msheriey/104-flowers-garden-of-eden) | +| 运动种类 | |100种运动图像识别 | 0.9413 | [100sports](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/100sports.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/sports-classification) | +| 乐器 | |30种不同乐器种类识别 | 0.9467 | [MusicInstruments](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/MusicInstruments.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/musical-instruments-image-classification) | +| 宝可梦 | |宝可梦神奇宝贝识别 | 0.9236 | [Pokemon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Pokemon.tar) | [原数据下载地址](https://www.kaggle.com/datasets/lantian773030/pokemonclassification) | +| 船 | |船种类识别 |0.9242 | [Boat](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Boat.tar) | [原数据下载地址](https://www.kaggle.com/datasets/imsparsh/dockship-boat-type-classification) | +| 鞋子 | |鞋子种类识别,包括靴子、拖鞋等 | 0.9000 | [Shoes](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Shoes.tar) | [原数据下载地址](https://www.kaggle.com/datasets/noobyogi0100/shoe-dataset) | +| 巴黎建筑 | |巴黎著名建筑景点识别,如:巴黎铁塔、圣母院等 | 1.000 | [Paris](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Paris.tar) | [原数据下载地址](https://www.kaggle.com/datasets/skylord/oxbuildings) | +| 蝴蝶 | |75种蝴蝶细分类识别 | 0.9360 | 
[Butterfly](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Butterfly.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/butterfly-images40-species) | +| 野外植物 | |野外植物识别 | 0.9758 | [WildEdiblePlants](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/WildEdiblePlants.tar) | [原数据下载地址](https://www.kaggle.com/datasets/ryanpartridge01/wild-edible-plants) | +| 天气 | |各种天气场景识别,如:雨天、打雷、下雪等 | 0.9924 | [WeatherImageRecognition](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/WeatherImageRecognition.tar) | [原数据下载地址](https://www.kaggle.com/datasets/jehanbhathena/weather-dataset) | +| 坚果 | |各种坚果种类识别 | 0.9412 | [TreeNuts](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/TreeNuts.tar) | [原数据下载地址](https://www.kaggle.com/datasets/gpiosenka/tree-nuts-image-classification) | +| 时装 | |首饰、挎包、化妆品等时尚商品识别 | 0.9555 | [FashionProductImageSmall](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/FashionProductImageSmall.tar) | [原数据下载地址](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small) | +| 垃圾 | |12种垃圾分类识别 | 0.9845 | [Garbage12](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Garbage12.tar) | [原数据下载地址](https://www.kaggle.com/datasets/mostafaabla/garbage-classification) | +| 航拍场景 | |各种航拍场景识别,如机场、火车站等 | 0.9797 | [AID](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/AID.tar) | [原数据下载地址](https://www.kaggle.com/datasets/jiayuanchengala/aid-scene-classification-datasets) | +| 蔬菜 | |各种蔬菜识别 | 0.8929 | [Veg200](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Veg200.tar) | [原数据下载地址](https://www.kaggle.com/datasets/zhaoyj688/vegfru) | +| 商标 | |两千多种logo识别 | 0.9313 | [Logo3k](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/PP-ShiTuV2_application_dataset/Logo3k.tar) | [原数据下载地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | + + + + + +## 2. 使用说明 + + + +### 2.1 环境配置 +- 安装:请先参考文档[环境准备](../installation/install_paddleclas.md)配置PaddleClas运行环境 +- 进入`deploy`运行目录,本部分所有内容与命令均需要在`deploy`目录下运行,可以通过下面命令进入`deploy`目录。 +```shell +cd deploy +``` + + + + +### 2.2 下载、解压场景库数据 +首先创建存放场景库的地址`deploy/datasets`: + +```shell +mkdir datasets +``` +下载并解压对应场景库到`deploy/datasets`中。 +```shell +cd datasets + +# 下载并解压场景库数据 +wget {场景库下载链接} && tar -xf {压缩包的名称} +``` +以`dataset_name`为例,解压完毕后,`datasets/dataset_name`文件夹下应有如下文件结构: +```shel +├── dataset_name/ +│ ├── Gallery/ +│ ├── Index/ +│ ├── Query/ +│ ├── gallery_list.txt/ +│ ├── query_list.txt/ +│ ├── label_list.txt/ +├── ... +``` +其中,`Gallery`文件夹中存放的是用于构建索引库的原始图像,`Index`表示基于原始图像构建得到的索引库信息,`Query`文件夹存放的是用于检索的图像列表,`gallery_list.txt`和`query_list.txt`分别为索引库和检索图像的标签文件,`label_list.txt`是标签的中英文对照文件(注意:商标场景库文件不包含中英文对照文件)。 + + + +### 2.3 准备模型 +创建存放模型的文件夹`deploy/models`,并下载轻量级主体检测、识别模型,命令如下: +```shell +cd .. 
+mkdir models +cd models + +# 下载检测模型并解压 +# wget {检测模型下载链接} && tar -xf {检测模型压缩包名称} +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar + +# 下载识别 inference 模型并解压 +#wget {识别模型下载链接} && tar -xf {识别模型压缩包名称} +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +``` + +解压完成后,`models`文件夹下有如下文件结构: +``` +├── inference_model_name +│ ├── inference.pdiparams +│ ├── inference.pdiparams.info +│ └── inference.pdmodel +└── det_model_name + ├── inference.pdiparams + ├── inference.pdiparams.info + └── inference.pdmodel +``` + + + +### 2.4 场景库识别与检索 + +以`动物识别`场景为例,展示识别和检索过程(如果希望尝试其他场景库的识别与检索效果,在下载解压好对应的场景库数据和模型后,替换对应的配置文件即可完成预测)。 + +注意,此部分使用了`faiss`作为检索库,安装方法如下: +```shell +pip install faiss-cpu==1.7.1post2 +``` + +若使用时,不能正常引用,则`uninstall`之后,重新`install`,尤其是在windows下。 + + + +#### 2.4.1 识别单张图像 + +假设需要测试`./datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg`这张图像识别和检索效果。 + +首先分别修改配置文件`./configs/inference_general.yaml`中的`Global.det_inference_model_dir`和`Global.rec_inference_model_dir`字段为对应的检测和识别模型文件夹,以及修改测试图像地址字段`Global.infer_imgs`示例如下: + +```shell +Global: + infer_imgs: './datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg' + det_inference_model_dir: './models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar' + rec_inference_model_dir: './models/general_PPLCNetV2_base_pretrained_v1.0_infer.tar' +``` + +并修改配置文件`./configs/inference_general.yaml`中的`IndexProcess.index_dir`字段为对应场景index库地址: + +```shell +IndexProcess: + index_dir:'./datasets/AnimalImageDataset/Index/' +``` + + +运行下面的命令,对图像`./datasets/AnimalImageDataset/Query/羚羊/0a37838e99.jpg`进行识别与检索 + +```shell +# 使用下面的命令使用 GPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml + +# 使用下面的命令使用 CPU 进行预测 +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False +``` + +最终输出结果如下: +``` +[{'bbox': [609, 70, 1079, 629], 'rec_docs': '羚羊', 'rec_scores': 0.6571544}] +``` +其中`bbox`表示检测出的主体所在位置,`rec_docs`表示索引库中与检测框最为相似的类别,`rec_scores`表示对应的置信度。 +检测的可视化结果也保存在`output`文件夹下,对于本张图像,识别结果可视化如下所示。 + +![](../../images/ppshitu_application_scenarios/systerm_result.jpg) + + + +#### 2.4.2 基于文件夹的批量识别 + +如果希望预测文件夹内的图像,可以直接修改配置文件中`Global.infer_imgs`字段,也可以通过下面的`-o`参数修改对应的配置。 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./datasets/AnimalImageDataset/Query/羚羊" +``` +终端中会输出该文件夹内所有图像的识别结果,如下所示。 +``` +... +[{'bbox': [0, 0, 1200, 675], 'rec_docs': '羚羊', 'rec_scores': 0.6153812}] +[{'bbox': [0, 0, 275, 183], 'rec_docs': '羚羊', 'rec_scores': 0.77218026}] +[{'bbox': [264, 79, 1088, 850], 'rec_docs': '羚羊', 'rec_scores': 0.81452656}] +[{'bbox': [0, 0, 188, 268], 'rec_docs': '羚羊', 'rec_scores': 0.637074}] +[{'bbox': [118, 41, 235, 161], 'rec_docs': '羚羊', 'rec_scores': 0.67315465}] +[{'bbox': [0, 0, 175, 287], 'rec_docs': '羚羊', 'rec_scores': 0.68271667}] +[{'bbox': [0, 0, 310, 163], 'rec_docs': '羚羊', 'rec_scores': 0.6706451}] +... 
+``` +所有图像的识别结果可视化图像也保存在`output`文件夹内。 diff --git a/docs/zh_CN/models/LeViT.md b/docs/zh_CN/models/LeViT.md index 5f0e480047adc612850fdd1be9e8de8e978e898e..d8aaa744b4c7312d8c6a5c186ae48a0e671171c2 100644 --- a/docs/zh_CN/models/LeViT.md +++ b/docs/zh_CN/models/LeViT.md @@ -18,7 +18,7 @@ LeViT 是一种快速推理的、用于图像分类任务的混合神经网络 | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(M) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| | LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 | -| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 | +| LeViT-128 | 0.7810 | 0.9372 | 0.786 | 0.940 | 406 | 9.2 | | LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 | | LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 | | LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 | diff --git a/docs/zh_CN/models/PP-LCNetV2.md b/docs/zh_CN/models/PP-LCNetV2.md index 01498478c1ee39fa651c1f6c6bd53a0b768fc241..5cb2bfe9a7160bae68fcb72c6e9575a14363a229 100644 --- a/docs/zh_CN/models/PP-LCNetV2.md +++ b/docs/zh_CN/models/PP-LCNetV2.md @@ -11,7 +11,7 @@ - [1.2.2 PW 卷积](#1.2.2) - [1.2.3 Shortcut](#1.2.3) - [1.2.4 激活函数](#1.2.4) - - [1.2.5 SE 模块](#1.2.5) + - [1.2.5 SE 模块](#1.2.5) - [1.3 实验结果](#1.3) - [2. 模型快速体验](#2) - [2.1 安装 paddlepaddle](#2.1) @@ -57,7 +57,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ### 1.2.1 Rep 策略 -卷积核的大小决定了卷积层感受野的大小,通过组合使用不同大小的卷积核,能够获取不同尺度的特征,因此 PPLCNetV2 在 Stage3、Stage4 中,在同一层组合使用 kernel size 分别为 5、3、1 的 DW 卷积,同时为了避免对模型效率的影响,使用重参数化(Re parameterization,Rep)策略对同层的 DW 卷积进行融合,如下图所示。 +卷积核的大小决定了卷积层感受野的大小,通过组合使用不同大小的卷积核,能够获取不同尺度的特征,因此 PPLCNetV2 在 Stage4、Stage5 中,在同一层组合使用 kernel size 分别为 5、3、1 的 DW 卷积,同时为了避免对模型效率的影响,使用重参数化(Re parameterization,Rep)策略对同层的 DW 卷积进行融合,如下图所示。 ![](../../images/PP-LCNetV2/rep.png) @@ -65,7 +65,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ### 1.2.2 PW 卷积 -深度可分离卷积通常由一层 DW 卷积和一层 PW 卷积组成,用以替换标准卷积,为了使深度可分离卷积具有更强的拟合能力,我们尝试使用两层 PW 卷积,同时为了控制模型效率不受影响,两层 PW 卷积设置为:第一个在通道维度对特征图压缩,第二个再通过放大还原特征图通道,如下图所示。通过实验发现,该策略能够显著提高模型性能,同时为了平衡对模型效率带来的影响,PPLCNetV2 仅在 Stage4、Stage5 中使用了该策略。 +深度可分离卷积通常由一层 DW 卷积和一层 PW 卷积组成,用以替换标准卷积,为了使深度可分离卷积具有更强的拟合能力,我们尝试使用两层 PW 卷积,同时为了控制模型效率不受影响,两层 PW 卷积设置为:第一个在通道维度对特征图压缩,第二个再通过放大还原特征图通道,如下图所示。通过实验发现,该策略能够显著提高模型性能,同时为了平衡对模型效率带来的影响,PPLCNetV2 仅在 Stage4 中使用了该策略。 ![](../../images/PP-LCNetV2/split_pw.png) @@ -73,7 +73,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ### 1.2.3 Shortcut -残差结构(residual)自提出以来,被诸多模型广泛使用,但在轻量级卷积神经网络中,由于残差结构所带来的元素级(element-wise)加法操作,会对模型的速度造成影响,我们在 PP-LCNetV2 中,以 Stage 为单位实验了 残差结构对模型的影响,发现残差结构的使用并非一定会带来性能的提高,因此 PPLCNetV2 仅在最后一个 Stage 中的使用了残差结构:在 Block 中增加 Shortcut,如下图所示。 +残差结构(residual)自提出以来,被诸多模型广泛使用,但在轻量级卷积神经网络中,由于残差结构所带来的元素级(element-wise)加法操作,会对模型的速度造成影响,我们在 PP-LCNetV2 中,以 Stage 为单位实验了残差结构对模型的影响,发现残差结构的使用并非一定会带来性能的提高,因此 PPLCNetV2 仅在最后一个 Stage 中的使用了残差结构:在 Block 中增加 Shortcut,如下图所示。 ![](../../images/PP-LCNetV2/shortcut.png) @@ -87,7 +87,7 @@ PP-LCNetV2 模型的网络整体结构如上图所示。PP-LCNetV2 模型是在 ### 1.2.5 SE 模块 -虽然 SE 模块能够显著提高模型性能,但其对模型速度的影响同样不可忽视,在 PP-LCNetV1 中,我们发现在模型中后部使用 SE 模块能够获得最大化的收益。在 PP-LCNetV2 的优化过程中,我们以 Stage 为单位对 SE 模块的位置做了进一步实验,并发现在 Stage3 中使用能够取得更好的平衡。 +虽然 SE 模块能够显著提高模型性能,但其对模型速度的影响同样不可忽视,在 PP-LCNetV1 中,我们发现在模型中后部使用 SE 模块能够获得最大化的收益。在 PP-LCNetV2 的优化过程中,我们以 Stage 为单位对 SE 模块的位置做了进一步实验,并发现在 Stage4 中使用能够取得更好的平衡。 @@ -101,7 +101,7 @@ PPLCNetV2 目前提供的模型的精度、速度指标及预训练权重链接 | PPLCNetV2_base_ssld | 6.6 | 604 | 80.07 | 94.87 | 4.32 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNetV2_base_ssld_pretrained.pdparams) | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNetV2_base_ssld_infer.tar) | **备注:** - + * 1. `_ssld` 表示使用 `SSLD 蒸馏`后的模型。关于 `SSLD蒸馏` 的内容,详情 [SSLD 蒸馏](../advanced_tutorials/knowledge_distillation.md)。 * 2. 
PP-LCNetV2 更多模型指标及权重,敬请期待。 @@ -110,17 +110,17 @@ PPLCNetV2 目前提供的模型的精度、速度指标及预训练权重链接 | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | |:--:|:--:|:--:|:--:|:--:|:--:| | MobileNetV3_Large_x1_25 | 7.4 | 714 | 76.4 | 93.00 | 5.19 | -| PPLCNetV2_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 | +| PPLCNetV1_x2_5 | 9 | 906 | 76.60 | 93.00 | 7.25 | | PPLCNetV2_base | 6.6 | 604 | 77.04 | 93.27 | 4.32 | | PPLCNetV2_base_ssld | 6.6 | 604 | 80.07 | 94.87 | 4.32 | - - + + ## 2. 模型快速体验 - - + + ### 2.1 安装 paddlepaddle - 您的机器安装的是 CUDA9 或 CUDA10,请运行以下命令安装 @@ -146,25 +146,25 @@ python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple ``` pip3 install paddleclas ``` - - - + + + ### 2.3 预测 * 在命令行中使用 PPLCNetV2_base 的权重快速预测 - + ```bash paddleclas --model_name=PPLCNetV2_base --infer_imgs="docs/images/inference_deployment/whl_demo.jpg" ``` - + 结果如下: ``` >>> result class_ids: [8, 7, 86, 82, 83], scores: [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], label_names: ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl'], filename: docs/images/inference_deployment/whl_demo.jpg Predict complete -``` - - +``` + + * 在 Python 代码中预测 ```python from paddleclas import PaddleClas @@ -182,18 +182,18 @@ print(next(result)) [{'class_ids': [8, 7, 86, 82, 83], 'scores': [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl'], 'filename': 'docs/images/inference_deployment/whl_demo.jpg'}] ``` - - - + + + ## 3. 模型训练、评估和预测 - + ### 3.1 环境配置 * 安装:请先参考文档[环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 - + ### 3.2 数据准备 @@ -222,15 +222,15 @@ cd path_to_PaddleClas ``` 其中 `train/` 和 `val/` 分别为训练集和验证集。`train_list.txt` 和 `val_list.txt` 分别为训练集和验证集的标签文件。 - -**备注:** + +**备注:** * 关于 `train_list.txt`、`val_list.txt`的格式说明,可以参考[PaddleClas分类数据集格式说明](../data_preparation/classification_dataset.md#1-数据集格式说明) 。 - + -### 3.3 模型训练 +### 3.3 模型训练 在 `ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml` 中提供了 PPLCNetV2_base 训练配置,可以通过如下脚本启动训练: @@ -240,11 +240,11 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch \ --gpus="0,1,2,3" \ tools/train.py \ - -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml + -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml ``` -**备注:** +**备注:** * 当前精度最佳的模型会保存在 `output/PPLCNetV2_base/best_model.pdparams` @@ -271,7 +271,7 @@ python3 tools/eval.py \ ```python python3 tools/infer.py \ -c ppcls/configs/ImageNet/PPLCNetV2/PPLCNetV2_base.yaml \ - -o Global.pretrained_model=output/PPLCNetV2_base/best_model + -o Global.pretrained_model=output/PPLCNetV2_base/best_model ``` 输出结果如下: @@ -280,30 +280,30 @@ python3 tools/infer.py \ [{'class_ids': [8, 7, 86, 82, 83], 'scores': [0.8859, 0.07156, 0.00588, 0.00047, 0.00034], 'file_name': 'docs/images/inference_deployment/whl_demo.jpg', 'label_names': ['hen', 'cock', 'partridge', 'ruffed grouse, partridge, Bonasa umbellus', 'prairie chicken, prairie grouse, prairie fowl']}] ``` -**备注:** +**备注:** * 这里`-o Global.pretrained_model="output/PPLCNetV2_base/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 - + * 默认是对 `docs/images/inference_deployment/whl_demo.jpg` 进行预测,此处也可以通过增加字段 `-o Infer.infer_imgs=xxx` 对其他图片预测。 - + * 默认输出的是 Top-5 的值,如果希望输出 Top-k 的值,可以指定`-o Infer.PostProcess.topk=k`,其中,`k` 为您指定的值。 - + ## 4. 
模型推理部署 - + ### 4.1 推理模型准备 Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 - + 当使用 Paddle Inference 推理时,加载的模型类型为 inference 模型。本案例提供了两种获得 inference 模型的方法,如果希望得到和文档相同的结果,请选择[直接下载 inference 模型](#6.1.2)的方式。 - - + + ### 4.1.1 基于训练得到的权重导出 inference 模型 @@ -325,7 +325,7 @@ python3 tools/export_model.py \ ``` - + ### 4.1.2 直接下载 inference 模型 @@ -346,7 +346,7 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet │ └── inference.pdmodel ``` - + ### 4.2 基于 Python 预测引擎推理 @@ -397,32 +397,32 @@ ILSVRC2012_val_00030010.jpeg: class id(s): [80, 143, 81, 137, 98], score(s): [0. ``` - + ### 4.3 基于 C++ 预测引擎推理 PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 - + ### 4.4 服务化部署 Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 - + PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 - + ### 4.5 端侧部署 Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 - + PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 - + ### 4.6 Paddle2ONNX 模型转换与预测 - + Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](../../../deploy/paddle2onnx/readme.md)来完成相应的部署工作。 diff --git a/docs/zh_CN/models/PVTV2.md b/docs/zh_CN/models/PVTV2.md index 0819a1dd012bae0acdef63359e85d58d25abf1eb..ac7e348eb8fc228c62ac3ce83be91585cbd6f4fd 100644 --- a/docs/zh_CN/models/PVTV2.md +++ b/docs/zh_CN/models/PVTV2.md @@ -18,10 +18,10 @@ PVTV2 是 VisionTransformer 系列模型,该模型基于 PVT(Pyramid Vision | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| -| PVT_V2_B0 | 0.705 | 0.902 | 0.705 | - | 0.53 | 3.7 | -| PVT_V2_B1 | 0.787 | 0.945 | 0.787 | - | 2.0 | 14.0 | -| PVT_V2_B2 | 0.821 | 0.960 | 0.820 | - | 3.9 | 25.4 | -| PVT_V2_B3 | 0.831 | 0.965 | 0.831 | - | 6.7 | 45.2 | -| PVT_V2_B4 | 0.836 | 0.967 | 0.836 | - | 9.8 | 62.6 | -| PVT_V2_B5 | 0.837 | 0.966 | 0.838 | - | 11.4 | 82.0 | -| PVT_V2_B2_Linear | 0.821 | 0.961 | 0.821 | - | 3.8 | 22.6 | +| PVT_V2_B0 | 0.7052 | 0.9016 | 0.705 | - | 0.53 | 3.7 | +| PVT_V2_B1 | 0.7869 | 0.9450 | 0.787 | - | 2.0 | 14.0 | +| PVT_V2_B2 | 0.8206 | 0.9599 | 0.820 | - | 3.9 | 25.4 | +| PVT_V2_B3 | 0.8310 | 0.9648 | 0.831 | - | 6.7 | 45.2 | +| PVT_V2_B4 | 0.8361 | 0.9666 | 0.836 | - | 9.8 | 62.6 | +| PVT_V2_B5 | 0.8374 | 0.9662 | 0.838 | - | 11.4 | 82.0 | +| PVT_V2_B2_Linear | 0.8205 | 0.9605 | 0.820 | - | 3.8 | 22.6 | diff --git a/docs/zh_CN/models/SwinTransformer.md b/docs/zh_CN/models/SwinTransformer.md index df29b0a0c99754196bd3871536013b4f67aa2447..1cdd4675d8a3a329824e516455a9a5fda670070d 100644 --- a/docs/zh_CN/models/SwinTransformer.md +++ b/docs/zh_CN/models/SwinTransformer.md @@ -33,19 +33,17 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算 | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| -| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 0.812 | 0.955 | 4.5 | 28 | -| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 0.832 | 0.962 | 8.7 | 50 | -| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 0.835 | 0.965 | 15.4 | 88 | -| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 0.845 | 0.970 | 47.1 | 88 | -| SwinTransformer_base_patch4_window7_224[1] | 0.8487 | 0.9746 | 0.852 | 0.975 | 15.4 | 88 | -| SwinTransformer_base_patch4_window12_384[1] | 0.8642 | 0.9807 | 0.864 | 0.980 | 47.1 | 88 | -| SwinTransformer_large_patch4_window7_224[1] | 0.8596 | 0.9783 | 0.863 | 0.979 | 34.5 | 197 | -| SwinTransformer_large_patch4_window12_384[1] | 0.8719 | 0.9823 | 0.873 | 0.982 | 103.9 | 197 | +| SwinTransformer_tiny_patch4_window7_224 | 0.8110 | 0.9549 | 0.812 | 0.955 | 4.5 | 28 | +| SwinTransformer_small_patch4_window7_224 | 0.8321 | 0.9622 | 0.832 | 0.962 | 8.7 | 50 | +| SwinTransformer_base_patch4_window7_224 | 0.8337 | 0.9643 | 0.835 | 0.965 | 15.4 | 88 | +| SwinTransformer_base_patch4_window12_384 | 0.8417 | 0.9674 | 0.845 | 0.970 | 47.1 | 88 | +| SwinTransformer_base_patch4_window7_224[1] | 0.8516 | 0.9748 | 0.852 | 0.975 | 15.4 | 88 | +| SwinTransformer_base_patch4_window12_384[1] | 0.8634 | 0.9798 | 0.864 | 0.980 | 47.1 | 88 | +| SwinTransformer_large_patch4_window7_224[1] | 0.8619 | 0.9788 | 0.863 | 0.979 | 34.5 | 197 | +| SwinTransformer_large_patch4_window12_384[1] | 0.8706 | 0.9814 | 0.873 | 0.982 | 103.9 | 197 | [1]:基于 ImageNet22k 数据集预训练,然后在 ImageNet1k 数据集迁移学习得到。 -**注**:与 Reference 的精度差异源于数据预处理不同。 - ### 1.3 Benchmark @@ -68,14 +66,14 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算 **备注:** 精度类型为 FP32,推理过程使用 TensorRT。 - - + + ## 2. 模型快速体验 安装 paddlepaddle 和 paddleclas 即可快速对图片进行预测,体验方法可以参考[ResNet50 模型快速体验](./ResNet.md#2-模型快速体验)。 - - + + ## 3. 模型训练、评估和预测 @@ -83,52 +81,51 @@ Swin Transformer 是一种新的视觉 Transformer 网络,可以用作计算 **备注:** 由于 SwinTransformer 系列模型默认使用的 GPU 数量为 8 个,所以在训练时,需要指定8个GPU,如`python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c xxx.yaml`, 如果使用 4 个 GPU 训练,默认学习率需要减小一半,精度可能有损。 - + ## 4. 
模型推理部署 - + ### 4.1 推理模型准备 Paddle Inference 是飞桨的原生推理库, 作用于服务器端和云端,提供高性能的推理能力。相比于直接基于预训练模型进行预测,Paddle Inference可使用 MKLDNN、CUDNN、TensorRT 进行预测加速,从而实现更优的推理性能。更多关于Paddle Inference推理引擎的介绍,可以参考[Paddle Inference官网教程](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/inference/inference_cn.html)。 - + Inference 的获取可以参考 [ResNet50 推理模型准备](./ResNet.md#41-推理模型准备) 。 - + ### 4.2 基于 Python 预测引擎推理 PaddleClas 提供了基于 python 预测引擎推理的示例。您可以参考[ResNet50 基于 Python 预测引擎推理](./ResNet.md#42-基于-python-预测引擎推理) 对 SwinTransformer 完成推理预测。 - + ### 4.3 基于 C++ 预测引擎推理 PaddleClas 提供了基于 C++ 预测引擎推理的示例,您可以参考[服务器端 C++ 预测](../inference_deployment/cpp_deploy.md)来完成相应的推理部署。如果您使用的是 Windows 平台,可以参考[基于 Visual Studio 2019 Community CMake 编译指南](../inference_deployment/cpp_deploy_on_windows.md)完成相应的预测库编译和模型预测工作。 - + ### 4.4 服务化部署 Paddle Serving 提供高性能、灵活易用的工业级在线推理服务。Paddle Serving 支持 RESTful、gRPC、bRPC 等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案。更多关于Paddle Serving 的介绍,可以参考[Paddle Serving 代码仓库](https://github.com/PaddlePaddle/Serving)。 - + PaddleClas 提供了基于 Paddle Serving 来完成模型服务化部署的示例,您可以参考[模型服务化部署](../inference_deployment/paddle_serving_deploy.md)来完成相应的部署工作。 - + ### 4.5 端侧部署 Paddle Lite 是一个高性能、轻量级、灵活性强且易于扩展的深度学习推理框架,定位于支持包括移动端、嵌入式以及服务器端在内的多硬件平台。更多关于 Paddle Lite 的介绍,可以参考[Paddle Lite 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)。 - + PaddleClas 提供了基于 Paddle Lite 来完成模型端侧部署的示例,您可以参考[端侧部署](../inference_deployment/paddle_lite_deploy.md)来完成相应的部署工作。 - + ### 4.6 Paddle2ONNX 模型转换与预测 - + Paddle2ONNX 支持将 PaddlePaddle 模型格式转化到 ONNX 模型格式。通过 ONNX 可以完成将 Paddle 模型到多种推理引擎的部署,包括TensorRT/OpenVINO/MNN/TNN/NCNN,以及其它对 ONNX 开源格式进行支持的推理引擎或硬件。更多关于 Paddle2ONNX 的介绍,可以参考[Paddle2ONNX 代码仓库](https://github.com/PaddlePaddle/Paddle2ONNX)。 PaddleClas 提供了基于 Paddle2ONNX 来完成 inference 模型转换 ONNX 模型并作推理预测的示例,您可以参考[Paddle2ONNX 模型转换与预测](@shuilong)来完成相应的部署工作。 - diff --git a/docs/zh_CN/models/Twins.md b/docs/zh_CN/models/Twins.md index 623ebf83717dba852b81ca6fbc4648f3bfce23c0..0d46dbebbd3935cf53c207c89b10b7783353a5ec 100644 --- a/docs/zh_CN/models/Twins.md +++ b/docs/zh_CN/models/Twins.md @@ -17,14 +17,12 @@ Twins 网络包括 Twins-PCPVT 和 Twins-SVT,其重点对空间注意力机制 | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPs
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| -| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 | -| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 | -| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 | -| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 | -| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 | -| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 | - -**注**:与 Reference 的精度差异源于数据预处理不同。 +| pcpvt_small | 0.8115 | 0.9567 | 0.812 | - | 3.7 | 24.1 | +| pcpvt_base | 0.8268 | 0.9627 | 0.827 | - | 6.4 | 43.8 | +| pcpvt_large | 0.8306 | 0.9659 | 0.831 | - | 9.5 | 60.9 | +| alt_gvt_small | 0.8177 | 0.9557 | 0.817 | - | 2.8 | 24 | +| alt_gvt_base | 0.8315 | 0.9629 | 0.832 | - | 8.3 | 56 | +| alt_gvt_large | 0.8364 | 0.9651 | 0.837 | - | 14.8 | 99.2 | diff --git a/docs/zh_CN/models/ViT_and_DeiT.md b/docs/zh_CN/models/ViT_and_DeiT.md index 51df9396cc3c120aa3ceb419d3a8ce4d70b15316..9087e44088da0b77e779da77f433bf285bc1e6bf 100644 --- a/docs/zh_CN/models/ViT_and_DeiT.md +++ b/docs/zh_CN/models/ViT_and_DeiT.md @@ -21,27 +21,25 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020 | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| -| ViT_small_patch16_224 | 0.7769 | 0.9342 | 0.7785 | 0.9342 | 9.41 | 48.60 | -| ViT_base_patch16_224 | 0.8195 | 0.9617 | 0.8178 | 0.9613 | 16.85 | 86.42 | +| ViT_small_patch16_224 | 0.7553 | 0.9211 | 0.7785 | 0.9342 | 9.41 | 48.60 | +| ViT_base_patch16_224 | 0.8187 | 0.9618 | 0.8178 | 0.9613 | 16.85 | 86.42 | | ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | 49.35 | 86.42 | | ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | 12.66 | 88.19 | -| ViT_large_patch16_224 | 0.8323 | 0.9650 | 0.8306 | 0.9644 | 59.65 | 304.12 | +| ViT_large_patch16_224 | 0.8303 | 0.9655 | 0.8306 | 0.9644 | 59.65 | 304.12 | | ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | 174.70 | 304.12 | | ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | 44.24 | 306.48 | | Models | Top1 | Top5 | Reference
top1 | Reference
top5 | FLOPS
(G) | Params
(M) | |:--:|:--:|:--:|:--:|:--:|:--:|:--:| -| DeiT_tiny_patch16_224 | 0.718 | 0.910 | 0.722 | 0.911 | 1.07 | 5.68 | -| DeiT_small_patch16_224 | 0.796 | 0.949 | 0.799 | 0.950 | 4.24 | 21.97 | -| DeiT_base_patch16_224 | 0.817 | 0.957 | 0.818 | 0.956 | 16.85 | 86.42 | -| DeiT_base_patch16_384 | 0.830 | 0.962 | 0.829 | 0.972 | 49.35 | 86.42 | -| DeiT_tiny_distilled_patch16_224 | 0.741 | 0.918 | 0.745 | 0.919 | 1.08 | 5.87 | -| DeiT_small_distilled_patch16_224 | 0.809 | 0.953 | 0.812 | 0.954 | 4.26 | 22.36 | -| DeiT_base_distilled_patch16_224 | 0.831 | 0.964 | 0.834 | 0.965 | 16.93 | 87.18 | -| DeiT_base_distilled_patch16_384 | 0.851 | 0.973 | 0.852 | 0.972 | 49.43 | 87.18 | - -关于 Params、FLOPs、Inference speed 等信息,敬请期待。 +| DeiT_tiny_patch16_224 | 0.7208 | 0.9112 | 0.722 | 0.911 | 1.07 | 5.68 | +| DeiT_small_patch16_224 | 0.7982 | 0.9495 | 0.799 | 0.950 | 4.24 | 21.97 | +| DeiT_base_patch16_224 | 0.8180 | 0.9558 | 0.818 | 0.956 | 16.85 | 86.42 | +| DeiT_base_patch16_384 | 0.8289 | 0.9624 | 0.829 | 0.972 | 49.35 | 86.42 | +| DeiT_tiny_distilled_patch16_224 | 0.7449 | 0.9192 | 0.745 | 0.919 | 1.08 | 5.87 | +| DeiT_small_distilled_patch16_224 | 0.8117 | 0.9538 | 0.812 | 0.954 | 4.26 | 22.36 | +| DeiT_base_distilled_patch16_224 | 0.8330 | 0.9647 | 0.834 | 0.965 | 16.93 | 87.18 | +| DeiT_base_distilled_patch16_384 | 0.8520 | 0.9720 | 0.852 | 0.972 | 49.43 | 87.18 | @@ -67,4 +65,3 @@ DeiT(Data-efficient Image Transformers)系列模型是由 FaceBook 在 2020 | DeiT_small_
distilled_patch16_224 | 256 | 224 | 3.70 | 6.20 | 10.53 | | DeiT_base_
distilled_patch16_224 | 256 | 224 | 6.17 | 14.94 | 28.58 | | DeiT_base_
distilled_patch16_384 | 384 | 384 | 14.12 | 48.76 | 97.09 | - diff --git a/docs/zh_CN/models_training/recognition.md b/docs/zh_CN/models_training/recognition.md index 6f044d575ed5026c99396ae5ca12bd02c2b27639..32777d6acc686453dfd43e51a037de4b79081320 100644 --- a/docs/zh_CN/models_training/recognition.md +++ b/docs/zh_CN/models_training/recognition.md @@ -1,34 +1,39 @@ # 图像识别 ---- -在 PaddleClas 中,图像识别,是指给定一张查询图像,系统能够识别该查询图像类别。广义上,图像分类也是图像识别的一种。但是与普通图像识别不同的是,图像分类只能判别出模型已经学习的类别,如果需要添加新的类别,分类模型只能重新训练。PaddleClas 中的图像识别,**对于陌生类别,只需要更新相应的检索库**,就能够正确的识别出查询图像的类别,而无需重新训练模型,这大大增加了识别系统的可用性,同时降低了更新模型的需求,方便用户部署应用。 + +在 PaddleClas 中,**图像识别**是指给定一张查询图像,系统能够识别该查询图像类别。广义上,图像分类也是图像识别的一种。但图像分类只能判断模型学习过的类别,如果需要添加新的类别,分类模型只能重新训练,这显然会增加实际应用的成本,限制了应用场景。 + +因此 PaddleClas 通过主体检测+特征提取+特征检索的方式来实现图像识别,其好处是**对于陌生类别,只需要更新相应的检索库**,就能够正确的识别出查询图像的类别,而无需重新训练模型,这大大增加了识别系统的可用性,同时降低了更新模型的需求,方便用户部署应用。 对于一张待查询图片,PaddleClas 中的图像识别流程主要分为三部分: -1. 主体检测:对于给定一个查询图像,主体检测器首先检测出图像的物体,从而去掉无用背景信息,提高识别精度。 -2. 特征提取:对主体检测的各个候选区域,通过特征模型,进行特征提取 -3. 特征检索:将提取的特征与特征库中的向量进行相似度比对,得到其标签信息 +1. 主体检测:对于一张给定的查询图像,主体检测器检测出图像中的主体候选区域,过滤掉无用的背景信息,提高后续识别精度。 +2. 特征提取:将主体检测的各个候选区域裁剪出来,输入到通过特征提取模型中进行特征提取。 +3. 特征检索:将提取的特征与特征库中的向量进行相似度比对,计算其相似度和标签信息。 + +完整的图像识别系统,如下图所示 -其中特征库,需要利用已经标注好的图像数据集提前建立。完整的图像识别系统,如下图所示 + -![](../../images/structure.jpg) -体验整体图像识别系统,或查看特征库建立方法,详见[图像识别快速开始文档](../quick_start/quick_start_recognition.md)。其中,图像识别快速开始文档主要讲解整体流程的使用过程。以下内容,主要对上述三个步骤的训练部分进行介绍。 +在Android端或PC端体验整体图像识别系统,或查看特征库建立方法,可以参考 [图像识别快速开始文档](../quick_start/quick_start_recognition.md)。 -首先,请参考[安装指南](../installation/install_paddleclas.md)配置运行环境。 +以下内容,主要对上述三个步骤的训练部分进行介绍。 -## 目录 +在训练开始之前,请参考 [安装指南](../installation/install_paddleclas.md) 配置运行环境。 -- [1. 主体检测](#1) -- [2. 特征模型训练](#2) - - [2.1. 特征模型数据准备与处理](#2.1) - - [2. 2 特征模型基于单卡 GPU 上的训练与评估](#2.2) - - [2.2.1 特征模型训练](#2.2.2) - - [2.2.2 特征模型恢复训练](#2.2.2) - - [2.2.3 特征模型评估](#2.2.3) - - [2.3 特征模型导出 inference 模型](#2.3) -- [3. 特征检索](#3) -- [4. 基础知识](#4) +## 目录 - +- [1. 主体检测](#1-主体检测) +- [2. 特征提取模型训练](#2-特征提取模型训练) + - [2.1 特征提取模型数据的准备与处理](#21-特征提取模型数据的准备与处理) + - [2.2 特征提取模型在 GPU 上的训练与评估](#22-特征提取模型在-gpu-上的训练与评估) + - [2.2.1 特征提取模型训练](#221-特征提取模型训练) + - [2.2.2 特征提取模型恢复训练](#222-特征提取模型恢复训练) + - [2.2.3 特征提取模型评估](#223-特征提取模型评估) + - [2.3 特征提取模型导出 inference 模型](#23-特征提取模型导出-inference-模型) +- [3. 特征检索](#3-特征检索) +- [4. 基础知识](#4-基础知识) + + ## 1. 主体检测 @@ -38,142 +43,143 @@ [{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}] ``` -关于主体检测训练方法可以参考: [PaddleDetection 训练教程](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#4-%E8%AE%AD%E7%BB%83)。 +关于主体检测数据集构造与模型训练方法可以参考: [30分钟快速上手PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#30%E5%88%86%E9%92%9F%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8Bpaddledetection)。 更多关于 PaddleClas 中提供的主体检测的模型介绍与下载请参考:[主体检测教程](../image_recognition_pipeline/mainbody_detection.md)。 - + -## 2. 特征模型训练 +## 2. 
特征提取模型训练 - +为了快速体验 PaddleClas 图像检索模块,以下使用经典的200类鸟类细粒度分类数据集 [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf) 为例,介绍特征提取模型训练过程。CUB_200_2011 下载方式请参考 [CUB_200_2011官网](https://www.vision.caltech.edu/datasets/cub_200_2011/) -### 2.1 特征模型数据的准备与处理 + -* 进入 `PaddleClas` 目录。 +### 2.1 特征提取模型数据的准备与处理 -```bash -## linux or mac, $path_to_PaddleClas 表示 PaddleClas 的根目录,用户需要根据自己的真实目录修改 -cd $path_to_PaddleClas -``` +* 进入 `PaddleClas` 目录 -* 进入 `dataset` 目录,为了快速体验 PaddleClas 图像检索模块,此处使用的数据集为 [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf),其是一个包含 200 类鸟的细粒度鸟类数据集。首先,下载 CUB_200_2011 数据集,下载方式请参考[官网](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html)。 + ```shell + cd PaddleClas + ``` -```shell -# linux or mac -cd dataset +* 进入 `dataset` 目录 -# 将下载后的数据拷贝到此目录 -cp {数据存放的路径}/CUB_200_2011.tgz . + ```shell + # 进入dataset目录 + cd dataset -# 解压 -tar -xzvf CUB_200_2011.tgz + # 将下载后的数据拷贝到dataset目录下 + cp {数据存放的路径}/CUB_200_2011.tgz ./ -#进入 CUB_200_2011 目录 -cd CUB_200_2011 -``` + # 解压该数据集 + tar -xzvf CUB_200_2011.tgz -该数据集在用作图像检索任务时,通常将前 100 类当做训练集,后 100 类当做测试集,所以此处需要将下载的数据集做一些后处理,来更好的适应 PaddleClas 的图像检索训练。 + #进入 CUB_200_2011 目录 + cd CUB_200_2011 + ``` -```shell -#新建 train 和 test 目录 -mkdir train && mkdir test +* 该数据集在用作图像检索任务时,通常将前 100 类当做训练集,后 100 类当做测试集,所以此处需要将下载的数据集做一些后处理,来更好的适应 PaddleClas 的图像检索训练。 -#将数据分成训练集和测试集,前 100 类作为训练集,后 100 类作为测试集 -ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh -ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh + ```shell + #新建 train 和 test 目录 + mkdir train + mkdir test -#生成 train_list 和 test_list -tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt -tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt -``` - -至此,现在已经得到 `CUB_200_2011` 的训练集(`train` 目录)、测试集(`test` 目录)、`train_list.txt`、`test_list.txt`。 + #将数据分成训练集和测试集,前 100 类作为训练集,后 100 类作为测试集 + ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh + ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh -数据处理完毕后,`CUB_200_2011` 中的 `train` 目录下应有如下结构: + #生成 train_list 和 test_list + tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt + tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt + ``` -``` -├── 1 -│ ├── Black_Footed_Albatross_0001_796111.jpg -│ ├── Black_Footed_Albatross_0002_55.jpg - ... -├── 10 -│ ├── Red_Winged_Blackbird_0001_3695.jpg -│ ├── Red_Winged_Blackbird_0005_5636.jpg -... -``` + 至此,现在已经得到 `CUB_200_2011` 的训练集(`train` 目录)、测试集(`test` 目录)、`train_list.txt`、`test_list.txt`。 -`train_list.txt` 应为: + 数据处理完毕后,`CUB_200_2011` 中的 `train` 目录下应有如下结构: -``` -train/99/Ovenbird_0137_92639.jpg 99 1 -train/99/Ovenbird_0136_92859.jpg 99 2 -train/99/Ovenbird_0135_93168.jpg 99 3 -train/99/Ovenbird_0131_92559.jpg 99 4 -train/99/Ovenbird_0130_92452.jpg 99 5 -... -``` -其中,分隔符为空格" ", 三列数据的含义分别是训练数据的路径、训练数据的 label 信息、训练数据的 unique id。 + ``` + CUB_200_2011/train/ + ├── 1 + │ ├── Black_Footed_Albatross_0001_796111.jpg + │ ├── Black_Footed_Albatross_0002_55.jpg + ... + ├── 10 + │ ├── Red_Winged_Blackbird_0001_3695.jpg + │ ├── Red_Winged_Blackbird_0005_5636.jpg + ... 
+ ``` -测试集格式与训练集格式相同。 + `train_list.txt` 应为: -**注意**: + ``` + train/99/Ovenbird_0137_92639.jpg 99 1 + train/99/Ovenbird_0136_92859.jpg 99 2 + train/99/Ovenbird_0135_93168.jpg 99 3 + train/99/Ovenbird_0131_92559.jpg 99 4 + train/99/Ovenbird_0130_92452.jpg 99 5 + ... + ``` + 其中,分隔符为空格`" "`, 三列数据的含义分别是`训练数据的相对路径`、`训练数据的 label 标签`、`训练数据的 unique id`。测试集格式与训练集格式相同。 -* 当 gallery dataset 和 query dataset 相同时,为了去掉检索得到的第一个数据(检索图片本身无须评估),每个数据需要对应一个 unique id,用于后续评测 mAP、recall@1 等指标。关于 gallery dataset 与 query dataset 的解析请参考[图像检索数据集介绍](#图像检索数据集介绍), 关于 mAP、recall@1 等评测指标请参考[图像检索评价指标](#图像检索评价指标)。 +* 构建完毕后返回 `PaddleClas` 根目录 -返回 `PaddleClas` 根目录 + ```shell + # linux or mac + cd ../../ + ``` -```shell -# linux or mac -cd ../../ -``` +**注意**: - +* 当 gallery dataset 和 query dataset 相同时,为了去掉检索得到的第一个数据(检索图片本身不能出现在gallery中),每个数据需要对应一个 unique id(一般使用从1开始的自然数为unique id,如1,2,3,...),用于后续评测 `mAP`、`recall@1` 等指标。关于 gallery dataset 与 query dataset 的解析请参考[图像检索数据集介绍](#图像检索数据集介绍), 关于 `mAP`、`recall@1` 等评测指标请参考[图像检索评价指标](#图像检索评价指标)。 -### 2.2 特征模型 GPU 上的训练与评估 + -在基于单卡 GPU 上训练与评估,推荐使用 `tools/train.py` 与 `tools/eval.py` 脚本。 +### 2.2 特征提取模型在 GPU 上的训练与评估 -PaddleClas 支持使用 VisualDL 可视化训练过程。VisualDL 是飞桨可视化分析工具,以丰富的图表呈现训练参数变化趋势、模型结构、数据样本、高维数据分布等。可帮助用户更清晰直观地理解深度学习模型训练过程及模型结构,进而实现高效的模型优化。更多细节请查看[VisualDL](../others/VisualDL.md)。 +下面以 MobileNetV1 模型为例,介绍特征提取模型在 GPU 上的训练与评估流程 -#### 2.2.1 特征模型训练 +#### 2.2.1 特征提取模型训练 准备好配置文件之后,可以使用下面的方式启动图像检索任务的训练。PaddleClas 训练图像检索任务的方法是度量学习,关于度量学习的解析请参考[度量学习](#度量学习)。 ```shell # 单卡 GPU -python3 tools/train.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Arch.Backbone.pretrained=True \ - -o Global.device=gpu +python3.7 tools/train.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Arch.Backbone.pretrained=True \ +-o Global.device=gpu + # 多卡 GPU export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch tools/train.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Arch.Backbone.pretrained=True \ - -o Global.device=gpu +python3.7 -m paddle.distributed.launch tools/train.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Arch.Backbone.pretrained=True \ +-o Global.device=gpu ``` -其中,`-c` 用于指定配置文件的路径,`-o` 用于指定需要修改或者添加的参数,其中 `-o Arch.Backbone.pretrained=True` 表示 Backbone 部分使用预训练模型,此外,`Arch.Backbone.pretrained` 也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu` 表示使用 GPU 进行训练。如果希望使用 CPU 进行训练,则需要将 `Global.device` 设置为 `cpu`。 +**注**:其中,`-c` 用于指定配置文件的路径,`-o` 用于指定需要修改或者添加的参数,其中 `-o Arch.Backbone.pretrained=True` 表示 Backbone 在训练开始前会加载预训练模型;`-o Arch.Backbone.pretrained` 也可以指定为模型权重文件的路径,使用时换成自己的预训练模型权重文件的路径即可;`-o Global.device=gpu` 表示使用 GPU 进行训练。如果希望使用 CPU 进行训练,则设置 `-o Global.device=cpu`即可。 更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md)。 -运行上述命令,可以看到输出日志,示例如下: +运行上述训练命令,可以看到输出日志,示例如下: - ``` - ... - [Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154 - ... - [Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238 - ... - ``` -此处配置文件的 Backbone 是 MobileNetV1,如果想使用其他 Backbone,可以重写参数 `Arch.Backbone.name`,比如命令中增加 `-o Arch.Backbone.name={其他 Backbone}`。此外,由于不同模型 `Neck` 部分的输入维度不同,更换 Backbone 后可能需要改写此处的输入大小,改写方式类似替换 Backbone 的名字。 + ```log + ... + [Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154 + ... + [Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238 + ... 
+ ``` + +此处配置文件的 Backbone 是 MobileNetV1,如果想使用其他 Backbone,可以重写参数 `Arch.Backbone.name`,比如命令中增加 `-o Arch.Backbone.name={其他 Backbone 的名字}`。此外,由于不同模型 `Neck` 部分的输入维度不同,更换 Backbone 后可能需要改写 `Neck` 的输入大小,改写方式类似替换 Backbone 的名字。 在训练 Loss 部分,此处使用了 [CELoss](../../../ppcls/loss/celoss.py) 和 [TripletLossV2](../../../ppcls/loss/triplet.py),配置文件如下: -``` +```yaml Loss: Train: - CELoss: @@ -183,110 +189,113 @@ Loss: margin: 0.5 ``` -最终的总 Loss 是所有 Loss 的加权和,其中 weight 定义了特定 Loss 在最终总 Loss 的权重。如果想替换其他 Loss,也可以在配置文件中更改 Loss 字段,目前支持的 Loss 请参考 [Loss](../../../ppcls/loss)。 +最终的总 Loss 是所有 Loss 的加权和,其中 weight 定义了特定 Loss 在最终总 Loss 的权重。如果想替换其他 Loss,也可以在配置文件中更改 Loss 字段,目前支持的 Loss 请参考 [Loss](../../../ppcls/loss/__init__.py)。 -#### 2.2.2 特征模型恢复训练 +#### 2.2.2 特征提取模型恢复训练 -如果训练任务因为其他原因被终止,也可以加载断点权重文件,继续训练: +如果训练任务因为其他原因被终止,且训练过程中有保存权重文件,可以加载断点权重文件,继续训练: ```shell -# 单卡 -python3 tools/train.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Global.checkpoints="./output/RecModel/epoch_5" \ - -o Global.device=gpu -# 多卡 +# 单卡恢复训练 +python33.7 tools/train.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Global.checkpoints="./output/RecModel/epoch_5" \ +-o Global.device=gpu + +# 多卡恢复训练 export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch tools/train.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Global.checkpoints="./output/RecModel/epoch_5" \ - -o Global.device=gpu +python3.7 -m paddle.distributed.launch tools/train.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Global.checkpoints="./output/RecModel/epoch_5" \ +-o Global.device=gpu ``` 其中配置文件不需要做任何修改,只需要在继续训练时设置 `Global.checkpoints` 参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。 **注意**: -* `-o Global.checkpoints` 参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点 `5` 继续训练,则 `Global.checkpoints` 参数只需设置为 `"./output/RecModel/epoch_5"`,PaddleClas 会自动补充后缀名。 - - ```shell - output/ - └── RecModel - ├── best_model.pdopt - ├── best_model.pdparams - ├── best_model.pdstates - ├── epoch_1.pdopt - ├── epoch_1.pdparams - ├── epoch_1.pdstates - . - . - . - ``` +* `-o Global.checkpoints` 后的参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点 `epoch_5` 继续训练,则 `Global.checkpoints` 参数只需设置为 `"./output/RecModel/epoch_5"`,PaddleClas 会自动补充后缀名。 + + `epoch_5.pdparams`所在目录如下所示: + + ```log + output/ + └── RecModel + ├── best_model.pdopt + ├── best_model.pdparams + ├── best_model.pdstates + ├── epoch_5.pdopt + ├── epoch_5.pdparams + ├── epoch_5.pdstates + . + . + . 
+ ``` -#### 2.2.3 特征模型评估 +#### 2.2.3 特征提取模型评估 -可以通过以下命令进行模型评估。 +可以通过以下命令进行指定模型进行评估。 ```bash -# 单卡 -python3 tools/eval.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Global.pretrained_model=./output/RecModel/best_model -# 多卡 +# 单卡评估 +python3.7 tools/eval.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Global.pretrained_model=./output/RecModel/best_model + +# 多卡评估 export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -m paddle.distributed.launch tools/eval.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Global.pretrained_model=./output/RecModel/best_model +python3.7 -m paddle.distributed.launch tools/eval.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Global.pretrained_model=./output/RecModel/best_model ``` -上述命令将使用 `./configs/quick_start/MobileNetV1_retrieval.yaml` 作为配置文件,对上述训练得到的模型 `./output/RecModel/best_model` 进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过 `-o` 参数更新配置,如上所示。 +上述命令将使用 `./configs/quick_start/MobileNetV1_retrieval.yaml` 作为配置文件,对上述训练得到的模型 `./output/RecModel/best_model.pdparams` 进行评估。你也可以通过更改配置文件中的参数来设置评估,也可以通过 `-o` 参数更新配置,如上所示。 可配置的部分评估参数说明如下: -* `Arch.name`:模型名称 * `Global.pretrained_model`:待评估的模型的预训练模型文件路径,不同于 `Global.Backbone.pretrained`,此处的预训练模型是整个模型的权重,而 `Global.Backbone.pretrained` 只是 Backbone 部分的权重。当需要做模型评估时,需要加载整个模型的权重。 -* `Metric.Eval`:待评估的指标,默认评估 recall@1、recall@5、mAP。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考 [Metric](../../../ppcls/metric/metrics.py) 部分在配置文件 `Metric.Eval` 中添加相关的指标。 +* `Metric.Eval`:待评估的指标,默认评估 `recall@1`、`recall@5`、`mAP`。当你不准备评测某一项指标时,可以将对应的试标从配置文件中删除;当你想增加某一项评测指标时,也可以参考 [Metric](../../../ppcls/metric/metrics.py) 部分在配置文件 `Metric.Eval` 中添加相关的指标。 **注意:** -* 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas 会自动补齐 `.pdparams` 的后缀,如 [2.2.2 特征模型恢复训练](#2.2.2)。 +* 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas 会自动补齐 `.pdparams` 的后缀,如 [2.2.2 特征提取模型恢复训练](#2.2.2)。 -* Metric learning 任务一般不评测 TopkAcc。 +* Metric learning 任务一般不评测 `TopkAcc` 指标。 -### 2.3 特征模型导出 inference 模型 +### 2.3 特征提取模型导出 inference 模型 通过导出 inference 模型,PaddlePaddle 支持使用预测引擎进行预测推理。对训练好的模型进行转换: ```bash -python3 tools/export_model.py \ - -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ - -o Global.pretrained_model=output/RecModel/best_model \ - -o Global.save_inference_dir=./inference +python3.7 tools/export_model.py \ +-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \ +-o Global.pretrained_model=output/RecModel/best_model \ +-o Global.save_inference_dir=./inference ``` -其中,`Global.pretrained_model` 用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2.2 特征模型恢复训练](#2.2.2))。当执行后,会在当前目录下生成 `./inference` 目录,目录下包含 `inference.pdiparams`、`inference.pdiparams.info`、`inference.pdmodel` 文件。`Global.save_inference_dir` 可以指定导出 inference 模型的路径。此处保存的 inference 模型在 embedding 特征层做了截断,即模型最终的输出为 n 维 embedding 特征。 +其中,`Global.pretrained_model` 用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[2.2.2 特征提取模型恢复训练](#2.2.2))。当执行后,会在当前目录下生成 `./inference` 目录,目录下包含 `inference.pdiparams`、`inference.pdiparams.info`、`inference.pdmodel` 文件。`Global.save_inference_dir` 可以指定导出 inference 模型文件夹的路径。此处保存的 inference 模型在 embedding 特征层做了截断,即模型的推理输出为 n 维特征。 -上述命令将生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),然后可以使用预测引擎进行推理。使用 inference 模型推理的流程可以参考[基于 Python 预测引擎预测推理](../inference_deployment/python_deploy.md)。 +有了上述命令将生成的模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),接下来就可以使用预测引擎进行推理。使用 inference 模型推理的流程可以参考[基于 Python 预测引擎预测推理](../inference_deployment/python_deploy.md)。 - + ## 3. 
特征检索 PaddleClas 图像检索部分目前支持的环境如下: -```shell -└── CPU/单卡 GPU - ├── Linux - ├── MacOS - └── Windows -``` +| 操作系统 | 推理硬件 | +| :------- | :------- | +| Linux | CPU/GPU | +| Windows | CPU/GPU | +| MacOS | CPU/GPU | -此部分使用了 [Faiss](https://github.com/facebookresearch/faiss) 作为检索库,其是一个高效的特征检索及聚类的库。此库中集成了多种相似度检索算法,以满足不同的检索场景。在 PaddleClas 中,支持三种检索算法: + +此部分使用了第三方开源库 [Faiss](https://github.com/facebookresearch/faiss) 作为检索工具,它是一个高效的特征检索与聚类的库,集成了多种相似度检索算法,以满足不同的检索场景。PaddleClas 目前支持三种检索算法: - **HNSW32**: 一种图索引方法。检索精度较高,速度较快。但是特征库只支持添加图像功能,不支持删除图像特征功能。(默认方法) - **IVF**:倒排索引检索方法。速度较快,但是精度略低。特征库支持增加、删除图像特功能。 @@ -296,22 +305,27 @@ PaddleClas 图像检索部分目前支持的环境如下: 具体安装方法如下: -```python -pip install faiss-cpu==1.7.1post2 +```shell +python3.7 -m pip install faiss-cpu==1.7.1post2 ``` -若使用时,不能正常引用,则 `uninstall` 之后,重新 `install`,尤其是 `windows` 下。 +若无法正常使用faiss,可以按以下命令先将其卸载,然后重新安装(Windows系统中该问题比较常见)。 + +```shell +python3.7 -m pip uninstall faiss-cpu +python3.7 -m pip install faiss-cpu==1.7.1post2 +``` ## 4. 基础知识 -图像检索指的是给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,图像检索旨在从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先将图像中表示为一个合适的特征向量,其次,对这些图像的特征向量用欧式距离或余弦距离进行最近邻搜索以找到底库中相似的图像,最后,可以使用一些后处理技术对检索结果进行微调,确定被识别图像的类别等信息。所以,决定一个图像检索算法性能的关键在于图像对应的特征向量的好坏。 +图像检索指的是给定一个包含特定实例(例如特定目标、场景、物品等)的查询图像,图像检索旨在从数据库图像中找到包含相同实例的图像。不同于图像分类,图像检索解决的是一个开集问题,训练集中可能不包含被识别的图像的类别。图像检索的整体流程为:首先将图像中表示为一个合适的特征向量,其次对这些图像的特征向量用合适的距离度量函数进行最近邻搜索以找到数据库图像中相似的图像,最后,可能会使用一些后处理对检索结果进行进一步优化,得到待识别图像的类别、相似度等信息。所以,图像检索算法性能的关键在于图像提取的特征向量的表示能力强弱。 - 度量学习(Metric Learning) -度量学习研究如何在一个特定的任务上学习一个距离函数,使得该距离函数能够帮助基于近邻的算法(kNN、k-means 等)取得较好的性能。深度度量学习(Deep Metric Learning)是度量学习的一种方法,它的目标是学习一个从原始特征到低维稠密的向量空间(嵌入空间,embedding space)的映射,使得同类对象在嵌入空间上使用常用的距离函数(欧氏距离、cosine 距离等)计算的距离比较近,而不同类的对象之间的距离则比较远。深度度量学习在计算机视觉领域取得了非常多的成功的应用,比如人脸识别、商品识别、图像检索、行人重识别等。更详细的介绍请参考[此文档](../algorithm_introduction/metric_learning.md)。 + 度量学习研究如何在一个特定的任务上学习一个距离函数,使得该距离函数能够帮助基于近邻的算法(kNN、k-means 等)取得较好的性能。深度度量学习(Deep Metric Learning)是度量学习的一种方法,它的目标是学习一个从原始特征到低维稠密的向量空间(嵌入空间,embedding space)的映射,使得同类对象在嵌入空间上使用常用的距离函数(欧氏距离、cosine 距离等)计算的距离比较近,而不同类的对象之间的距离则比较远。深度度量学习在计算机视觉领域取得了非常多的成功的应用,比如人脸识别、商品识别、图像检索、行人重识别等。更详细的介绍请参考[此文档](../algorithm_introduction/metric_learning.md)。 @@ -319,19 +333,17 @@ pip install faiss-cpu==1.7.1post2 - 训练集合(train dataset):用来训练模型,使模型能够学习该集合的图像特征。 - 底库数据集合(gallery dataset):用来提供图像检索任务中的底库数据,该集合可与训练集或测试集相同,也可以不同,当与训练集相同时,测试集的类别体系应与训练集的类别体系相同。 - - 测试集合(query dataset):用来测试模型的好坏,通常要对测试集的每一张测试图片进行特征提取,之后和底库数据的特征进行距离匹配,得到识别结果,后根据识别结果计算整个测试集的指标。 + - 测试集合(query dataset):用来测试模型的检索性能,通常要对测试集的每一张测试图片进行特征提取,之后和底库数据的特征进行距离匹配,得到检索结果,后根据检索结果计算模型在整个测试集上的性能指标。 - 图像检索评价指标 - 召回率(recall):表示预测为正例且标签为正例的个数 / 标签为正例的个数 - - - recall@1:检索的 top-1 中预测正例且标签为正例的个数 / 标签为正例的个数 - - recall@5:检索的 top-5 中所有预测正例且标签为正例的个数 / 标签为正例的个数 + - `recall@k`:检索的 top-k 结果中预测为正例且标签为正例的个数 / 标签为正例的个数 - 平均检索精度(mAP) - - AP: AP 指的是不同召回率上的正确率的平均值 - - mAP: 测试集中所有图片对应的 AP 的平均值 + - `AP`: AP 指的是不同召回率上的正确率的平均值 + - `mAP`: 测试集中所有图片对应的 AP 的平均值 diff --git a/docs/zh_CN/others/update_history.md b/docs/zh_CN/others/update_history.md index 5ea649e52a9d53eb3aab5b8b9322d1a87920fefa..0e88a51362d7d04db960c966db72a5ec3a0ee787 100644 --- a/docs/zh_CN/others/update_history.md +++ b/docs/zh_CN/others/update_history.md @@ -1,5 +1,6 @@ # 更新日志 +- 2022.4.21 新增 CVPR2022 oral论文 [MixFormer](https://arxiv.org/pdf/2204.02557.pdf) 相关[代码](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files)。 - 2021.11.1 发布[PP-ShiTu技术报告](https://arxiv.org/pdf/2111.00775.pdf),新增饮料识别demo。 - 2021.10.23 
发布轻量级图像识别系统PP-ShiTu,CPU上0.2s即可完成在10w+库的图像识别。[点击这里](../quick_start/quick_start_recognition.md)立即体验。 - 2021.09.17 发布PP-LCNet系列超轻量骨干网络模型, 在Intel CPU上,单张图像预测速度约5ms,ImageNet-1K数据集上Top1识别准确率达到80.82%,超越ResNet152的模型效果。PP-LCNet的介绍可以参考[论文](https://arxiv.org/pdf/2109.15099.pdf), 或者[PP-LCNet模型介绍](../models/PP-LCNet.md),相关指标和预训练权重可以从 [这里](../algorithm_introduction/ImageNet_models.md)下载。 diff --git a/docs/zh_CN/quick_start/quick_start_recognition.md b/docs/zh_CN/quick_start/quick_start_recognition.md index 38803ec9be510d3a4a96117fce3a1ccf537d3af9..550e455228f65b6886b13d8627413d4dd1387990 100644 --- a/docs/zh_CN/quick_start/quick_start_recognition.md +++ b/docs/zh_CN/quick_start/quick_start_recognition.md @@ -1,87 +1,131 @@ -# 图像识别快速开始 +## 图像识别快速体验 -本文档包含 3 个部分:环境配置、图像识别体验、未知类别的图像识别体验。 +本文档包含 2 个部分:PP-ShiTu android端 demo 快速体验与PP-ShiTu PC端 demo 快速体验。 如果图像类别已经存在于图像索引库中,那么可以直接参考[图像识别体验](#图像识别体验)章节,完成图像识别过程;如果希望识别未知类别的图像,即图像类别之前不存在于索引库中,那么可以参考[未知类别的图像识别体验](#未知类别的图像识别体验)章节,完成建立索引并识别的过程。 ## 目录 -* [1. 环境配置](#环境配置) -* [2. 图像识别体验](#图像识别体验) - * [2.1 下载、解压 inference 模型与 demo 数据](#2.1) - * [2.2 瓶装饮料识别与检索](#瓶装饮料识别与检索) - * [2.2.1 识别单张图像](#识别单张图像) - * [2.2.2 基于文件夹的批量识别](#基于文件夹的批量识别) -* [3. 未知类别的图像识别体验](#未知类别的图像识别体验) - * [3.1 准备新的数据与标签](#准备新的数据与标签) - * [3.2 建立新的索引库](#建立新的索引库) - * [3.3 基于新的索引库的图像识别](#基于新的索引库的图像识别) -* [4. 服务端识别模型列表](#4) +- [1. PP-ShiTu android demo 快速体验](#1-pp-shitu-android-demo-快速体验) + - [1.1 安装 PP-ShiTu android demo](#11-安装-pp-shitu-android-demo) + - [1.2 操作说明](#12-操作说明) +- [2. PP-ShiTu PC端 demo 快速体验](#2-pp-shitu-pc端-demo-快速体验) + - [2.1 环境配置](#21-环境配置) + - [2.2 图像识别体验](#22-图像识别体验) + - [2.2.1 下载、解压 inference 模型与 demo 数据](#221-下载解压-inference-模型与-demo-数据) + - [2.2.2 瓶装饮料识别与检索](#222-瓶装饮料识别与检索) + - [2.2.2.1 识别单张图像](#2221-识别单张图像) + - [2.2.2.2 基于文件夹的批量识别](#2222-基于文件夹的批量识别) + - [2.3 未知类别的图像识别体验](#23-未知类别的图像识别体验) + - [2.3.1 准备新的数据与标签](#231-准备新的数据与标签) + - [2.3.2 建立新的索引库](#232-建立新的索引库) + - [2.3.3 基于新的索引库的图像识别](#233-基于新的索引库的图像识别) + - [2.4 服务端识别模型列表](#24-服务端识别模型列表) + + + +## 1. PP-ShiTu android demo 快速体验 + + + +### 1.1 安装 PP-ShiTu android demo + +可以通过扫描二维码或者[点击链接](https://paddle-imagenet-models-name.bj.bcebos.com/demos/PP-ShiTu.apk)下载并安装APP + +
+ + + +### 1.2 功能体验 +目前 PP-ShiTu android demo 具有图像检索、图像加库、保存检索库、初始化检索库、查看检索库标签等基本功能,接下来介绍如何体验这几个功能。 + +#### (1)识别图像中的物体 +点击下方的“拍照识别”按钮或者“本地识别”按钮,即可拍摄一张图像或者选中一张图像,然后等待几秒钟,APP便会将图像中的主体框标注出来并且在图像下方给出预测的类别以及预测时间等信息。 + +在选择好要检索的图片之后,首先会通过检测模型进行主体检测,得到图像中的物体的区域,然后将这块区域裁剪出来输入到识别模型中,得到对应的特征向量并在检索库中检索,返回并显示最终的检索结果。 + +假设待检索的图像如下: + + + +得到的检索结果可视化如下: + + + +#### (2)向检索库中添加新的类别或物体 +点击上方的“拍照上传”按钮或者“本地上传”按钮,即可拍摄一张图像或从图库中选择一张图像,然后再输入这张图像的类别名字(比如`keyboard`),点击“确定”按钮,即可将图片对应的特征向量与标签加入检索库。 + +在选择好要入库的图片之后,首先会通过检测模型进行主体检测,得到图像中的物体的区域,然后将这块区域裁剪出来输入到识别模型中,得到对应的特征向量,再与用户输入的图像标签一起加入到检索库中。 + +**温馨提示:** 使用安卓demo管理类别主要用于功能体验,如果您有较为重要的数据要生成检索库,推荐使用[检索库管理工具](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/inference_deployment/shitu_gallery_manager.md) + +#### (3) 保存检索库 +点击上方的“保存修改”按钮,即可将当前库以 `latest` 的库名保存下来。 +再次打开程序时,将会自动选择使用`latest`库。app仅存在一个自定义库,每次保存时会覆盖之前的库。 + +#### (4) 检索库恢复出厂设置 +**警告:本操作无法撤销,初始化后自定义的标签和类别都会被删除,请谨慎操作** + +点击上方的“初始化 ”按钮,删除所有自定义的标签和类别,恢复出厂特征库。 + +初始化库时会删掉`latest`库(如果存在),自动将检索库和标签库切换成 `original.index` 和 `original.txt`。不管是否有保存过,自定义的标签和类别都会被清空。 + +#### (5) 查看当前检索库中的类别列表 +点击“类别查询”按钮,即可在弹窗中查看。 + +当检索标签库过多(如本demo自带的196类检索标签库)时,可在弹窗中滑动查看。 + + +## 2. PP-ShiTu PC端 demo 快速体验 -## 1. 环境配置 +### 2.1 环境配置 * 安装:请先参考文档 [环境准备](../installation/install_paddleclas.md) 配置 PaddleClas 运行环境。 * 进入 `deploy` 运行目录。本部分所有内容与命令均需要在 `deploy` 目录下运行,可以通过下面的命令进入 `deploy` 目录。 - ``` + ```shell cd deploy ``` -## 2. 图像识别体验 +### 2.2 图像识别体验 轻量级通用主体检测模型与轻量级通用识别模型和配置文件下载方式如下表所示。 -| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 | -| ------------ | ------------- | -------- | ------- | -| 轻量级通用主体检测模型 | 通用场景 |[tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) [zip 格式文件下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | - | -| 轻量级通用识别模型 | 通用场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) [zip 格式文件下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.zip) | [inference_general.yaml](../../../deploy/configs/inference_general.yaml) | -| 轻量级通用识别二值模型 | 检索库很大, 存储受限场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_binary_v1.0_infer.tar) [zip 格式文件下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_binary_v1.0_infer.zip)| [inference_general_binary.yaml](../../../deploy/configs/inference_general_binary.yaml) | +| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 | +| ---------------------- | -------- | ----------- | ------------ | +| 轻量级通用主体检测模型 | 通用场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) \| [zip 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.zip) | - | +| 轻量级通用识别模型 | 通用场景 | [tar 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar) \| [zip 格式下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.zip) | [inference_general.yaml](../../../deploy/configs/inference_general.yaml) | -注意:由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 
格式文件并解压。`tar` 格式文件建议使用命令 `tar xf xxx.tar` 解压。 +注意:由于部分解压缩软件在解压上述 `tar` 格式文件时存在问题,建议非命令行用户下载 `zip` 格式文件并解压。`tar` 格式文件建议使用命令 `tar -xf xxx.tar` 解压。 -本章节 demo 数据下载地址如下: [瓶装饮料数据下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar)。 +本章节 demo 数据下载地址如下: [drink_dataset_v2.0.tar(瓶装饮料数据)](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar), +下面以 **drink_dataset_v2.0.tar** 为例介绍PC端的 PP-ShiTu 快速体验流程。用户也可以自行下载并解压其它场景的数据进行体验:[22种场景数据下载](../introduction/ppshitu_application_scenarios.md#1-应用场景介绍)。 -如果希望体验服务端主体检测和各垂类方向的识别模型,可以参考[第4章](#4)。 +如果希望体验服务端主体检测和各垂类方向的识别模型,可以参考 [2.4 服务端识别模型列表](#24-服务端识别模型列表) **注意** -1. windows 环境下如果没有安装 wget, 可以按照下面的步骤安装 wget 与 tar 命令,也可以在下载模型时将链接复制到浏览器中下载,并解压放置在相应目录下; linux 或者 macOS 用户可以右键点击,然后复制下载链接,即可通过 `wget` 命令下载。 -2. 如果 macOS 环境下没有安装 `wget` 命令,可以运行下面的命令进行安装。 - -```shell -# 安装 homebrew -ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"; -# 安装 wget -brew install wget -``` - -4. 如果希望在 windows 环境下安装 wget,可以参考:[链接](https://www.cnblogs.com/jeshy/p/10518062.html);如果希望在 windows 环境中安装 tar 命令,可以参考:[链接](https://www.cnblogs.com/chooperman/p/14190107.html)。 +- windows 环境下如果没有安装 wget, 可以按照下面的步骤安装 wget 与 tar 命令,也可以在下载模型时将链接复制到浏览器中下载,并解压放置在相应目录下; linux 或者 macOS 用户可以右键点击,然后复制下载链接,即可通过 `wget` 命令下载。 +- 如果 macOS 环境下没有安装 `wget` 命令,可以运行下面的命令进行安装。 + ```shell + # 安装 homebrew + ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"; + # 安装 wget + brew install wget + ``` +- 如果希望在 windows 环境下安装 wget,可以参考:[链接](https://www.cnblogs.com/jeshy/p/10518062.html);如果希望在 windows 环境中安装 tar 命令,可以参考:[链接](https://www.cnblogs.com/chooperman/p/14190107.html)。 + -* 可以按照下面的命令下载并解压数据与模型 - -```shell -mkdir models -cd models -# 下载识别 inference 模型并解压 -wget {模型下载链接地址} && tar -xf {压缩包的名称} -cd .. - -# 下载 demo 数据并解压 -wget {数据下载链接地址} && tar -xf {压缩包的名称} -``` - - - -### 2.1 下载、解压 inference 模型与 demo 数据 +#### 2.2.1 下载、解压 inference 模型与 demo 数据 下载 demo 数据集以及轻量级主体检测、识别模型,命令如下。 @@ -91,30 +135,30 @@ cd models # 下载通用检测 inference 模型并解压 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar # 下载识别 inference 模型并解压 -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar && tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar cd ../ # 下载 demo 数据并解压 -wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar +wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar ``` -解压完毕后,`drink_dataset_v1.0/` 文件夹下应有如下文件结构: +解压完毕后,`drink_dataset_v2.0/` 文件夹下应有如下文件结构: -``` -├── drink_dataset_v1.0/ +```log +├── drink_dataset_v2.0/ │ ├── gallery/ │ ├── index/ -│ ├── test_images/ +│ ├── index_all/ +│ └── test_images/ ├── ... 
``` 其中 `gallery` 文件夹中存放的是用于构建索引库的原始图像,`index` 表示基于原始图像构建得到的索引库信息,`test_images` 文件夹中存放的是用于测试识别效果的图像列表。 - `models` 文件夹下应有如下文件结构: -``` -├── general_PPLCNet_x2_5_lite_v1.0_infer +```log +├── general_PPLCNetV2_base_pretrained_v1.0_infer │ ├── inference.pdiparams │ ├── inference.pdiparams.info │ └── inference.pdmodel @@ -129,183 +173,185 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_da 如果使用服务端通用识别模型,Demo 数据需要重新提取特征、够建索引,方式如下: ```shell -# 下面是使用下载的服务端商品识别模型进行索引库构建 -python3.7 python/build_gallery.py -c configs/inference_general.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer +python3.7 python/build_gallery.py \ +-c configs/inference_general.yaml \ +-o Global.rec_inference_model_dir=./models/general_PPLCNetV2_base_pretrained_v1.0_infer ``` -### 2.2 瓶装饮料识别与检索 +#### 2.2.2 瓶装饮料识别与检索 以瓶装饮料识别 demo 为例,展示识别与检索过程(如果希望尝试其他方向的识别与检索效果,在下载解压好对应的 demo 数据与模型之后,替换对应的配置文件即可完成预测)。 注意,此部分使用了 `faiss` 作为检索库,安装方法如下: ```python -pip install faiss-cpu==1.7.1post2 +python3.7 -m pip install faiss-cpu==1.7.1post2 ``` 若使用时,不能正常引用,则 `uninstall` 之后,重新 `install`,尤其是 windows 下。 -#### 2.2.1 识别单张图像 +##### 2.2.2.1 识别单张图像 + +运行下面的命令,对图像 `./drink_dataset_v2.0/test_images/100.jpeg` 进行识别与检索 -运行下面的命令,对图像 `./drink_dataset_v1.0/test_images/nongfu_spring.jpeg` 进行识别与检索 +待检索图像如下所示 + +![](../../images/recognition/drink_data_demo/test_images/100.jpeg) ```shell # 使用下面的命令使用 GPU 进行预测 python3.7 python/predict_system.py -c configs/inference_general.yaml + # 使用下面的命令使用 CPU 进行预测 python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False ``` -待检索图像如下所示。 - -![](../../images/recognition/drink_data_demo/test_images/nongfu_spring.jpeg) - 最终输出结果如下。 -``` -[{'bbox': [244, 49, 509, 964], 'rec_docs': '农夫山泉-饮用天然水', 'rec_scores': 0.7585664}] +```log +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] ``` 其中 `bbox` 表示检测出的主体所在位置,`rec_docs` 表示索引库中与检测框最为相似的类别,`rec_scores` 表示对应的置信度。 -检测的可视化结果也保存在 `output` 文件夹下,对于本张图像,识别结果可视化如下所示。 +检测的可视化结果默认保存在 `output` 文件夹下,对于本张图像,识别结果可视化如下所示。 -![](../../images/recognition/drink_data_demo/output/nongfu_spring.jpeg) +![](../../images/recognition/drink_data_demo/output/100.jpeg) -#### 2.2.2 基于文件夹的批量识别 + +##### 2.2.2.2 基于文件夹的批量识别 如果希望预测文件夹内的图像,可以直接修改配置文件中的 `Global.infer_imgs` 字段,也可以通过下面的 `-o` 参数修改对应的配置。 ```shell # 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False -python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v1.0/test_images/" +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/" ``` 终端中会输出该文件夹内所有图像的识别结果,如下所示。 -``` +```log ... 
-[{'bbox': [345, 95, 524, 586], 'rec_docs': '红牛-强化型', 'rec_scores': 0.80164653}] -Inference: 23.43583106994629 ms per batch image -[{'bbox': [233, 0, 372, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.72513914}] -Inference: 117.95639991760254 ms per batch image -[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.7855944}] -Inference: 22.172927856445312 ms per batch image -[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.5829516}] -Inference: 118.08514595031738 ms per batch image -[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.75581443}] -Inference: 150.06470680236816 ms per batch image -[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.8478892}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6790612}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6292581}] +[{'bbox': [0, 0, 600, 600], 'rec_docs': '红牛-强化型', 'rec_scores': 0.74081033}] +Inference: 120.39852142333984 ms per batch image +[{'bbox': [0, 0, 514, 436], 'rec_docs': '康师傅矿物质水', 'rec_scores': 0.6918598}] +Inference: 32.045602798461914 ms per batch image +[{'bbox': [138, 40, 573, 1198], 'rec_docs': '乐虎功能饮料', 'rec_scores': 0.68214047}] +Inference: 113.41428756713867 ms per batch image +[{'bbox': [328, 7, 467, 272], 'rec_docs': '脉动', 'rec_scores': 0.60406065}] +Inference: 122.04337120056152 ms per batch image +[{'bbox': [242, 82, 498, 726], 'rec_docs': '味全_每日C', 'rec_scores': 0.5428652}] +Inference: 37.95266151428223 ms per batch image +[{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}] ... ``` 所有图像的识别结果可视化图像也保存在 `output` 文件夹内。 - 更多地,可以通过修改 `Global.rec_inference_model_dir` 字段来更改识别 inference 模型的路径,通过修改 `IndexProcess.index_dir` 字段来更改索引库索引的路径。 -## 3. 
未知类别的图像识别体验 +### 2.3 未知类别的图像识别体验 -对图像 `./drink_dataset_v1.0/test_images/mosilian.jpeg` 进行识别,命令如下 - -```shell -# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False -python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v1.0/test_images/mosilian.jpeg" -``` +对图像 `./drink_dataset_v2.0/test_images/mosilian.jpeg` 进行识别 -待检索图像如下所示。 +待检索图像如下 ![](../../images/recognition/drink_data_demo/test_images/mosilian.jpeg) +执行如下识别命令 + +```shell +# 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" +``` -输出结果为空。 +可以发现输出结果为空 由于默认的索引库中不包含对应的索引信息,所以这里的识别结果有误,此时我们可以通过构建新的索引库的方式,完成未知类别的图像识别。 -当索引库中的图像无法覆盖我们实际识别的场景时,即在预测未知类别的图像时,我们需要将对应类别的相似图像添加到索引库中,从而完成对未知类别的图像识别,这一过程是不需要重新训练的。 +当索引库中的图像无法覆盖我们实际识别的场景时,即识别未知类别的图像前,我们需要将该未知类别的相似图像(至少一张)添加到索引库中,从而完成对未知类别的图像识别。这一过程不需要重新训练模型,以识别 `mosilian.jpeg` 为例,只需按以下步骤重新构建新的索引库即可。 -### 3.1 准备新的数据与标签 +#### 2.3.1 准备新的数据与标签 -首先需要将与待检索图像相似的图像列表拷贝到索引库原始图像的文件夹。这里 PaddleClas 已经将所有的图像数据都放在文件夹 `drink_dataset_v1.0/gallery/` 中。 - -然后需要编辑记录了图像路径和标签信息的文本文件,这里 PaddleClas 将更正后的标签信息文件放在了 `drink_dataset_v1.0/gallery/drink_label_all.txt` 文件中。可以与默认的 `drink_dataset_v1.0/gallery/drink_label.txt` 标签文件进行对比,添加了光明和三元系列牛奶的索引图像。 +首先需要将与待检索图像相似的图像列表拷贝到索引库原始图像的文件夹中。这里 PaddleClas 已经将所有的图像数据都放在文件夹 `drink_dataset_v2.0/gallery/` 中。 +然后需要编辑记录了图像路径和标签信息的文本文件,这里 PaddleClas 将更新后的标签信息文件放在了 `drink_dataset_v2.0/gallery/drink_label_all.txt` 文件中。与原始的 `drink_dataset_v2.0/gallery/drink_label.txt` 标签文件进行对比,可以发现新增了光明和三元系列牛奶的索引图像。 每一行的文本中,第一个字段表示图像的相对路径,第二个字段表示图像对应的标签信息,中间用 `\t` 键分隔开(注意:有些编辑器会将 `tab` 自动转换为 `空格`,这种情况下会导致文件解析报错)。 -### 3.2 建立新的索引库 +#### 2.3.2 建立新的索引库 -使用下面的命令构建 `index` 索引,加速识别后的检索过程。 +使用下面的命令构建新的索引库 `index_all`。 ```shell -python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v1.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v1.0/index_all" +python3.7 python/build_gallery.py -c configs/inference_general.yaml -o IndexProcess.data_file="./drink_dataset_v2.0/gallery/drink_label_all.txt" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all" ``` -最终新的索引信息保存在文件夹 `./drink_dataset_v1.0/index_all` 中。具体 `yaml` 请参考[向量检索文档](../image_recognition_pipeline/vector_search.md)。 +最终构建完毕的新的索引库保存在文件夹 `./drink_dataset_v2.0/index_all` 下。具体 `yaml` 请参考[向量检索文档](../image_recognition_pipeline/vector_search.md)。 -### 3.3 基于新的索引库的图像识别 +#### 2.3.3 基于新的索引库的图像识别 -使用新的索引库,对上述图像进行识别,运行命令如下。 +使用新的索引库,重新对 `mosilian.jpeg` 图像进行识别,运行命令如下。 ```shell # 使用下面的命令使用 GPU 进行预测,如果希望使用 CPU 预测,可以在命令后面添加 -o Global.use_gpu=False -python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="././drink_dataset_v1.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v1.0/index_all" +python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs="./drink_dataset_v2.0/test_images/mosilian.jpeg" -o IndexProcess.index_dir="./drink_dataset_v2.0/index_all" ``` 输出结果如下。 -``` -[{'bbox': [396, 553, 508, 621], 'rec_docs': '光明_莫斯利安', 'rec_scores': 0.5921005}] +```log +[{'bbox': [290, 297, 564, 919], 'rec_docs': '光明_莫斯利安', 'rec_scores': 0.59137374}] ``` -最终识别结果为`光明_莫斯利安`,识别正确,识别结果可视化如下所示。 +最终识别结果为 `光明_莫斯利安` ,识别正确,识别结果可视化如下所示。 ![](../../images/recognition/drink_data_demo/output/mosilian.jpeg) - -## 4. 
服务端识别模型列表 + + +### 2.4 服务端识别模型列表 目前,我们更推荐您使用[轻量级通用主体检测模型与轻量级通用识别模型](#轻量级通用主体检测模型与轻量级通用识别模型),以获得更好的测试结果。但是如果您希望体验服务端识别模型,服务器端通用主体检测模型与各方向识别模型、测试数据下载地址以及对应的配置文件地址如下。 -| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 | -| ------------ | ------------- | -------- | ------- | -| 通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | -| Logo 识别模型 | Logo 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | -| 动漫人物识别模型 | 动漫人物场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | -| 车辆细分类模型 | 车辆场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | -| 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | -| 车辆 ReID 模型 | 车辆 ReID 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | +| 模型简介 | 推荐场景 | inference 模型 | 预测配置文件 | +| ---------------- | -------------- | ------------ | ----------- | +| 通用主体检测模型 | 通用场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | +| Logo 识别模型 | Logo 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | +| 动漫人物识别模型 | 动漫人物场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | +| 车辆细分类模型 | 车辆场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | +| 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | +| 车辆 ReID 模型 | 车辆 ReID 场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | +可以按照如下命令下载上述模型到 `deploy/models` 文件夹中,以供识别任务使用 ```shell -cd PaddleClas/deploy/ +cd ./deploy mkdir -p models -``` -```shell cd ./models -# 下载通用主体检测模型并解压 +# 下载服务器端通用主体检测模型并解压 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar -# 下载识别模型并解压 +# 下载通用识别模型并解压 wget {识别模型下载链接地址} && tar -xf {压缩包的名称} ``` -使用如下命令下载各方向识别模型的测试数据: +然后使用如下命令下载各个识别场景的测试数据: ```shell # 回到 deploy 目录下 @@ -316,7 +362,7 @@ wget 
https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit 解压完毕后,`recognition_demo_data_v1.1` 文件夹下应有如下文件结构: -``` +```log ├── recognition_demo_data_v1.1 │ ├── gallery_cartoon │ ├── gallery_logo @@ -329,6 +375,6 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit ├── ... ``` -按照上述步骤下载模型和测试数据后,您可以进行相关方向识别模型的测试。 +按照上述步骤下载模型和测试数据后,您可以重新建立索引库,并进行相关方向识别模型的测试。 * 更多关于主体检测的介绍可以参考:[主体检测教程文档](../image_recognition_pipeline/mainbody_detection.md);关于特征提取的介绍可以参考:[特征提取教程文档](../image_recognition_pipeline/feature_extraction.md);关于向量检索的介绍可以参考:[向量检索教程文档](../image_recognition_pipeline/vector_search.md)。 diff --git a/docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md b/docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cca051cec89d9ef5a86a3edc2cb8745a5bc69a2e --- /dev/null +++ b/docs/zh_CN/samples/Fresh_Food_Recogniiton/README.md @@ -0,0 +1,21 @@ +## 生鲜品自主结算 + +在超市等无人零售场景中,目前主要是结算方式,主要有以下几种 + +- 条形码方式 +- RFID等射频码 +- 称重方法 + +但是以上几种方法存在如下缺点: 1)针对条形码方式,对于成品包装的商品,较为成熟,但是对与生鲜产品等商品,并不能满足需求。 2)RFID等方式,虽然对生鲜等产品能够支持,但是额外生成标签,增加成本 3)称重方法,对于相同重量的山商品,不能很好的区分,同时重量称等精密仪器在长时间的负重和使用过程中,精度会发生变化,需要工作人员定期调教,以满足精度需求。 + +因此,如何选择一种既能大规模支持各种商品识别,又能方便管理,同时维护成本不高的识别系统,显得尤为重要。 + +深圳市银歌云技术有限公司基于飞桨的图像识别开发套件PaddleClas,提供了一套基于计算机视觉的完整生鲜品自主结算方案,其通过结算平台的摄像头拍摄的图像,自动的识别称上的商品,整个流程在1秒内完成,无需售卖人员的操作及称重。整个流程,实现了精度高、速度快,无需人工干预的自动结算效果。减少人工成本的同时,大大提高了效率和用户体验。 + +本案例使用了飞桨图像分类开发套件中的通用图像识别系统[PP-ShiTuV2](../../PPShiTu/PPShiTuV2_introduction.md)。 + + +![result](./imgs/yingeo.png) + +**注**: AI Studio在线运行代码请参考[生鲜品自主结算](https://aistudio.baidu.com/aistudio/projectdetail/4486158) + diff --git a/docs/zh_CN/samples/Fresh_Food_Recogniiton/imgs/yingeo.png b/docs/zh_CN/samples/Fresh_Food_Recogniiton/imgs/yingeo.png new file mode 100644 index 0000000000000000000000000000000000000000..e84b06a0591e3892d37a8e7068ab439d365246f0 Binary files /dev/null and b/docs/zh_CN/samples/Fresh_Food_Recogniiton/imgs/yingeo.png differ diff --git a/paddleclas.py b/paddleclas.py index b11b343d42ca210e0363c71b4522c294db08fa41..1463b80d28cdcf7b89ed72c3089aa7f5e084793a 100644 --- a/paddleclas.py +++ b/paddleclas.py @@ -32,6 +32,7 @@ from .ppcls.arch import backbone from .ppcls.utils import logger from .deploy.python.predict_cls import ClsPredictor +from .deploy.python.predict_system import SystemPredictor from .deploy.utils.get_image_list import get_image_list from .deploy.utils import config @@ -50,6 +51,11 @@ BASE_IMAGES_DIR = os.path.join(BASE_DIR, "images") IMN_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/{}_infer.tar" IMN_MODEL_SERIES = { "AlexNet": ["AlexNet"], + "CSWinTransformer": [ + "CSWinTransformer_tiny_224", "CSWinTransformer_small_224", + "CSWinTransformer_base_224", "CSWinTransformer_base_384", + "CSWinTransformer_large_224", "CSWinTransformer_large_384" + ], "DarkNet": ["DarkNet53"], "DeiT": [ "DeiT_base_distilled_patch16_224", "DeiT_base_distilled_patch16_384", @@ -81,6 +87,8 @@ IMN_MODEL_SERIES = { "HRNet_W48_C_ssld" ], "Inception": ["GoogLeNet", "InceptionV3", "InceptionV4"], + "LeViT": + ["LeViT_128S", "LeViT_128", "LeViT_192", "LeViT_256", "LeViT_384"], "MixNet": ["MixNet_S", "MixNet_M", "MixNet_L"], "MobileNetV1": [ "MobileNetV1_x0_25", "MobileNetV1_x0_5", "MobileNetV1_x0_75", @@ -99,6 +107,7 @@ IMN_MODEL_SERIES = { "MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25", "MobileNetV3_small_x1_0_ssld", "MobileNetV3_large_x1_0_ssld" ], + "MobileViT": ["MobileViT_XXS", "MobileViT_XS", "MobileViT_S"], 
"PPHGNet": [ "PPHGNet_tiny", "PPHGNet_small", @@ -110,6 +119,10 @@ IMN_MODEL_SERIES = { "PPLCNet_x1_0", "PPLCNet_x1_5", "PPLCNet_x2_0", "PPLCNet_x2_5" ], "PPLCNetV2": ["PPLCNetV2_base"], + "PVTV2": [ + "PVT_V2_B0", "PVT_V2_B1", "PVT_V2_B2", "PVT_V2_B2_Linear", "PVT_V2_B3", + "PVT_V2_B4", "PVT_V2_B5" + ], "RedNet": ["RedNet26", "RedNet38", "RedNet50", "RedNet101", "RedNet152"], "RegNet": ["RegNetX_4GF"], "Res2Net": [ @@ -162,6 +175,7 @@ IMN_MODEL_SERIES = { "pcpvt_small", "pcpvt_base", "pcpvt_large", "alt_gvt_small", "alt_gvt_base", "alt_gvt_large" ], + "TNT": ["TNT_small"], "VGG": ["VGG11", "VGG13", "VGG16", "VGG19"], "VisionTransformer": [ "ViT_base_patch16_224", "ViT_base_patch16_384", "ViT_base_patch32_384", @@ -178,7 +192,16 @@ PULC_MODEL_BASE_DOWNLOAD_URL = "https://paddleclas.bj.bcebos.com/models/PULC/inf PULC_MODELS = [ "car_exists", "language_classification", "person_attribute", "person_exists", "safety_helmet", "text_image_orientation", - "textline_orientation", "traffic_sign", "vehicle_attribute" + "textline_orientation", "traffic_sign", "vehicle_attribute", + "table_attribute" +] + +SHITU_MODEL_BASE_DOWNLOAD_URL = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/{}_infer.tar" +SHITU_MODELS = [ + # "picodet_PPLCNet_x2_5_mainbody_lite_v1.0", # ShiTuV1(V2)_mainbody_det + # "general_PPLCNet_x2_5_lite_v1.0" # ShiTuV1_general_rec + # "PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0", # ShiTuV2_general_rec TODO(hesensen): add lite model + "PP-ShiTuV2" ] @@ -200,12 +223,24 @@ class InputModelError(Exception): def init_config(model_type, model_name, inference_model_dir, **kwargs): - cfg_path = f"deploy/configs/PULC/{model_name}/inference_{model_name}.yaml" if model_type == "pulc" else "deploy/configs/inference_cls.yaml" + if model_type == "pulc": + cfg_path = f"deploy/configs/PULC/{model_name}/inference_{model_name}.yaml" + elif model_type == "shitu": + cfg_path = "deploy/configs/inference_general.yaml" + else: + cfg_path = "deploy/configs/inference_cls.yaml" + __dir__ = os.path.dirname(__file__) cfg_path = os.path.join(__dir__, cfg_path) cfg = config.get_config(cfg_path, show=False) - - cfg.Global.inference_model_dir = inference_model_dir + if cfg.Global.get("inference_model_dir"): + cfg.Global.inference_model_dir = inference_model_dir + else: + cfg.Global.rec_inference_model_dir = os.path.join( + inference_model_dir, + "PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0") + cfg.Global.det_inference_model_dir = os.path.join( + inference_model_dir, "picodet_PPLCNet_x2_5_mainbody_lite_v1.0") if "batch_size" in kwargs and kwargs["batch_size"]: cfg.Global.batch_size = kwargs["batch_size"] @@ -219,6 +254,10 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs): if "infer_imgs" in kwargs and kwargs["infer_imgs"]: cfg.Global.infer_imgs = kwargs["infer_imgs"] + if "index_dir" in kwargs and kwargs["index_dir"]: + cfg.IndexProcess.index_dir = kwargs["index_dir"] + if "data_file" in kwargs and kwargs["data_file"]: + cfg.IndexProcess.data_file = kwargs["data_file"] if "enable_mkldnn" in kwargs and kwargs["enable_mkldnn"]: cfg.Global.enable_mkldnn = kwargs["enable_mkldnn"] if "cpu_num_threads" in kwargs and kwargs["cpu_num_threads"]: @@ -240,25 +279,45 @@ def init_config(model_type, model_name, inference_model_dir, **kwargs): if "thresh" in kwargs and kwargs[ "thresh"] and "ThreshOutput" in cfg.PostProcess: cfg.PostProcess.ThreshOutput.thresh = kwargs["thresh"] - if "Topk" in cfg.PostProcess: - if "topk" in kwargs and kwargs["topk"]: - 
cfg.PostProcess.Topk.topk = kwargs["topk"] - if "class_id_map_file" in kwargs and kwargs["class_id_map_file"]: - cfg.PostProcess.Topk.class_id_map_file = kwargs[ - "class_id_map_file"] - else: - class_id_map_file_path = os.path.relpath( - cfg.PostProcess.Topk.class_id_map_file, "../") - cfg.PostProcess.Topk.class_id_map_file = os.path.join( - __dir__, class_id_map_file_path) - if "VehicleAttribute" in cfg.PostProcess: - if "color_threshold" in kwargs and kwargs["color_threshold"]: - cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ - "color_threshold"] - if "type_threshold" in kwargs and kwargs["type_threshold"]: - cfg.PostProcess.VehicleAttribute.type_threshold = kwargs[ - "type_threshold"] + if cfg.get("PostProcess"): + if "Topk" in cfg.PostProcess: + if "topk" in kwargs and kwargs["topk"]: + cfg.PostProcess.Topk.topk = kwargs["topk"] + if "class_id_map_file" in kwargs and kwargs["class_id_map_file"]: + cfg.PostProcess.Topk.class_id_map_file = kwargs[ + "class_id_map_file"] + else: + class_id_map_file_path = os.path.relpath( + cfg.PostProcess.Topk.class_id_map_file, "../") + cfg.PostProcess.Topk.class_id_map_file = os.path.join( + __dir__, class_id_map_file_path) + if "VehicleAttribute" in cfg.PostProcess: + if "color_threshold" in kwargs and kwargs["color_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "color_threshold"] + if "type_threshold" in kwargs and kwargs["type_threshold"]: + cfg.PostProcess.VehicleAttribute.type_threshold = kwargs[ + "type_threshold"] + if "TableAttribute" in cfg.PostProcess: + if "source_threshold" in kwargs and kwargs["source_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "source_threshold"] + if "number_threshold" in kwargs and kwargs["number_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "number_threshold"] + if "color_threshold" in kwargs and kwargs["color_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "color_threshold"] + if "clarity_threshold" in kwargs and kwargs["clarity_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "clarity_threshold"] + if "obstruction_threshold" in kwargs and kwargs["obstruction_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "obstruction_threshold"] + if "angle_threshold" in kwargs and kwargs["angle_threshold"]: + cfg.PostProcess.VehicleAttribute.color_threshold = kwargs[ + "angle_threshold"] if "save_dir" in kwargs and kwargs["save_dir"]: cfg.PostProcess.SavePreLabel.save_dir = kwargs["save_dir"] @@ -282,6 +341,13 @@ def args_cfg(): type=str, help="The directory of model files. Valid when model_name not specifed." 
) + parser.add_argument( + "--index_dir", + type=str, + required=False, + help="The index directory path.") + parser.add_argument( + "--data_file", type=str, required=False, help="The label file path.") parser.add_argument("--use_gpu", type=str2bool, help="Whether use GPU.") parser.add_argument( "--gpu_mem", @@ -334,6 +400,7 @@ def print_info(): """ imn_table = PrettyTable(["IMN Model Series", "Model Name"]) pulc_table = PrettyTable(["PULC Models"]) + shitu_table = PrettyTable(["PP-ShiTu Models"]) try: sz = os.get_terminal_size() total_width = sz.columns @@ -352,11 +419,16 @@ def print_info(): textwrap.fill( " ".join(PULC_MODELS), width=total_width).center(table_width - 4) ]) + shitu_table.add_row([ + textwrap.fill( + " ".join(SHITU_MODELS), width=total_width).center(table_width - 4) + ]) print("{}".format("-" * table_width)) print("Models supported by PaddleClas".center(table_width)) print(imn_table) print(pulc_table) + print(shitu_table) print("Powered by PaddlePaddle!".rjust(table_width)) print("{}".format("-" * table_width)) @@ -412,6 +484,10 @@ def check_model_file(model_type, model_name): storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, "PULC", model_name) url = PULC_MODEL_BASE_DOWNLOAD_URL.format(model_name) + elif model_type == "shitu": + storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, + "PP-ShiTu", model_name) + url = SHITU_MODEL_BASE_DOWNLOAD_URL.format(model_name) else: storage_directory = partial(os.path.join, BASE_INFERENCE_MODEL_DIR, "IMN", model_name) @@ -472,8 +548,10 @@ class PaddleClas(object): model_name, inference_model_dir) self._config = init_config(self.model_type, model_name, inference_model_dir, **kwargs) - - self.cls_predictor = ClsPredictor(self._config) + if self.model_type == "shitu": + self.predictor = SystemPredictor(self._config) + else: + self.predictor = ClsPredictor(self._config) def get_config(self): """Get the config. @@ -485,6 +563,7 @@ class PaddleClas(object): """ all_imn_model_names = get_imn_model_names() all_pulc_model_names = PULC_MODELS + all_shitu_model_names = SHITU_MODELS if model_name: if model_name in all_imn_model_names: @@ -493,6 +572,15 @@ class PaddleClas(object): elif model_name in all_pulc_model_names: inference_model_dir = check_model_file("pulc", model_name) return "pulc", inference_model_dir + elif model_name in all_shitu_model_names: + inference_model_dir = check_model_file( + "shitu", + "PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0") + inference_model_dir = check_model_file( + "shitu", "picodet_PPLCNet_x2_5_mainbody_lite_v1.0") + inference_model_dir = os.path.abspath( + os.path.dirname(inference_model_dir)) + return "shitu", inference_model_dir else: similar_imn_names = similar_model_names(model_name, all_imn_model_names) @@ -513,12 +601,13 @@ class PaddleClas(object): raise InputModelError(err) return "custom", inference_model_dir else: - err = f"Please specify the model name supported by PaddleClas or directory contained model files(inference.pdmodel, inference.pdiparams)." + err = "Please specify the model name supported by PaddleClas or directory contained model files(inference.pdmodel, inference.pdiparams)." raise InputModelError(err) return None - def predict(self, input_data: Union[str, np.array], - print_pred: bool=False) -> Generator[list, None, None]: + def predict_cls(self, + input_data: Union[str, np.array], + print_pred: bool=False) -> Generator[list, None, None]: """Predict input_data. 
Args: @@ -538,7 +627,7 @@ class PaddleClas(object): """ if isinstance(input_data, np.ndarray): - yield self.cls_predictor.predict(input_data) + yield self.predictor.predict(input_data) elif isinstance(input_data, str): if input_data.startswith("http") or input_data.startswith("https"): image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR) @@ -570,7 +659,7 @@ class PaddleClas(object): cnt += 1 if cnt % batch_size == 0 or (idx_img + 1) == len(image_list): - preds = self.cls_predictor.predict(img_list) + preds = self.predictor.predict(img_list) if preds: for idx_pred, pred in enumerate(preds): @@ -587,6 +676,77 @@ class PaddleClas(object): raise ImageTypeError(err) return + def predict_shitu(self, + input_data: Union[str, np.array], + print_pred: bool=False) -> Generator[list, None, None]: + """Predict input_data. + Args: + input_data (Union[str, np.array]): + When the type is str, it is the path of image, or the directory containing images, or the URL of image from Internet. + When the type is np.array, it is the image data whose channel order is RGB. + print_pred (bool, optional): Whether print the prediction result. Defaults to False. + + Raises: + ImageTypeError: Illegal input_data. + + Yields: + Generator[list, None, None]: + The prediction result(s) of input_data by batch_size. For every one image, + prediction result(s) is zipped as a dict, that includs topk "class_ids", "scores" and "label_names". + The format of batch prediction result(s) is as follow: [{"class_ids": [...], "scores": [...], "label_names": [...]}, ...] + """ + if isinstance(input_data, np.ndarray): + yield self.predictor.predict(input_data) + elif isinstance(input_data, str): + if input_data.startswith("http") or input_data.startswith("https"): + image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR) + if not os.path.exists(image_storage_dir()): + os.makedirs(image_storage_dir()) + image_save_path = image_storage_dir("tmp.jpg") + download_with_progressbar(input_data, image_save_path) + logger.info( + f"Image to be predicted from Internet: {input_data}, has been saved to: {image_save_path}" + ) + input_data = image_save_path + image_list = get_image_list(input_data) + + cnt = 0 + for idx_img, img_path in enumerate(image_list): + img = cv2.imread(img_path) + if img is None: + logger.warning( + f"Image file failed to read and has been skipped. The path: {img_path}" + ) + continue + img = img[:, :, ::-1] + cnt += 1 + + preds = self.predictor.predict( + img) # [dict1, dict2, ..., dictn] + if preds: + if print_pred: + logger.info(f"{preds}, filename: {img_path}") + + yield preds + else: + err = "Please input legal image! The type of image supported by PaddleClas are: NumPy.ndarray and string of local path or Ineternet URL" + raise ImageTypeError(err) + return + + def predict(self, + input_data: Union[str, np.array], + print_pred: bool=False, + predict_type="cls"): + if predict_type == "cls": + return self.predict_cls(input_data, print_pred) + elif predict_type == "shitu": + assert not isinstance(input_data, ( + list, tuple + )), "PP-ShiTu predictor only support single image as input now." 
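+            # PP-ShiTu branch: SystemPredictor chains mainbody detection, feature
+            # extraction and index retrieval; only a single image (path, URL or
+            # ndarray) is accepted here, not a list or tuple.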
+ return self.predict_shitu(input_data, print_pred) + else: + raise ModuleNotFoundError + # for CLI def main(): @@ -595,7 +755,10 @@ def main(): print_info() cfg = args_cfg() clas_engine = PaddleClas(**cfg) - res = clas_engine.predict(cfg["infer_imgs"], print_pred=True) + res = clas_engine.predict( + cfg["infer_imgs"], + print_pred=True, + predict_type="cls" if "PP-ShiTu" not in cfg["model_name"] else "shitu") for _ in res: pass logger.info("Predict complete!") diff --git a/ppcls/arch/backbone/__init__.py b/ppcls/arch/backbone/__init__.py index 545725f71c23cfb0fa7198dd121fd1ff865fc760..49d47bb7e522ca782087807d31e736e087189441 100644 --- a/ppcls/arch/backbone/__init__.py +++ b/ppcls/arch/backbone/__init__.py @@ -69,10 +69,12 @@ from .model_zoo.repvgg import RepVGG_A0, RepVGG_A1, RepVGG_A2, RepVGG_B0, RepVGG from .model_zoo.van import VAN_tiny from .model_zoo.peleenet import PeleeNet from .model_zoo.convnext import ConvNeXt_tiny +from .model_zoo.cae import cae_base_patch16_224, cae_large_patch16_224 from .variant_models.resnet_variant import ResNet50_last_stage_stride1 from .variant_models.vgg_variant import VGG19Sigmoid from .variant_models.pp_lcnet_variant import PPLCNet_x2_5_Tanh +from .variant_models.pp_lcnetv2_variant import PPLCNetV2_base_ShiTu from .model_zoo.adaface_ir_net import AdaFace_IR_18, AdaFace_IR_34, AdaFace_IR_50, AdaFace_IR_101, AdaFace_IR_152, AdaFace_IR_SE_50, AdaFace_IR_SE_101, AdaFace_IR_SE_152, AdaFace_IR_SE_200 diff --git a/ppcls/arch/backbone/base/theseus_layer.py b/ppcls/arch/backbone/base/theseus_layer.py index a533cdc77bd1a124030fde66184144bdd3025f37..616e728943d81c0e2b0a404906ad691a176f9c51 100644 --- a/ppcls/arch/backbone/base/theseus_layer.py +++ b/ppcls/arch/backbone/base/theseus_layer.py @@ -103,7 +103,7 @@ class TheseusLayer(nn.Layer): return new_layer net = paddleclas.MobileNetV1() - res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) + res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func) print(res) # {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer} """ @@ -117,18 +117,26 @@ class TheseusLayer(nn.Layer): layer_list = parse_pattern_str(pattern=pattern, parent_layer=self) if not layer_list: continue + sub_layer_parent = layer_list[-2]["layer"] if len( layer_list) > 1 else self - sub_layer = layer_list[-1]["layer"] sub_layer_name = layer_list[-1]["name"] - sub_layer_index = layer_list[-1]["index"] + sub_layer_index_list = layer_list[-1]["index_list"] new_sub_layer = handle_func(sub_layer, pattern) - if sub_layer_index: - getattr(sub_layer_parent, - sub_layer_name)[sub_layer_index] = new_sub_layer + if sub_layer_index_list: + if len(sub_layer_index_list) > 1: + sub_layer_parent = getattr( + sub_layer_parent, + sub_layer_name)[sub_layer_index_list[0]] + for sub_layer_index in sub_layer_index_list[1:-1]: + sub_layer_parent = sub_layer_parent[sub_layer_index] + sub_layer_parent[sub_layer_index_list[-1]] = new_sub_layer + else: + getattr(sub_layer_parent, sub_layer_name)[ + sub_layer_index_list[0]] = new_sub_layer else: setattr(sub_layer_parent, sub_layer_name, new_sub_layer) @@ -151,8 +159,8 @@ class TheseusLayer(nn.Layer): parent_layer = self for layer_dict in layer_list: - name, index = layer_dict["name"], layer_dict["index"] - if not set_identity(parent_layer, name, index): + name, index_list = layer_dict["name"], 
layer_dict["index_list"] + if not set_identity(parent_layer, name, index_list): msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'." logger.warning(msg) return False @@ -208,13 +216,13 @@ def save_sub_res_hook(layer, input, output): def set_identity(parent_layer: nn.Layer, layer_name: str, - layer_index: str=None) -> bool: - """set the layer specified by layer_name and layer_index to Indentity. + layer_index_list: str=None) -> bool: + """set the layer specified by layer_name and layer_index_list to Indentity. Args: - parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index. + parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index_list. layer_name (str): The name of target layer to be set to Indentity. - layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. + layer_index_list (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None. Returns: bool: True if successfully, False otherwise. @@ -228,16 +236,19 @@ def set_identity(parent_layer: nn.Layer, if sub_layer_name == layer_name: stop_after = True - if layer_index and stop_after: - stop_after = False - for sub_layer_index in parent_layer._sub_layers[ - layer_name]._sub_layers: - if stop_after: - parent_layer._sub_layers[layer_name][ - sub_layer_index] = Identity() - continue - if layer_index == sub_layer_index: - stop_after = True + if layer_index_list and stop_after: + layer_container = parent_layer._sub_layers[layer_name] + for num, layer_index in enumerate(layer_index_list): + stop_after = False + for i in range(num): + layer_container = layer_container[layer_index_list[i]] + for sub_layer_index in layer_container._sub_layers: + if stop_after: + parent_layer._sub_layers[layer_name][ + sub_layer_index] = Identity() + continue + if layer_index == sub_layer_index: + stop_after = True return stop_after @@ -269,10 +280,12 @@ def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[ while len(pattern_list) > 0: if '[' in pattern_list[0]: target_layer_name = pattern_list[0].split('[')[0] - target_layer_index = pattern_list[0].split('[')[1].split(']')[0] + target_layer_index_list = list( + index.split(']')[0] + for index in pattern_list[0].split('[')[1:]) else: target_layer_name = pattern_list[0] - target_layer_index = None + target_layer_index_list = None target_layer = getattr(parent_layer, target_layer_name, None) @@ -281,21 +294,22 @@ def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[ logger.warning(msg) return None - if target_layer_index and target_layer: - if int(target_layer_index) < 0 or int(target_layer_index) >= len( - target_layer): - msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." - logger.warning(msg) - return None - - target_layer = target_layer[target_layer_index] + if target_layer_index_list: + for target_layer_index in target_layer_index_list: + if int(target_layer_index) < 0 or int( + target_layer_index) >= len(target_layer): + msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0." 
+ logger.warning(msg) + return None + target_layer = target_layer[target_layer_index] layer_list.append({ "layer": target_layer, "name": target_layer_name, - "index": target_layer_index + "index_list": target_layer_index_list }) pattern_list = pattern_list[1:] parent_layer = target_layer + return layer_list diff --git a/ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py b/ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py index 40264092a47deb1e11ed11d2edbda7135f0b5a75..ea24489c16c9d1281b3555546c5786a2168a8a38 100644 --- a/ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py +++ b/ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py @@ -126,6 +126,8 @@ class RepDepthwiseSeparable(TheseusLayer): use_se=False, use_shortcut=False): super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels self.is_repped = False self.dw_size = dw_size @@ -306,8 +308,8 @@ class PPLCNetV2(TheseusLayer): self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer") self.flatten = nn.Flatten(start_axis=1, stop_axis=-1) - in_features = self.class_expand if self.use_last_conv else NET_CONFIG[ - "stage4"][0] * 2 * scale + in_features = self.class_expand if self.use_last_conv else make_divisible( + NET_CONFIG["stage4"][0] * 2 * scale) self.fc = Linear(in_features, class_num) def forward(self, x): diff --git a/ppcls/arch/backbone/model_zoo/cae.py b/ppcls/arch/backbone/model_zoo/cae.py new file mode 100644 index 0000000000000000000000000000000000000000..b6262e9f6418aca93e4c486daf1ae64c36537748 --- /dev/null +++ b/ppcls/arch/backbone/model_zoo/cae.py @@ -0,0 +1,860 @@ +# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Code was heavily based on https://github.com/PaddlePaddle/VIMER/blob/main/CAE/models/modeling_finetune.py +# reference: https://arxiv.org/abs/2202.03026 + +import collections +from itertools import repeat +import math +import numpy as np +from functools import partial + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from ....utils.download import get_weights_path_from_url + +MODEL_URLS = { + "cae_base_patch16_224": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/cae_base_patch16_224_pretrained.pdparams", + "cae_large_patch16_224": + "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/cae_large_patch16_224_pretrained.pdparams" +} + +__all__ = list(MODEL_URLS.keys()) + + +def _ntuple(n): + def parse(x): + if isinstance(x, collections.abc.Iterable): + return x + return tuple(repeat(x, n)) + + return parse + + +def trunc_normal_(tensor, mean=0., std=1.): + nn.initializer.TruncatedNormal(mean=mean, std=std)(tensor) + + +def drop_path(x, drop_prob: float=0., training: bool=False): + """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). + + This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, + the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... 
+ See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for + changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use + 'survival rate' as the argument. + + """ + if drop_prob == 0. or not training: + return x + keep_prob = 1 - drop_prob + shape = (x.shape[0], ) + (1, ) * ( + x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets + random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype) + random_tensor.floor_() # binarize + output = x / keep_prob * random_tensor + return output + + +class DropPath(nn.Layer): + """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). + """ + + def __init__(self, drop_prob=None): + super(DropPath, self).__init__() + self.drop_prob = drop_prob + + def forward(self, x): + return drop_path(x, self.drop_prob, self.training) + + def extra_repr(self) -> str: + return 'p={}'.format(self.drop_prob) + + +class Mlp(nn.Layer): + def __init__(self, + in_features, + hidden_features=None, + out_features=None, + act_layer=nn.GELU, + drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features, bias_attr=True) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features, bias_attr=True) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + # x = self.drop(x) + # commit this for the orignal BERT implement + x = self.fc2(x) + x = self.drop(x) + return x + + +class Attention(nn.Layer): + def __init__(self, + dim, + num_heads=8, + qkv_bias=False, + qk_scale=None, + attn_drop=0., + proj_drop=0., + window_size=None, + attn_head_dim=None): + super().__init__() + + self.num_heads = num_heads + head_dim = dim // num_heads + if attn_head_dim is not None: + head_dim = attn_head_dim + all_head_dim = head_dim * self.num_heads + self.scale = qk_scale or head_dim**-0.5 + + self.zeros_ = nn.initializer.Constant(value=0.) 
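+        # The fused qkv projection is created without a built-in bias; when
+        # qkv_bias=True, separate q/v bias parameters are defined below and the
+        # k bias is kept at zero inside forward().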
+ + self.qkv = nn.Linear(dim, all_head_dim * 3, bias_attr=False) + if qkv_bias: + self.q_bias = self.create_parameter( + [all_head_dim], default_initializer=self.zeros_) + self.v_bias = self.create_parameter( + [all_head_dim], default_initializer=self.zeros_) + else: + self.q_bias = None + self.v_bias = None + + if window_size: + self.window_size = window_size + self.num_relative_distance = (2 * window_size[0] - 1) * ( + 2 * window_size[1] - 1) + 3 + self.relative_position_bias_table = self.create_parameter( + [self.num_relative_distance, num_heads], + default_initializer=self.zeros_) # 2*Wh-1 * 2*Ww-1, nH + # cls to token & token 2 cls & cls to cls + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(window_size[0]) + coords_w = paddle.arange(window_size[1]) + coords = paddle.stack(paddle.meshgrid( + [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww + relative_coords = coords_flatten[:, :, + None] - coords_flatten[:, + None, :] # 2, Wh*Ww, Wh*Ww + relative_coords = relative_coords.transpose( + [1, 2, 0]) # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += window_size[ + 0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * window_size[1] - 1 + relative_position_index = \ + paddle.zeros((window_size[0] * window_size[1] + 1, ) * 2, dtype=relative_coords.dtype) + relative_position_index[1:, 1:] = relative_coords.sum( + -1) # Wh*Ww, Wh*Ww + relative_position_index[0, 0:] = self.num_relative_distance - 3 + relative_position_index[0:, 0] = self.num_relative_distance - 2 + relative_position_index[0, 0] = self.num_relative_distance - 1 + + self.register_buffer("relative_position_index", + relative_position_index) + else: + self.window_size = None + self.relative_position_bias_table = None + self.relative_position_index = None + + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(all_head_dim, dim, bias_attr=True) + self.proj_drop = nn.Dropout(proj_drop) + + def forward(self, x, rel_pos_bias=None): + B, N, C = x.shape + qkv_bias = None + if self.q_bias is not None: + k_bias = paddle.zeros_like(self.v_bias) + k_bias.stop_gradient = True + qkv_bias = paddle.concat((self.q_bias, k_bias, self.v_bias)) + # qkv = self.qkv(x).reshape([B, N, 3, self.num_heads, C // self.num_heads]).transpose([2, 0, 3, 1, 4]) + qkv = F.linear(x=x, weight=self.qkv.weight, bias=qkv_bias) + qkv = qkv.reshape([B, N, 3, self.num_heads, -1]).transpose( + [2, 0, 3, 1, 4]) + q, k, v = qkv[0], qkv[1], qkv[ + 2] # make torchscript happy (cannot use tensor as tuple) + + q = q * self.scale + attn = (q @k.transpose([0, 1, 3, 2])) + + if self.relative_position_bias_table is not None: + relative_position_bias = \ + self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([ + self.window_size[0] * self.window_size[1] + 1, + self.window_size[0] * self.window_size[1] + 1, -1]) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.transpose( + [2, 0, 1]) # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + + if rel_pos_bias is not None: + attn = attn + rel_pos_bias + + attn = F.softmax(attn, axis=-1) + attn = self.attn_drop(attn) + + x = (attn @v).transpose([0, 2, 1, 3]).reshape([B, N, -1]) + x = self.proj(x) + x = self.proj_drop(x) + return x + + +class Block(nn.Layer): + def __init__(self, + dim, + num_heads, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop=0., + attn_drop=0., + drop_path=0., + init_values=None, 
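+                 # init_values > 0 enables learnable per-channel scaling
+                 # (gamma_1 / gamma_2) on the attention and MLP residual branches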
+ act_layer=nn.GELU, + norm_layer=nn.LayerNorm, + window_size=None, + attn_head_dim=None): + super().__init__() + self.norm1 = norm_layer(dim) + self.attn = Attention( + dim, + num_heads=num_heads, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + attn_drop=attn_drop, + proj_drop=drop, + window_size=window_size, + attn_head_dim=attn_head_dim) + # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here + self.drop_path = DropPath( + drop_path) if drop_path > 0. else nn.Identity() + self.norm2 = norm_layer(dim) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, + hidden_features=mlp_hidden_dim, + act_layer=act_layer, + drop=drop) + + if init_values > 0: + self.gamma_1 = self.create_parameter( + [dim], + default_initializer=nn.initializer.Constant(value=init_values)) + self.gamma_2 = self.create_parameter( + [dim], + default_initializer=nn.initializer.Constant(value=init_values)) + else: + self.gamma_1, self.gamma_2 = None, None + + def forward(self, x, rel_pos_bias=None): + if self.gamma_1 is None: + x = x + self.drop_path( + self.attn( + self.norm1(x), rel_pos_bias=rel_pos_bias)) + x = x + self.drop_path(self.mlp(self.norm2(x))) + else: + x = x + self.drop_path(self.gamma_1 * self.attn( + self.norm1(x), rel_pos_bias=rel_pos_bias)) + x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x))) + return x + + +class PatchEmbed(nn.Layer): + """ Image to Patch Embedding + """ + + def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768): + super().__init__() + to_2tuple = _ntuple(2) + img_size = to_2tuple(img_size) + patch_size = to_2tuple(patch_size) + num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // + patch_size[0]) + self.patch_shape = (img_size[0] // patch_size[0], + img_size[1] // patch_size[1]) + self.img_size = img_size + self.patch_size = patch_size + self.num_patches = num_patches + self.in_chans = in_chans + self.out_chans = embed_dim + self.proj = nn.Conv2D( + in_chans, + embed_dim, + kernel_size=patch_size, + stride=patch_size, + bias_attr=True) + + def forward(self, x, **kwargs): + B, C, H, W = x.shape + # FIXME look at relaxing size constraints + assert H == self.img_size[0] and W == self.img_size[1], \ + f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})." + x = self.proj(x).flatten(2).transpose([0, 2, 1]) + return x + + def _init_weights(self): + fan_out = self.out_chans + fan_in = self.patch_size[0] * self.patch_size[1] * self.in_chans + weight_attr = paddle.ParamAttr( + initializer=nn.initializer.XavierUniform(fan_in, fan_out)) # MAE + bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0)) + return weight_attr, bias_attr + + +class RelativePositionBias(nn.Layer): + def __init__(self, window_size, num_heads): + super().__init__() + self.window_size = window_size + self.num_relative_distance = (2 * window_size[0] - 1) * ( + 2 * window_size[1] - 1) + 3 + self.zeros_ = nn.initializer.Constant(value=0.) 
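+        # one learnable bias per relative (dh, dw) offset between patch tokens,
+        # plus 3 extra entries for cls-to-token, token-to-cls and cls-to-cls pairs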
+ self.relative_position_bias_table = self.create_parameter( + [self.num_relative_distance, num_heads], + default_initializer=self.zeros_) # 2*Wh-1 * 2*Ww-1, nH + # cls to token & token 2 cls & cls to cls + + # get pair-wise relative position index for each token inside the window + coords_h = paddle.arange(window_size[0]) + coords_w = paddle.arange(window_size[1]) + coords = paddle.stack(paddle.meshgrid( + [coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = paddle.flatten(coords, 1) # 2, Wh*Ww + relative_coords = coords_flatten[:, :, + None] - coords_flatten[:, + None, :] # 2, Wh*Ww, Wh*Ww + relative_coords = relative_coords.transpose( + [1, 2, 0]) # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += window_size[0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * window_size[1] - 1 + relative_position_index = \ + paddle.zeros((window_size[0] * window_size[1] + 1,) * 2, dtype=relative_coords.dtype) + relative_position_index[1:, 1:] = relative_coords.sum( + -1) # Wh*Ww, Wh*Ww + relative_position_index[0, 0:] = self.num_relative_distance - 3 + relative_position_index[0:, 0] = self.num_relative_distance - 2 + relative_position_index[0, 0] = self.num_relative_distance - 1 + + self.register_buffer("relative_position_index", + relative_position_index) + + def forward(self): + relative_position_bias = \ + self.relative_position_bias_table[self.relative_position_index.reshape([-1])].reshape([ + self.window_size[0] * self.window_size[1] + 1, + self.window_size[0] * self.window_size[1] + 1, -1]) # Wh*Ww,Wh*Ww,nH + return relative_position_bias.transpose([2, 0, 1]) # nH, Wh*Ww, Wh*Ww + + +def get_sinusoid_encoding_table(n_position, d_hid, token=False): + ''' Sinusoid position encoding table ''' + + def get_position_angle_vec(position): + return [ + position / np.power(10000, 2 * (hid_j // 2) / d_hid) + for hid_j in range(d_hid) + ] + + sinusoid_table = np.array( + [get_position_angle_vec(pos_i) for pos_i in range(n_position)]) + sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i + sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1 + + if token: + sinusoid_table = np.concatenate( + [sinusoid_table, np.zeros([1, d_hid])], dim=0) + + return paddle.to_tensor(sinusoid_table).unsqueeze(0) + + +class VisionTransformer(nn.Layer): + """ Vision Transformer with support for patch or hybrid CNN input stage + """ + + def __init__(self, + img_size=224, + patch_size=16, + in_chans=3, + class_num=1000, + embed_dim=768, + depth=12, + num_heads=12, + mlp_ratio=4., + qkv_bias=False, + qk_scale=None, + drop_rate=0., + attn_drop_rate=0., + drop_path_rate=0., + norm_layer=nn.LayerNorm, + init_values=None, + use_abs_pos_emb=True, + use_rel_pos_bias=False, + use_shared_rel_pos_bias=False, + use_mean_pooling=True, + init_scale=0.001, + lin_probe=False, + sin_pos_emb=True, + args=None): + super().__init__() + self.class_num = class_num + self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models + self.use_mean_pooling = use_mean_pooling + + self.patch_embed = PatchEmbed( + img_size=img_size, + patch_size=patch_size, + in_chans=in_chans, + embed_dim=embed_dim) + num_patches = self.patch_embed.num_patches + + self.zeros_ = nn.initializer.Constant(value=0.) + self.ones_ = nn.initializer.Constant(value=1.) 
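+        # learnable [CLS] token prepended to the patch sequence; the position
+        # embedding below is either learned (use_abs_pos_emb) or a fixed
+        # sin-cos table (sin_pos_emb)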
+ + self.cls_token = self.create_parameter( + [1, 1, embed_dim], default_initializer=self.zeros_) + + self.use_abs_pos_emb = use_abs_pos_emb + if use_abs_pos_emb: + self.pos_embed = self.create_parameter( + [1, num_patches + 1, embed_dim], + default_initializer=self.zeros_) + elif sin_pos_emb: + # sine-cosine positional embeddings is on the way + self.pos_embed = self.create_parameter( + [1, num_patches + 1, embed_dim], + default_initializer=self.zeros_) + self.pos_embed.set_value( + self.build_2d_sincos_position_embedding(embed_dim)) + self.pos_embed.stop_gradient = True # fixed sin-cos embedding + else: + self.pos_embed = None + + self.pos_drop = nn.Dropout(p=drop_rate) + + if use_shared_rel_pos_bias: + self.rel_pos_bias = RelativePositionBias( + window_size=self.patch_embed.patch_shape, num_heads=num_heads) + else: + self.rel_pos_bias = None + + dpr = [x.item() for x in paddle.linspace(0, drop_path_rate, depth) + ] # stochastic depth decay rule + self.use_rel_pos_bias = use_rel_pos_bias + self.blocks = nn.LayerList([ + Block( + dim=embed_dim, + num_heads=num_heads, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, + qk_scale=qk_scale, + drop=drop_rate, + attn_drop=attn_drop_rate, + drop_path=dpr[i], + norm_layer=norm_layer, + init_values=init_values, + window_size=self.patch_embed.patch_shape + if use_rel_pos_bias else None) for i in range(depth) + ]) + self.norm = nn.Identity() if use_mean_pooling else norm_layer( + embed_dim) + + self.lin_probe = lin_probe + # NOTE: batch norm + if lin_probe: + # TODO + from models.lincls_bn import LP_BatchNorm + self.fc_norm = LP_BatchNorm(embed_dim, affine=False) + else: + if use_mean_pooling: + self.fc_norm = norm_layer(embed_dim) + else: + self.fc_norm = None + self.head = nn.Linear(embed_dim, + class_num) if class_num > 0 else nn.Identity() + + if self.pos_embed is not None and use_abs_pos_emb: + trunc_normal_(self.pos_embed, std=.02) + trunc_normal_(self.cls_token, std=.02) + # trunc_normal_(self.mask_token, std=.02) + trunc_normal_(self.head.weight, std=.02) + self.apply(self._init_weights) + self.fix_init_weight() + + self.head.weight.set_value(self.head.weight * init_scale) + self.head.bias.set_value(self.head.bias * init_scale) + + def build_2d_sincos_position_embedding(self, + embed_dim=768, + temperature=10000.): + h, w = self.patch_embed.patch_shape + grid_w = paddle.arange(w, dtype=paddle.float32) + grid_h = paddle.arange(h, dtype=paddle.float32) + grid_w, grid_h = paddle.meshgrid(grid_w, grid_h) + assert embed_dim % 4 == 0, 'Embed dimension must be divisible by 4 for 2D sin-cos position embedding' + pos_dim = embed_dim // 4 + omega = paddle.arange(pos_dim, dtype=paddle.float32) / pos_dim + omega = 1. 
/ (temperature**omega) + out_w = paddle.einsum('m,d->md', grid_w.flatten(), omega) + out_h = paddle.einsum('m,d->md', grid_h.flatten(), omega) + pos_emb = paddle.concat( + [ + paddle.sin(out_w), paddle.cos(out_w), paddle.sin(out_h), + paddle.cos(out_h) + ], + axis=1)[None, :, :] + + # if not self.use_mean_pooling: + pe_token = paddle.zeros([1, 1, embed_dim], dtype=paddle.float32) + pos_emb = paddle.concat([pe_token, pos_emb], axis=1) + return pos_emb + + def fix_init_weight(self): + def rescale(param, layer_id): + param.set_value(param / math.sqrt(2.0 * layer_id)) + + for layer_id, layer in enumerate(self.blocks): + rescale(layer.attn.proj.weight, layer_id + 1) + rescale(layer.mlp.fc2.weight, layer_id + 1) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight, std=.02) + if isinstance(m, nn.Linear) and m.bias is not None: + self.zeros_(m.bias) + elif isinstance(m, nn.LayerNorm): + self.zeros_(m.bias) + self.ones_(m.weight) + + def get_num_layers(self): + return len(self.blocks) + + def no_weight_decay(self): + return {'pos_embed', 'cls_token'} + + def get_classifier(self): + return self.head + + def reset_classifier(self, class_num, global_pool=''): + self.class_num = class_num + self.head = nn.Linear(self.embed_dim, + class_num) if class_num > 0 else nn.Identity() + + def forward_features(self, x, is_train=True): + x = self.patch_embed(x) + batch_size, seq_len, _ = x.shape + + cls_tokens = self.cls_token.expand( + [batch_size, -1, + -1]) # stole cls_tokens impl from Phil Wang, thanks + x = paddle.concat((cls_tokens, x), axis=1) + if self.pos_embed is not None: + if self.use_abs_pos_emb: + x = x + self.pos_embed.expand( + [batch_size, -1, -1]).astype(x.dtype).clone().detach() + else: + x = x + self.pos_embed.expand( + [batch_size, -1, -1]).astype(x.dtype).clone().detach() + + x = self.pos_drop(x) + + rel_pos_bias = self.rel_pos_bias( + ) if self.rel_pos_bias is not None else None + for blk in self.blocks: + x = blk(x, rel_pos_bias=rel_pos_bias) + + x = self.norm(x) + if self.fc_norm is not None: + t = x[:, 1:, :] + if self.lin_probe: + if self.use_mean_pooling: + return self.fc_norm(t.mean(1), is_train=is_train) + else: + return self.fc_norm(x[:, 0], is_train=is_train) + else: + return self.fc_norm(t.mean(1)) + + else: + return x[:, 0] + + def forward(self, x, is_train=True): + x = self.forward_features(x, is_train) + x = self.head(x) + return x + + +def _enable_linear_eval(model): + zeros_ = nn.initializer.Constant(value=0.) 
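+    # linear-probe mode: freeze every parameter except the classification head
+    # ('head') and its norm layer ('fc_norm'), then re-initialize the head below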
+ normal_ = nn.initializer.Normal(mean=0.0, std=0.01) + linear_keyword = 'head' + head_norm = 'fc_norm' + requires_grad = [] + for name, param in model.named_parameters(): + if name not in [ + '%s.weight' % linear_keyword, '%s.bias' % linear_keyword + ] and head_norm not in name: + param.stop_gradient = True + else: + requires_grad.append(name) + # init the fc layer + normal_(getattr(model, linear_keyword).weight) + zeros_(getattr(model, linear_keyword).bias) + + return + + +def _load_pretrained(pretrained, + pretrained_url, + model, + model_keys, + model_ema_configs, + abs_pos_emb, + rel_pos_bias, + use_ssld=False): + if pretrained is False: + pass + elif pretrained is True: + local_weight_path = get_weights_path_from_url(pretrained_url).replace( + ".pdparams", "") + checkpoint = paddle.load(local_weight_path + ".pdparams") + elif isinstance(pretrained, str): + checkpoint = paddle.load(local_weight_path + ".pdparams") + + checkpoint_model = None + for model_key in model_keys.split('|'): + if model_key in checkpoint: + checkpoint_model = checkpoint[model_key] + break + + if checkpoint_model is None: + checkpoint_model = checkpoint + state_dict = model.state_dict() + all_keys = list(checkpoint_model.keys()) + # NOTE: remove all decoder keys + all_keys = [key for key in all_keys if key.startswith('encoder.')] + for key in all_keys: + new_key = key.replace('encoder.', '') + checkpoint_model[new_key] = checkpoint_model[key] + checkpoint_model.pop(key) + + for key in list(checkpoint_model.keys()): + if key.startswith('regressor_and_decoder.'): + checkpoint_model.pop(key) + if key.startswith('teacher_network.'): + checkpoint_model.pop(key) + + # NOTE: replace norm with fc_norm + for key in list(checkpoint_model.keys()): + if key.startswith('norm.'): + new_key = key.replace('norm.', 'fc_norm.') + checkpoint_model[new_key] = checkpoint_model[key] + checkpoint_model.pop(key) + + for k in ['head.weight', 'head.bias']: + if k in checkpoint_model and checkpoint_model[k].shape != state_dict[ + k].shape: + del checkpoint_model[k] + + if model.use_rel_pos_bias and "rel_pos_bias.relative_position_bias_table" in checkpoint_model: + num_layers = model.get_num_layers() + rel_pos_bias = checkpoint_model[ + "rel_pos_bias.relative_position_bias_table"] + for i in range(num_layers): + checkpoint_model["blocks.%d.attn.relative_position_bias_table" % + i] = rel_pos_bias.clone() + + checkpoint_model.pop("rel_pos_bias.relative_position_bias_table") + + all_keys = list(checkpoint_model.keys()) + + for key in all_keys: + if "relative_position_index" in key: + checkpoint_model.pop(key) + + if "relative_position_bias_table" in key and rel_pos_bias: + rel_pos_bias = checkpoint_model[key] + src_num_pos, num_attn_heads = rel_pos_bias.size() + dst_num_pos, _ = model.state_dict()[key].size() + dst_patch_shape = model.patch_embed.patch_shape + if dst_patch_shape[0] != dst_patch_shape[1]: + raise NotImplementedError() + num_extra_tokens = dst_num_pos - (dst_patch_shape[0] * 2 - 1) * ( + dst_patch_shape[1] * 2 - 1) + src_size = int((src_num_pos - num_extra_tokens)**0.5) + dst_size = int((dst_num_pos - num_extra_tokens)**0.5) + if src_size != dst_size: + extra_tokens = rel_pos_bias[-num_extra_tokens:, :] + rel_pos_bias = rel_pos_bias[:-num_extra_tokens, :] + + def geometric_progression(a, r, n): + return a * (1.0 - r**n) / (1.0 - r) + + left, right = 1.01, 1.5 + while right - left > 1e-6: + q = (left + right) / 2.0 + gp = geometric_progression(1, q, src_size // 2) + if gp > dst_size // 2: + right = q + else: + left = q + + dis = 
[] + cur = 1 + for i in range(src_size // 2): + dis.append(cur) + cur += q**(i + 1) + + r_ids = [-_ for _ in reversed(dis)] + + x = r_ids + [0] + dis + y = r_ids + [0] + dis + + t = dst_size // 2.0 + dx = np.arange(-t, t + 0.1, 1.0) + dy = np.arange(-t, t + 0.1, 1.0) + + all_rel_pos_bias = [] + + for i in range(num_attn_heads): + z = rel_pos_bias[:, i].view(src_size, + src_size).float().numpy() + f = interpolate.interp2d(x, y, z, kind='cubic') + all_rel_pos_bias.append( + paddle.Tensor(f(dx, dy)).contiguous().view(-1, 1).to( + rel_pos_bias.device)) + + rel_pos_bias = paddle.concat(all_rel_pos_bias, axis=-1) + + new_rel_pos_bias = paddle.concat( + (rel_pos_bias, extra_tokens), axis=0) + checkpoint_model[key] = new_rel_pos_bias + + # interpolate position embedding + if 'pos_embed' in checkpoint_model and abs_pos_emb: + pos_embed_checkpoint = checkpoint_model['pos_embed'] + embedding_size = pos_embed_checkpoint.shape[-1] + num_patches = model.patch_embed.num_patches + num_extra_tokens = model.pos_embed.shape[-2] - num_patches + # height (== width) for the checkpoint position embedding + orig_size = int((pos_embed_checkpoint.shape[-2] - num_extra_tokens)** + 0.5) + # height (== width) for the new position embedding + new_size = int(num_patches**0.5) + # class_token and dist_token are kept unchanged + if orig_size != new_size: + extra_tokens = pos_embed_checkpoint[:, :num_extra_tokens] + # only the position tokens are interpolated + pos_tokens = pos_embed_checkpoint[:, num_extra_tokens:] + pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, + embedding_size).permute(0, 3, 1, 2) + pos_tokens = paddle.nn.functional.interpolate( + pos_tokens, + size=(new_size, new_size), + mode='bicubic', + align_corners=False) + pos_tokens = pos_tokens.permute(0, 2, 3, 1).flatten(1, 2) + new_pos_embed = paddle.concat((extra_tokens, pos_tokens), axis=1) + checkpoint_model['pos_embed'] = new_pos_embed + msg = model.set_state_dict(checkpoint_model) + + model_without_ddp = model + n_parameters = sum(p.numel() for p in model.parameters() + if not p.stop_gradient).item() + + return + + +def cae_base_patch16_224(pretrained=True, use_ssld=False, **kwargs): + config = kwargs.copy() + enable_linear_eval = config.pop('enable_linear_eval') + model_keys = config.pop('model_key') + model_ema_configs = config.pop('model_ema') + abs_pos_emb = config.pop('abs_pos_emb') + rel_pos_bias = config.pop('rel_pos_bias') + if pretrained in config: + pretrained = config.pop('pretrained') + + model = VisionTransformer( + patch_size=16, + embed_dim=768, + depth=12, + num_heads=12, + mlp_ratio=4, + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + **config) + + if enable_linear_eval: + _enable_linear_eval(model) + + _load_pretrained( + pretrained, + MODEL_URLS["cae_base_patch16_224"], + model, + model_keys, + model_ema_configs, + abs_pos_emb, + rel_pos_bias, + use_ssld=False) + + return model + + +def cae_large_patch16_224(pretrained=True, use_ssld=False, **kwargs): + config = kwargs.copy() + enable_linear_eval = config.pop('enable_linear_eval') + model_keys = config.pop('model_key') + model_ema_configs = config.pop('model_ema') + abs_pos_emb = config.pop('abs_pos_emb') + rel_pos_bias = config.pop('rel_pos_bias') + if pretrained in config: + pretrained = config.pop('pretrained') + + model = VisionTransformer( + patch_size=16, + embed_dim=1024, + depth=24, + num_heads=16, + mlp_ratio=4, + qkv_bias=True, + norm_layer=partial( + nn.LayerNorm, epsilon=1e-6), + **config) + + if enable_linear_eval: + 
_enable_linear_eval(model) + + _load_pretrained( + pretrained, + MODEL_URLS["cae_large_patch16_224"], + model, + model_keys, + model_ema_configs, + abs_pos_emb, + rel_pos_bias, + use_ssld=False) + + return model diff --git a/ppcls/arch/backbone/variant_models/__init__.py b/ppcls/arch/backbone/variant_models/__init__.py index 75cf29ffa9c59b744972a9e82fba7a506219e83b..d2fcd0bdd9b83c6e87a6a0684382c380e5fff93a 100644 --- a/ppcls/arch/backbone/variant_models/__init__.py +++ b/ppcls/arch/backbone/variant_models/__init__.py @@ -1,3 +1,4 @@ from .resnet_variant import ResNet50_last_stage_stride1 from .vgg_variant import VGG19Sigmoid from .pp_lcnet_variant import PPLCNet_x2_5_Tanh +from .pp_lcnetv2_variant import PPLCNetV2_base_ShiTu diff --git a/ppcls/arch/backbone/variant_models/pp_lcnetv2_variant.py b/ppcls/arch/backbone/variant_models/pp_lcnetv2_variant.py new file mode 100644 index 0000000000000000000000000000000000000000..6acccdc8e5c115cf4e1e6b213ab3ea3ffcc710b3 --- /dev/null +++ b/ppcls/arch/backbone/variant_models/pp_lcnetv2_variant.py @@ -0,0 +1,56 @@ +from paddle.nn import Conv2D, Identity + +from ..legendary_models.pp_lcnet_v2 import MODEL_URLS, PPLCNetV2_base, RepDepthwiseSeparable, _load_pretrained + +__all__ = ["PPLCNetV2_base_ShiTu"] + + +def PPLCNetV2_base_ShiTu(pretrained=False, use_ssld=False, **kwargs): + """ + An variant network of PPLCNetV2_base + 1. remove ReLU layer after last_conv + 2. add bias to last_conv + 3. change stride to 1 in last two RepDepthwiseSeparable Block + """ + model = PPLCNetV2_base(pretrained=False, use_ssld=use_ssld, **kwargs) + + def remove_ReLU_function(conv, pattern): + new_conv = Identity() + return new_conv + + def add_bias_last_conv(conv, pattern): + new_conv = Conv2D( + in_channels=conv._in_channels, + out_channels=conv._out_channels, + kernel_size=conv._kernel_size, + stride=conv._stride, + padding=conv._padding, + groups=conv._groups, + bias_attr=True) + return new_conv + + def last_stride_function(rep_block, pattern): + new_conv = RepDepthwiseSeparable( + in_channels=rep_block.in_channels, + out_channels=rep_block.out_channels, + stride=1, + dw_size=rep_block.dw_size, + split_pw=rep_block.split_pw, + use_rep=rep_block.use_rep, + use_se=rep_block.use_se, + use_shortcut=rep_block.use_shortcut) + return new_conv + + pattern_act = ["act"] + pattern_lastconv = ["last_conv"] + pattern_last_stride = [ + "stages[3][0]", + "stages[3][1]", + ] + model.upgrade_sublayer(pattern_act, remove_ReLU_function) + model.upgrade_sublayer(pattern_lastconv, add_bias_last_conv) + model.upgrade_sublayer(pattern_last_stride, last_stride_function) + + # load params again after upgrade some layers + _load_pretrained(pretrained, model, MODEL_URLS["PPLCNetV2_base"], use_ssld) + return model diff --git a/ppcls/configs/CAE/cae_base_patch16_224_finetune.yaml b/ppcls/configs/CAE/cae_base_patch16_224_finetune.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7ec1c9c4693fce359f8bd44fdc38f50baf6465fc --- /dev/null +++ b/ppcls/configs/CAE/cae_base_patch16_224_finetune.yaml @@ -0,0 +1,167 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 20 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: cae_base_patch16_224 + class_num: 102 + drop_rate: 0.0 + drop_path_rate: 0.1 + attn_drop_rate: 0.0 
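  # NOTE: a sketch of how this Arch block is consumed (based on the factory above):
  # cae_base_patch16_224() pops enable_linear_eval, model_key, model_ema,
  # abs_pos_emb, rel_pos_bias and pretrained from these kwargs and forwards the
  # remaining keys (class_num, drop_rate, drop_path_rate, ...) to VisionTransformer.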
+ + use_mean_pooling: True + init_scale: 0.001 + use_rel_pos_bias: True + use_abs_pos_emb: False + init_values: 0.1 + lin_probe: False + + sin_pos_emb: True + + abs_pos_emb: False + enable_linear_eval: False + model_key: model|module|state_dict + rel_pos_bias: True + model_ema: + enable_model_ema: False + model_ema_decay: 0.9999 + model_ema_force_cpu: False + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - SoftTargetCrossEntropy: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: AdamWDL + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + layerwise_decay: 0.65 + lr: + name: Cosine + learning_rate: 0.001 + eta_min: 1e-6 + warmup_epoch: 10 + warmup_start_lr: 1e-6 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/flowers102/ + cls_label_path: ./dataset/flowers102/train_list.txt + batch_transform_ops: + - MixupCutmixHybrid: + mixup_alpha: 0.8 + cutmix_alpha: 1.0 + switch_prob: 0.5 + num_classes: 102 + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bilinear + - RandFlipImage: + flip_code: 1 + - RandAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 0.3 + r1: 0.3 + + sampler: + name: DistributedBatchSampler + batch_size: 16 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/flowers102/ + cls_label_path: ./dataset/flowers102/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 16 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.5, 0.5, 0.5] + std: [0.5, 0.5, 0.5] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/CAE/cae_large_patch16_224_finetune.yaml b/ppcls/configs/CAE/cae_large_patch16_224_finetune.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f8f7edc58a1e161ecaf64e5bd6cb7cd5b62a6c40 --- /dev/null +++ b/ppcls/configs/CAE/cae_large_patch16_224_finetune.yaml @@ -0,0 +1,167 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/ + device: gpu + save_interval: 20 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +# model architecture +Arch: + name: cae_large_patch16_224 + class_num: 102 + drop_rate: 0.0 + drop_path_rate: 0.2 + attn_drop_rate: 0.0 + + use_mean_pooling: True + init_scale: 0.001 + use_rel_pos_bias: True + use_abs_pos_emb: False + init_values: 0.1 + lin_probe: False + + sin_pos_emb: 
True + + abs_pos_emb: False + enable_linear_eval: False + model_key: model|module|state_dict + rel_pos_bias: True + model_ema: + enable_model_ema: False + model_ema_decay: 0.9999 + model_ema_force_cpu: False + pretrained: True + +# loss function config for traing/eval process +Loss: + Train: + - SoftTargetCrossEntropy: + weight: 1.0 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: AdamWDL + beta1: 0.9 + beta2: 0.999 + epsilon: 1e-8 + weight_decay: 0.05 + layerwise_decay: 0.75 + lr: + name: Cosine + learning_rate: 0.001 + eta_min: 1e-6 + warmup_epoch: 10 + warmup_start_lr: 1e-6 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/flowers102/ + cls_label_path: ./dataset/flowers102/train_list.txt + batch_transform_ops: + - MixupCutmixHybrid: + mixup_alpha: 0.8 + cutmix_alpha: 1.0 + switch_prob: 0.5 + num_classes: 102 + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + interpolation: bilinear + - RandFlipImage: + flip_code: 1 + - RandAugment: + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - RandomErasing: + EPSILON: 0.5 + sl: 0.02 + sh: 0.3 + r1: 0.3 + + sampler: + name: DistributedBatchSampler + batch_size: 16 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: ./dataset/flowers102/ + cls_label_path: ./dataset/flowers102/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 16 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.5, 0.5, 0.5] + std: [0.5, 0.5, 0.5] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: ppcls/utils/imagenet1k_label_list.txt + +Metric: + Train: + - TopkAcc: + topk: [1, 5] + Eval: + - TopkAcc: + topk: [1, 5] diff --git a/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml b/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml index 626dd7c2ee05ffac21320b948855d12442e64eff..70daa639bebad9fc6050276cf6303bb5a8b4b47a 100644 --- a/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml +++ b/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml @@ -23,7 +23,7 @@ Arch: infer_output_key: features infer_add_softmax: False - Backbone: + Backbone: name: PPLCNet_x2_5 pretrained: True use_ssld: True @@ -34,7 +34,7 @@ Arch: embedding_size: 1280 class_num: 512 Head: - name: ArcMargin + name: ArcMargin embedding_size: 512 class_num: 185341 margin: 0.2 @@ -57,10 +57,9 @@ Optimizer: learning_rate: 0.04 warmup_epoch: 5 regularizer: - name: 'L2' + name: "L2" coeff: 0.00001 - # data loader for train and eval DataLoader: Train: @@ -80,7 +79,7 @@ DataLoader: scale: 1.0/255.0 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" sampler: name: DistributedBatchSampler @@ -93,7 +92,7 @@ DataLoader: Eval: Query: - dataset: + 
dataset: name: VeriWild image_root: ./dataset/Aliproduct/ cls_label_path: ./dataset/Aliproduct/val_list.txt @@ -107,7 +106,7 @@ DataLoader: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" sampler: name: DistributedBatchSampler batch_size: 64 @@ -118,7 +117,7 @@ DataLoader: use_shared_memory: True Gallery: - dataset: + dataset: name: VeriWild image_root: ./dataset/Aliproduct/ cls_label_path: ./dataset/Aliproduct/val_list.txt @@ -132,7 +131,7 @@ DataLoader: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" sampler: name: DistributedBatchSampler batch_size: 64 @@ -146,3 +145,4 @@ Metric: Eval: - Recallk: topk: [1, 5] + - mAP: {} diff --git a/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml b/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e6dfde7cdde9b88772ac414cb0de1646daf9c304 --- /dev/null +++ b/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml @@ -0,0 +1,205 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 20 + use_visualdl: False + eval_mode: retrieval + retrieval_feature_from: features # 'backbone' or 'features' + re_ranking: False + use_dali: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: ./inference + +AMP: + scale_loss: 65536 + use_dynamic_loss_scaling: True + # O1: mixed fp16 + level: O1 + +# model architecture +Arch: + name: RecModel + infer_output_key: features + infer_add_softmax: False + + Backbone: + name: PPLCNetV2_base_ShiTu + pretrained: True + use_ssld: True + class_expand: &feat_dim 512 + BackboneStopLayer: + name: flatten + Neck: + name: BNNeck + num_features: *feat_dim + weight_attr: + initializer: + name: Constant + value: 1.0 + bias_attr: + initializer: + name: Constant + value: 0.0 + learning_rate: 1.0e-20 # NOTE: Temporarily set lr small enough to freeze the bias to zero + Head: + name: FC + embedding_size: *feat_dim + class_num: 192612 + weight_attr: + initializer: + name: Normal + std: 0.001 + bias_attr: False + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + - TripletAngularMarginLoss: + weight: 1.0 + feature_from: features + margin: 0.5 + reduction: mean + add_absolute: True + absolute_loss_weight: 0.1 + normalize_feature: True + ap_value: 0.8 + an_value: 0.4 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.06 # for 8gpu x 256bs + warmup_epoch: 5 + regularizer: + name: L2 + coeff: 0.00001 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/train_reg_all_data_v2.txt + relabel: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 + - RandFlipImage: + flip_code: 1 + - Pad: + padding: 10 + backend: cv2 + - RandCropImageV2: + size: [224, 224] + - RandomRotation: + prob: 0.5 + degrees: 90 + interpolation: bilinear + - ResizeImage: + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 
0.406] + std: [0.229, 0.224, 0.225] + order: hwc + sampler: + name: PKSampler + batch_size: 256 + sample_per_id: 4 + drop_last: False + shuffle: True + sample_method: "id_avg_prob" + id_list: [50030, 80700, 92019, 96015] # be careful when set relabel=True + ratio: [4, 4] + loader: + num_workers: 4 + use_shared_memory: True + + Eval: + Query: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ + cls_label_path: ./dataset/Aliproduct/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: hwc + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + + Gallery: + dataset: + name: VeriWild + image_root: ./dataset/Aliproduct/ + cls_label_path: ./dataset/Aliproduct/val_list.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + return_numpy: False + interpolation: bilinear + backend: cv2 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: hwc + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Metric: + Eval: + - Recallk: + topk: [1, 5] + - mAP: {} diff --git a/ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml b/ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7822a2bea425ac657dc9a10644aa2cdf1fd4273c --- /dev/null +++ b/ppcls/configs/ImageNet/Distillation/resnet34_distill_resnet18_wsl.yaml @@ -0,0 +1,152 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: ./output/r34_r18_wsl + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 100 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: "./inference" + +# model architecture +Arch: + name: "DistillationModel" + # if not null, its lengths should be same as models + pretrained_list: + # if not null, its lengths should be same as models + freeze_params_list: + - True + - False + models: + - Teacher: + name: ResNet34 + pretrained: True + + - Student: + name: ResNet18 + pretrained: False + + infer_model_name: "Student" + + +# loss function config for traing/eval process +Loss: + Train: + - DistillationGTCELoss: + weight: 1.0 + model_names: ["Student"] + - DistillationWSLLoss: + weight: 2.5 + model_name_pairs: [["Student", "Teacher"]] + temperature: 2 + Eval: + - CELoss: + weight: 1.0 + + +Optimizer: + name: Momentum + momentum: 0.9 + weight_decay: 1e-4 + lr: + name: MultiStepDecay + learning_rate: 0.1 + milestones: [30, 60, 90] + step_each_epoch: 1 + gamma: 0.1 + + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: "./dataset/ILSVRC2012/" + cls_label_path: "./dataset/ILSVRC2012/train_list.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + + sampler: + name: DistributedBatchSampler + 
batch_size: 64 + drop_last: False + shuffle: True + loader: + num_workers: 8 + use_shared_memory: True + + Eval: + dataset: + name: ImageNetDataset + image_root: "./dataset/ILSVRC2012/" + cls_label_path: "./dataset/ILSVRC2012/val_list.txt" + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 0.00392157 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: "docs/images/inference_deployment/whl_demo.jpg" + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 5 + class_id_map_file: "ppcls/utils/imagenet1k_label_list.txt" + +Metric: + Train: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] + Eval: + - DistillationTopkAcc: + model_key: "Student" + topk: [1, 5] diff --git a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml index 5e07692b01715ffa8196e5ded4604f9294d1ed07..8941894ad707bf6e34a9b9b83b446b72cd83ca1c 100644 --- a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml +++ b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_base.yaml @@ -142,6 +142,8 @@ Infer: channel_first: False - ResizeImage: resize_short: 236 + interpolation: bicubic + backend: pil - CropImage: size: 224 - NormalizeImage: diff --git a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml index eabccd4b712ab48886c74caf6b784b4c193f6913..227450fca21cf2feecb9c616a4ac3d17ce881a60 100644 --- a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml +++ b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_small.yaml @@ -142,6 +142,8 @@ Infer: channel_first: False - ResizeImage: resize_short: 236 + interpolation: bicubic + backend: pil - CropImage: size: 224 - NormalizeImage: diff --git a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_tiny.yaml b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_tiny.yaml index e423c866b131aefda13b0186eca7ac27d3c84733..f8332a0430a89da37208915d9df177f16f8d0cda 100644 --- a/ppcls/configs/ImageNet/PPHGNet/PPHGNet_tiny.yaml +++ b/ppcls/configs/ImageNet/PPHGNet/PPHGNet_tiny.yaml @@ -142,6 +142,8 @@ Infer: channel_first: False - ResizeImage: resize_short: 232 + interpolation: bicubic + backend: pil - CropImage: size: 224 - NormalizeImage: diff --git a/ppcls/configs/PULC/table_attribute/PPLCNet_x1_0.yaml b/ppcls/configs/PULC/table_attribute/PPLCNet_x1_0.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2c1e9b253cb3b60afa635ebb6bb94dbdcf5bb886 --- /dev/null +++ b/ppcls/configs/PULC/table_attribute/PPLCNet_x1_0.yaml @@ -0,0 +1,133 @@ +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: "./output/" + device: "gpu" + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 20 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 224, 224] + save_inference_dir: "./inference" + use_multilabel: True + +# model architecture +Arch: + name: "PPLCNet_x1_0" + pretrained: True + use_ssld: True + class_num: 6 + + +# loss function config for traing/eval process +Loss: + Train: + - 
MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + Eval: + - MultiLabelLoss: + weight: 1.0 + weight_ratio: True + size_sum: True + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + warmup_epoch: 5 + regularizer: + name: 'L2' + coeff: 0.0005 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: MultiLabelDataset + image_root: "dataset/table_attribute/" + cls_label_path: "dataset/table_attribute/train_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: True + shuffle: True + loader: + num_workers: 4 + use_shared_memory: True + Eval: + dataset: + name: MultiLabelDataset + image_root: "dataset/table_attribute/" + cls_label_path: "dataset/table_attribute/val_list.txt" + label_ratio: True + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 64 + drop_last: False + shuffle: False + loader: + num_workers: 4 + use_shared_memory: True + +Infer: + infer_imgs: deploy/images/PULC/table_attribute/val_3610.jpg + batch_size: 10 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: [224, 224] + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: TableAttribute + source_threshold: 0.5 + number_threshold: 0.5 + color_threshold: 0.5 + clarity_threshold : 0.5 + obstruction_threshold: 0.5 + angle_threshold: 0.5 + +Metric: + Eval: + - ATTRMetric: + + diff --git a/ppcls/configs/quick_start/MobileNetV1_retrieval.yaml b/ppcls/configs/quick_start/MobileNetV1_retrieval.yaml index f088e1cd9c9024fc80a3a171cee8d6865abd7dda..bac477392c3b1fe45a6cbe7e643a4a5aec96ac3e 100644 --- a/ppcls/configs/quick_start/MobileNetV1_retrieval.yaml +++ b/ppcls/configs/quick_start/MobileNetV1_retrieval.yaml @@ -20,8 +20,8 @@ Arch: name: RecModel infer_output_key: features infer_add_softmax: False - - Backbone: + + Backbone: name: MobileNetV1 pretrained: False BackboneStopLayer: @@ -31,12 +31,12 @@ Arch: embedding_size: 1024 class_num: 512 Head: - name: ArcMargin + name: ArcMargin embedding_size: 512 class_num: 101 margin: 0.15 scale: 30 - + # loss function config for traing/eval process Loss: Train: @@ -60,7 +60,7 @@ Optimizer: verbose: False last_epoch: -1 regularizer: - name: 'L2' + name: "L2" coeff: 0.0005 # data loader for train and eval @@ -82,7 +82,7 @@ DataLoader: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" - RandomErasing: EPSILON: 0.5 sl: 0.02 @@ -98,10 +98,10 @@ DataLoader: loader: num_workers: 4 use_shared_memory: True - + Eval: Query: - dataset: + dataset: name: VeriWild image_root: ./dataset/CUB_200_2011/ cls_label_path: ./dataset/CUB_200_2011/test_list.txt @@ -115,7 +115,7 @@ DataLoader: scale: 0.00392157 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" sampler: name: DistributedBatchSampler batch_size: 64 @@ -126,7 +126,7 @@ DataLoader: use_shared_memory: True Gallery: - dataset: + dataset: name: VeriWild image_root: 
./dataset/CUB_200_2011/ cls_label_path: ./dataset/CUB_200_2011/test_list.txt @@ -140,7 +140,7 @@ DataLoader: scale: 1.0/255.0 mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - order: '' + order: "" sampler: name: DistributedBatchSampler batch_size: 64 @@ -155,4 +155,3 @@ Metric: - Recallk: topk: [1, 5] - mAP: {} - diff --git a/ppcls/data/__init__.py b/ppcls/data/__init__.py index 80cf3bc9af826e935fe0fe6ccf8cad8d6924d370..5f73e7d832a7da4f5733e7c20c1c481a8fb5d09b 100644 --- a/ppcls/data/__init__.py +++ b/ppcls/data/__init__.py @@ -72,7 +72,12 @@ def build_dataloader(config, mode, device, use_dali=False, seed=None): # build dataset if use_dali: from ppcls.data.dataloader.dali import dali_dataloader - return dali_dataloader(config, mode, paddle.device.get_device(), seed) + return dali_dataloader( + config, + mode, + paddle.device.get_device(), + num_threads=config[mode]['loader']["num_workers"], + seed=seed) class_num = config.get("class_num", None) config_dataset = config[mode]['dataset'] diff --git a/ppcls/data/dataloader/dali.py b/ppcls/data/dataloader/dali.py index faef45e26b3dee2e17464a502f42f9886eac6518..a0b91a9a9dd468b0bb4ba0dd314341f067f6e3f5 100644 --- a/ppcls/data/dataloader/dali.py +++ b/ppcls/data/dataloader/dali.py @@ -143,7 +143,7 @@ class HybridValPipe(Pipeline): return self.epoch_size("Reader") -def dali_dataloader(config, mode, device, seed=None): +def dali_dataloader(config, mode, device, num_threads=4, seed=None): assert "gpu" in device, "gpu training is required for DALI" device_id = int(device.split(':')[1]) config_dataloader = config[mode] @@ -248,6 +248,7 @@ def dali_dataloader(config, mode, device, seed=None): device_id, shard_id, num_shards, + num_threads=num_threads, seed=seed + shard_id, pad_output=pad_output, output_dtype=output_dtype) @@ -270,6 +271,7 @@ def dali_dataloader(config, mode, device, seed=None): device_id=device_id, shard_id=0, num_shards=1, + num_threads=num_threads, seed=seed, pad_output=pad_output, output_dtype=output_dtype) @@ -298,6 +300,7 @@ def dali_dataloader(config, mode, device, seed=None): device_id=device_id, shard_id=shard_id, num_shards=num_shards, + num_threads=num_threads, pad_output=pad_output, output_dtype=output_dtype) else: @@ -311,6 +314,7 @@ def dali_dataloader(config, mode, device, seed=None): mean, std, device_id=device_id, + num_threads=num_threads, pad_output=pad_output, output_dtype=output_dtype) pipe.build() diff --git a/ppcls/data/dataloader/imagenet_dataset.py b/ppcls/data/dataloader/imagenet_dataset.py index 0e2b1c8a0b4cbc7e2f251ed2363fca71ec3239e7..51f1fbec3df24164a8c7c41a29a0ad91128120aa 100644 --- a/ppcls/data/dataloader/imagenet_dataset.py +++ b/ppcls/data/dataloader/imagenet_dataset.py @@ -21,27 +21,54 @@ from .common_dataset import CommonDataset class ImageNetDataset(CommonDataset): - def __init__( - self, - image_root, - cls_label_path, - transform_ops=None, - delimiter=None): + """ImageNetDataset + + Args: + image_root (str): image root, path to `ILSVRC2012` + cls_label_path (str): path to annotation file `train_list.txt` or 'val_list.txt` + transform_ops (list, optional): list of transform op(s). Defaults to None. + delimiter (str, optional): delimiter. Defaults to None. + relabel (bool, optional): whether do relabel when original label do not starts from 0 or are discontinuous. Defaults to False. 
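        Example:
            A minimal sketch, assuming each annotation line is "relative/path label":

                dataset = ImageNetDataset(
                    image_root="./dataset/ILSVRC2012/",
                    cls_label_path="./dataset/ILSVRC2012/train_list.txt",
                    relabel=True)  # remaps raw labels to a contiguous 0..N-1 range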
+ """ + def __init__(self, + image_root, + cls_label_path, + transform_ops=None, + delimiter=None, + relabel=False): self.delimiter = delimiter if delimiter is not None else " " - super(ImageNetDataset, self).__init__(image_root, cls_label_path, transform_ops) + self.relabel = relabel + super(ImageNetDataset, self).__init__(image_root, cls_label_path, + transform_ops) def _load_anno(self, seed=None): - assert os.path.exists(self._cls_path) - assert os.path.exists(self._img_root) + assert os.path.exists( + self._cls_path), f"path {self._cls_path} does not exist." + assert os.path.exists( + self._img_root), f"path {self._img_root} does not exist." self.images = [] self.labels = [] with open(self._cls_path) as fd: lines = fd.readlines() + if self.relabel: + label_set = set() + for line in lines: + line = line.strip().split(self.delimiter) + label_set.add(np.int64(line[1])) + label_map = { + oldlabel: newlabel + for newlabel, oldlabel in enumerate(label_set) + } + if seed is not None: np.random.RandomState(seed).shuffle(lines) - for l in lines: - l = l.strip().split(self.delimiter) - self.images.append(os.path.join(self._img_root, l[0])) - self.labels.append(np.int64(l[1])) - assert os.path.exists(self.images[-1]) + for line in lines: + line = line.strip().split(self.delimiter) + self.images.append(os.path.join(self._img_root, line[0])) + if self.relabel: + self.labels.append(label_map[np.int64(line[1])]) + else: + self.labels.append(np.int64(line[1])) + assert os.path.exists(self.images[ + -1]), f"path {self.images[-1]} does not exist." diff --git a/ppcls/data/dataloader/pk_sampler.py b/ppcls/data/dataloader/pk_sampler.py index 69d1a7c83001e0ea326b30082093fee2f83d3b8a..a4081b5c31f3fe37ae18bd9793cc030e479a77ab 100644 --- a/ppcls/data/dataloader/pk_sampler.py +++ b/ppcls/data/dataloader/pk_sampler.py @@ -32,17 +32,23 @@ class PKSampler(DistributedBatchSampler): batch_size (int): batch size sample_per_id (int): number of instance(s) within an class shuffle (bool, optional): _description_. Defaults to True. + id_list(list): list of (start_id, end_id, start_id, end_id) for set of ids to duplicated. + ratio(list): list of (ratio1, ratio2..) the duplication number for ids in id_list. drop_last (bool, optional): whether to discard the data at the end. Defaults to True. sample_method (str, optional): sample method when generating prob_list. Defaults to "sample_avg_prob". """ + def __init__(self, dataset, batch_size, sample_per_id, shuffle=True, drop_last=True, + id_list=None, + ratio=None, sample_method="sample_avg_prob"): - super().__init__(dataset, batch_size, shuffle=shuffle, drop_last=drop_last) + super().__init__( + dataset, batch_size, shuffle=shuffle, drop_last=drop_last) assert batch_size % sample_per_id == 0, \ f"PKSampler configs error, sample_per_id({sample_per_id}) must be a divisor of batch_size({batch_size})." 
assert hasattr(self.dataset, @@ -67,6 +73,16 @@ class PKSampler(DistributedBatchSampler): logger.error( "PKSampler only support id_avg_prob and sample_avg_prob sample method, " "but receive {}.".format(self.sample_method)) + + if id_list and ratio: + assert len(id_list) % 2 == 0 and len(id_list) == len(ratio) * 2 + for i in range(len(self.prob_list)): + for j in range(len(ratio)): + if i >= id_list[j * 2] and i <= id_list[j * 2 + 1]: + self.prob_list[i] = self.prob_list[i] * ratio[j] + break + self.prob_list = self.prob_list / sum(self.prob_list) + diff = np.abs(sum(self.prob_list) - 1) if diff > 0.00000001: self.prob_list[-1] = 1 - sum(self.prob_list[:-1]) @@ -74,8 +90,8 @@ class PKSampler(DistributedBatchSampler): logger.error("PKSampler prob list error") else: logger.info( - "PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob".format(diff) - ) + "PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob". + format(diff)) def __iter__(self): label_per_batch = self.batch_size // self.sample_per_label diff --git a/ppcls/data/dataloader/vehicle_dataset.py b/ppcls/data/dataloader/vehicle_dataset.py index 2981a57a0516aa25145f39479a34635b3be063f8..e4fbcad6a48d31b02a6fac6063ccb10d4dccdb48 100644 --- a/ppcls/data/dataloader/vehicle_dataset.py +++ b/ppcls/data/dataloader/vehicle_dataset.py @@ -89,11 +89,7 @@ class CompCars(Dataset): class VeriWild(Dataset): - def __init__( - self, - image_root, - cls_label_path, - transform_ops=None, ): + def __init__(self, image_root, cls_label_path, transform_ops=None): self._img_root = image_root self._cls_path = cls_label_path if transform_ops: @@ -102,19 +98,23 @@ class VeriWild(Dataset): self._load_anno() def _load_anno(self): - assert os.path.exists(self._cls_path) - assert os.path.exists(self._img_root) + assert os.path.exists( + self._cls_path), f"path {self._cls_path} does not exist." + assert os.path.exists( + self._img_root), f"path {self._img_root} does not exist." self.images = [] self.labels = [] self.cameras = [] with open(self._cls_path) as fd: lines = fd.readlines() - for l in lines: - l = l.strip().split() - self.images.append(os.path.join(self._img_root, l[0])) - self.labels.append(np.int64(l[1])) - self.cameras.append(np.int64(l[2])) + for line in lines: + line = line.strip().split() + self.images.append(os.path.join(self._img_root, line[0])) + self.labels.append(np.int64(line[1])) + if len(line) >= 3: + self.cameras.append(np.int64(line[2])) assert os.path.exists(self.images[-1]) + self.has_camera = len(self.cameras) > 0 def __getitem__(self, idx): try: @@ -123,7 +123,10 @@ class VeriWild(Dataset): if self._transform_ops: img = transform(img, self._transform_ops) img = img.transpose((2, 0, 1)) - return (img, self.labels[idx], self.cameras[idx]) + if self.has_camera: + return (img, self.labels[idx], self.cameras[idx]) + else: + return (img, self.labels[idx]) except Exception as ex: logger.error("Exception occured when parse line: {} with msg: {}". 
format(self.images[idx], ex)) diff --git a/ppcls/data/postprocess/__init__.py b/ppcls/data/postprocess/__init__.py index 7e53832342564233a069444e97a0f25f560076ef..130ff22595c3df78261a8bb79292f5cf763feafd 100644 --- a/ppcls/data/postprocess/__init__.py +++ b/ppcls/data/postprocess/__init__.py @@ -21,6 +21,7 @@ from .threshoutput import ThreshOutput, MultiLabelThreshOutput from .attr_rec import VehicleAttribute, PersonAttribute + def build_postprocess(config): config = copy.deepcopy(config) model_name = config.pop("name") diff --git a/ppcls/data/postprocess/attr_rec.py b/ppcls/data/postprocess/attr_rec.py index a8d492501833ac4ccd83d3aea108e7e34c46cadf..2a3de779e32653d43035b3bd3b3549ba147546ca 100644 --- a/ppcls/data/postprocess/attr_rec.py +++ b/ppcls/data/postprocess/attr_rec.py @@ -71,7 +71,6 @@ class VehicleAttribute(object): return batch_res - class PersonAttribute(object): def __init__(self, threshold=0.5, @@ -171,3 +170,58 @@ class PersonAttribute(object): batch_res.append({"attributes": label_res, "output": pred_res}) return batch_res + +class TableAttribute(object): + def __init__( + self, + source_threshold=0.5, + number_threshold=0.5, + color_threshold=0.5, + clarity_threshold=0.5, + obstruction_threshold=0.5, + angle_threshold=0.5, ): + self.source_threshold = source_threshold + self.number_threshold = number_threshold + self.color_threshold = color_threshold + self.clarity_threshold = clarity_threshold + self.obstruction_threshold = obstruction_threshold + self.angle_threshold = angle_threshold + + def __call__(self, x, file_names=None): + if isinstance(x, dict): + x = x['logits'] + assert isinstance(x, paddle.Tensor) + if file_names is not None: + assert x.shape[0] == len(file_names) + x = F.sigmoid(x).numpy() + + # postprocess output of predictor + batch_res = [] + for idx, res in enumerate(x): + res = res.tolist() + label_res = [] + source = 'Scanned' if res[0] > self.source_threshold else 'Photo' + number = 'Little' if res[1] > self.number_threshold else 'Numerous' + color = 'Black-and-White' if res[ + 2] > self.color_threshold else 'Multicolor' + clarity = 'Clear' if res[3] > self.clarity_threshold else 'Blurry' + obstruction = 'Without-Obstacles' if res[ + 4] > self.number_threshold else 'With-Obstacles' + angle = 'Horizontal' if res[ + 5] > self.number_threshold else 'Tilted' + + label_res = [source, number, color, clarity, obstruction, angle] + + threshold_list = [ + self.source_threshold, self.number_threshold, + self.color_threshold, self.clarity_threshold, + self.obstruction_threshold, self.angle_threshold + ] + pred_res = (np.array(res) > np.array(threshold_list) + ).astype(np.int8).tolist() + batch_res.append({ + "attributes": label_res, + "output": pred_res, + "file_name": file_names[idx] + }) + return batch_res diff --git a/ppcls/data/preprocess/__init__.py b/ppcls/data/preprocess/__init__.py index d0cfcf2409d2d890adcf03ef0e03b2475625ead8..8f9ea02834d2851116d5b40fd4fc91d59aac3152 100644 --- a/ppcls/data/preprocess/__init__.py +++ b/ppcls/data/preprocess/__init__.py @@ -38,9 +38,11 @@ from ppcls.data.preprocess.ops.operators import CropWithPadding from ppcls.data.preprocess.ops.operators import RandomInterpolationAugment from ppcls.data.preprocess.ops.operators import ColorJitter from ppcls.data.preprocess.ops.operators import RandomCropImage +from ppcls.data.preprocess.ops.operators import RandomRotation from ppcls.data.preprocess.ops.operators import Padv2 from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, 
FmixOperator +from ppcls.data.preprocess.batch_ops.batch_operators import MixupCutmixHybrid import numpy as np from PIL import Image diff --git a/ppcls/data/preprocess/batch_ops/batch_operators.py b/ppcls/data/preprocess/batch_ops/batch_operators.py index 6f0abb864574f8bead1d1ad6461460b4ececc7a7..c9563e2294d43df8183e2cd32894c96da1f48f8e 100644 --- a/ppcls/data/preprocess/batch_ops/batch_operators.py +++ b/ppcls/data/preprocess/batch_ops/batch_operators.py @@ -23,6 +23,9 @@ import numpy as np from ppcls.utils import logger from ppcls.data.preprocess.ops.fmix import sample_mask +import paddle +import paddle.nn.functional as F + class BatchOperator(object): """ BatchOperator """ @@ -229,3 +232,270 @@ class OpSampler(object): list(self.ops.keys()), weights=list(self.ops.values()), k=1)[0] # return batch directly when None Op return op(batch) if op else batch + + +class MixupCutmixHybrid(object): + """ Mixup/Cutmix that applies different params to each element or whole batch + + Args: + mixup_alpha (float): mixup alpha value, mixup is active if > 0. + cutmix_alpha (float): cutmix alpha value, cutmix is active if > 0. + cutmix_minmax (List[float]): cutmix min/max image ratio, cutmix is active and uses this vs alpha if not None. + prob (float): probability of applying mixup or cutmix per batch or element + switch_prob (float): probability of switching to cutmix instead of mixup when both are active + mode (str): how to apply mixup/cutmix params (per 'batch', 'pair' (pair of elements), 'elem' (element) + correct_lam (bool): apply lambda correction when cutmix bbox clipped by image borders + label_smoothing (float): apply label smoothing to the mixed target tensor + num_classes (int): number of classes for target + """ + + def __init__(self, + mixup_alpha=1., + cutmix_alpha=0., + cutmix_minmax=None, + prob=1.0, + switch_prob=0.5, + mode='batch', + correct_lam=True, + label_smoothing=0.1, + num_classes=4): + self.mixup_alpha = mixup_alpha + self.cutmix_alpha = cutmix_alpha + self.cutmix_minmax = cutmix_minmax + if self.cutmix_minmax is not None: + assert len(self.cutmix_minmax) == 2 + # force cutmix alpha == 1.0 when minmax active to keep logic simple & safe + self.cutmix_alpha = 1.0 + self.mix_prob = prob + self.switch_prob = switch_prob + self.label_smoothing = label_smoothing + self.num_classes = num_classes + self.mode = mode + self.correct_lam = correct_lam # correct lambda based on clipped area for cutmix + self.mixup_enabled = True # set to false to disable mixing (intended tp be set by train loop) + + def _one_hot(self, x, num_classes, on_value=1., off_value=0.): + x = paddle.cast(x, dtype='int64') + on_value = paddle.full([x.shape[0], num_classes], on_value) + off_value = paddle.full([x.shape[0], num_classes], off_value) + return paddle.where( + F.one_hot(x, num_classes) == 1, on_value, off_value) + + def _mixup_target(self, target, num_classes, lam=1., smoothing=0.0): + off_value = smoothing / num_classes + on_value = 1. - smoothing + off_value + y1 = self._one_hot( + target, + num_classes, + on_value=on_value, + off_value=off_value, ) + y2 = self._one_hot( + target.flip(0), + num_classes, + on_value=on_value, + off_value=off_value) + return y1 * lam + y2 * (1. - lam) + + def _rand_bbox(self, img_shape, lam, margin=0., count=None): + """ Standard CutMix bounding-box + Generates a random square bbox based on lambda value. This impl includes + support for enforcing a border margin as percent of bbox dimensions. 
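        For a target mix ratio lam, the box is roughly int(img_h * sqrt(1 - lam)) by
        int(img_w * sqrt(1 - lam)), so the replaced area approximates (1 - lam) of the
        image before any clipping at the borders.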
+ + Args: + img_shape (tuple): Image shape as tuple + lam (float): Cutmix lambda value + margin (float): Percentage of bbox dimension to enforce as margin (reduce amount of box outside image) + count (int): Number of bbox to generate + """ + ratio = np.sqrt(1 - lam) + img_h, img_w = img_shape[-2:] + cut_h, cut_w = int(img_h * ratio), int(img_w * ratio) + margin_y, margin_x = int(margin * cut_h), int(margin * cut_w) + cy = np.random.randint(0 + margin_y, img_h - margin_y, size=count) + cx = np.random.randint(0 + margin_x, img_w - margin_x, size=count) + yl = np.clip(cy - cut_h // 2, 0, img_h) + yh = np.clip(cy + cut_h // 2, 0, img_h) + xl = np.clip(cx - cut_w // 2, 0, img_w) + xh = np.clip(cx + cut_w // 2, 0, img_w) + return yl, yh, xl, xh + + def _rand_bbox_minmax(self, img_shape, minmax, count=None): + """ Min-Max CutMix bounding-box + Inspired by Darknet cutmix impl, generates a random rectangular bbox + based on min/max percent values applied to each dimension of the input image. + + Typical defaults for minmax are usually in the .2-.3 for min and .8-.9 range for max. + + Args: + img_shape (tuple): Image shape as tuple + minmax (tuple or list): Min and max bbox ratios (as percent of image size) + count (int): Number of bbox to generate + """ + assert len(minmax) == 2 + img_h, img_w = img_shape[-2:] + cut_h = np.random.randint( + int(img_h * minmax[0]), int(img_h * minmax[1]), size=count) + cut_w = np.random.randint( + int(img_w * minmax[0]), int(img_w * minmax[1]), size=count) + yl = np.random.randint(0, img_h - cut_h, size=count) + xl = np.random.randint(0, img_w - cut_w, size=count) + yu = yl + cut_h + xu = xl + cut_w + return yl, yu, xl, xu + + def _cutmix_bbox_and_lam(self, + img_shape, + lam, + ratio_minmax=None, + correct_lam=True, + count=None): + """ Generate bbox and apply lambda correction. + """ + if ratio_minmax is not None: + yl, yu, xl, xu = self._rand_bbox_minmax( + img_shape, ratio_minmax, count=count) + else: + yl, yu, xl, xu = self._rand_bbox(img_shape, lam, count=count) + if correct_lam or ratio_minmax is not None: + bbox_area = (yu - yl) * (xu - xl) + lam = 1. - bbox_area / float(img_shape[-2] * img_shape[-1]) + return (yl, yu, xl, xu), lam + + def _params_per_elem(self, batch_size): + lam = np.ones(batch_size, dtype=np.float32) + use_cutmix = np.zeros(batch_size, dtype=np.bool) + if self.mixup_enabled: + if self.mixup_alpha > 0. and self.cutmix_alpha > 0.: + use_cutmix = np.random.rand(batch_size) < self.switch_prob + lam_mix = np.where( + use_cutmix, + np.random.beta( + self.cutmix_alpha, self.cutmix_alpha, size=batch_size), + np.random.beta( + self.mixup_alpha, self.mixup_alpha, size=batch_size)) + elif self.mixup_alpha > 0.: + lam_mix = np.random.beta( + self.mixup_alpha, self.mixup_alpha, size=batch_size) + elif self.cutmix_alpha > 0.: + use_cutmix = np.ones(batch_size, dtype=np.bool) + lam_mix = np.random.beta( + self.cutmix_alpha, self.cutmix_alpha, size=batch_size) + else: + assert False, "One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true." + lam = np.where( + np.random.rand(batch_size) < self.mix_prob, + lam_mix.astype(np.float32), lam) + return lam, use_cutmix + + def _params_per_batch(self): + lam = 1. + use_cutmix = False + if self.mixup_enabled and np.random.rand() < self.mix_prob: + if self.mixup_alpha > 0. 
and self.cutmix_alpha > 0.: + use_cutmix = np.random.rand() < self.switch_prob + lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha) if use_cutmix else \ + np.random.beta(self.mixup_alpha, self.mixup_alpha) + elif self.mixup_alpha > 0.: + lam_mix = np.random.beta(self.mixup_alpha, self.mixup_alpha) + elif self.cutmix_alpha > 0.: + use_cutmix = True + lam_mix = np.random.beta(self.cutmix_alpha, self.cutmix_alpha) + else: + assert False, "One of mixup_alpha > 0., cutmix_alpha > 0., cutmix_minmax not None should be true." + lam = float(lam_mix) + return lam, use_cutmix + + def _mix_elem(self, x): + batch_size = len(x) + lam_batch, use_cutmix = self._params_per_elem(batch_size) + x_orig = x.clone( + ) # need to keep an unmodified original for mixing source + for i in range(batch_size): + j = batch_size - i - 1 + lam = lam_batch[i] + if lam != 1.: + if use_cutmix[i]: + (yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam( + x[i].shape, + lam, + ratio_minmax=self.cutmix_minmax, + correct_lam=self.correct_lam) + if yl < yh and xl < xh: + x[i][:, yl:yh, xl:xh] = x_orig[j][:, yl:yh, xl:xh] + lam_batch[i] = lam + else: + x[i] = x[i] * lam + x_orig[j] * (1 - lam) + return paddle.to_tensor(lam_batch, dtype=x.dtype).unsqueeze(1) + + def _mix_pair(self, x): + batch_size = len(x) + lam_batch, use_cutmix = self._params_per_elem(batch_size // 2) + x_orig = x.clone( + ) # need to keep an unmodified original for mixing source + for i in range(batch_size // 2): + j = batch_size - i - 1 + lam = lam_batch[i] + if lam != 1.: + if use_cutmix[i]: + (yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam( + x[i].shape, + lam, + ratio_minmax=self.cutmix_minmax, + correct_lam=self.correct_lam) + if yl < yh and xl < xh: + x[i][:, yl:yh, xl:xh] = x_orig[j][:, yl:yh, xl:xh] + x[j][:, yl:yh, xl:xh] = x_orig[i][:, yl:yh, xl:xh] + lam_batch[i] = lam + else: + x[i] = x[i] * lam + x_orig[j] * (1 - lam) + x[j] = x[j] * lam + x_orig[i] * (1 - lam) + lam_batch = np.concatenate((lam_batch, lam_batch[::-1])) + return paddle.to_tensor(lam_batch, dtype=x.dtype).unsqueeze(1) + + def _mix_batch(self, x): + lam, use_cutmix = self._params_per_batch() + if lam == 1.: + return 1. + if use_cutmix: + (yl, yh, xl, xh), lam = self._cutmix_bbox_and_lam( + x.shape, + lam, + ratio_minmax=self.cutmix_minmax, + correct_lam=self.correct_lam) + if yl < yh and xl < xh: + x[:, :, yl:yh, xl:xh] = x.flip(0)[:, :, yl:yh, xl:xh] + + else: + x_flipped = x.flip(0) * (1. 
- lam) + x[:] = x * lam + x_flipped + return lam + + def _unpack(self, batch): + """ _unpack """ + assert isinstance(batch, list), \ + 'batch should be a list filled with tuples (img, label)' + bs = len(batch) + assert bs > 0, 'size of the batch data should > 0' + #imgs, labels = list(zip(*batch)) + imgs = [] + labels = [] + for item in batch: + imgs.append(item[0]) + labels.append(item[1]) + return np.array(imgs), np.array(labels), bs + + def __call__(self, batch): + x, target, bs = self._unpack(batch) + x = paddle.to_tensor(x) + target = paddle.to_tensor(target) + assert len(x) % 2 == 0, 'Batch size should be even when using this' + if self.mode == 'elem': + lam = self._mix_elem(x) + elif self.mode == 'pair': + lam = self._mix_pair(x) + else: + lam = self._mix_batch(x) + target = self._mixup_target(target, self.num_classes, lam, + self.label_smoothing) + + return list(zip(x.numpy(), target.numpy())) diff --git a/ppcls/data/preprocess/ops/operators.py b/ppcls/data/preprocess/ops/operators.py index e617b8a71afffeb9e18e4be412f5a3374bd387ec..c70b9cb723dce77755189aad16f3839046673f35 100644 --- a/ppcls/data/preprocess/ops/operators.py +++ b/ppcls/data/preprocess/ops/operators.py @@ -26,6 +26,7 @@ import cv2 import numpy as np from PIL import Image, ImageOps, __version__ as PILLOW_VERSION from paddle.vision.transforms import ColorJitter as RawColorJitter +from paddle.vision.transforms import RandomRotation as RawRandomRotation from paddle.vision.transforms import ToTensor, Normalize, RandomHorizontalFlip, RandomResizedCrop from paddle.vision.transforms import functional as F from .autoaugment import ImageNetPolicy @@ -181,7 +182,8 @@ class DecodeImage(object): img = np.asarray(img)[:, :, ::-1] # BRG if self.to_rgb: - assert img.shape[2] == 3, f"invalid shape of image[{img.shape}]" + assert img.shape[ + 2] == 3, f"invalid shape of image[{img.shape}]" img = img[:, :, ::-1] if self.channel_first: @@ -495,7 +497,13 @@ class RandFlipImage(object): if isinstance(img, np.ndarray): return cv2.flip(img, self.flip_code) else: - return img.transpose(Image.FLIP_LEFT_RIGHT) + if self.flip_code == 1: + return img.transpose(Image.FLIP_LEFT_RIGHT) + elif self.flip_code == 0: + return img.transpose(Image.FLIP_TOP_BOTTOM) + else: + return img.transpose(Image.FLIP_LEFT_RIGHT).transpose( + Image.FLIP_LEFT_RIGHT) else: return img @@ -653,17 +661,38 @@ class ColorJitter(RawColorJitter): return img +class RandomRotation(RawRandomRotation): + """RandomRotation. + """ + + def __init__(self, prob=0.5, *args, **kwargs): + super().__init__(*args, **kwargs) + self.prob = prob + + def __call__(self, img): + if np.random.random() < self.prob: + img = super()._apply_image(img) + return img + + class Pad(object): """ Pads the given PIL.Image on all sides with specified padding mode and fill value. 
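    Supports both a "pil" backend (ImageOps.expand) and a "cv2" backend
    (cv2.copyMakeBorder with a constant border), selected via the ``backend`` argument.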
adapted from: https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#Pad """ - def __init__(self, padding: int, fill: int=0, - padding_mode: str="constant"): + def __init__(self, + padding: int, + fill: int=0, + padding_mode: str="constant", + backend: str="pil"): self.padding = padding self.fill = fill self.padding_mode = padding_mode + self.backend = backend + assert backend in [ + "pil", "cv2" + ], f"backend must in ['pil', 'cv2'], but got {backend}" def _parse_fill(self, fill, img, min_pil_version, name="fillcolor"): # Process fill color for affine transforms @@ -698,11 +727,21 @@ class Pad(object): return {name: fill} def __call__(self, img): - opts = self._parse_fill(self.fill, img, "2.3.0", name="fill") - if img.mode == "P": - palette = img.getpalette() - img = ImageOps.expand(img, border=self.padding, **opts) - img.putpalette(palette) + if self.backend == "pil": + opts = self._parse_fill(self.fill, img, "2.3.0", name="fill") + if img.mode == "P": + palette = img.getpalette() + img = ImageOps.expand(img, border=self.padding, **opts) + img.putpalette(palette) + return img + return ImageOps.expand(img, border=self.padding, **opts) + else: + img = cv2.copyMakeBorder( + img, + self.padding, + self.padding, + self.padding, + self.padding, + cv2.BORDER_CONSTANT, + value=(self.fill, self.fill, self.fill)) return img - - return ImageOps.expand(img, border=self.padding, **opts) diff --git a/ppcls/engine/engine.py b/ppcls/engine/engine.py index 50683bac357d96f2080886425dcae7ae0001ce33..e6977110afe35418d77467f2d3e383532977221c 100644 --- a/ppcls/engine/engine.py +++ b/ppcls/engine/engine.py @@ -114,6 +114,7 @@ class Engine(object): #TODO(gaotingquan): support rec class_num = config["Arch"].get("class_num", None) self.config["DataLoader"].update({"class_num": class_num}) + # build dataloader if self.mode == 'train': self.train_dataloader = build_dataloader( diff --git a/ppcls/engine/evaluation/retrieval.py b/ppcls/engine/evaluation/retrieval.py index 02cae1670bbe1255a84fcf80c3097c5c020c917f..ef4bbd24c9d4db6440b3ca0d6e8b948bef4b53f7 100644 --- a/ppcls/engine/evaluation/retrieval.py +++ b/ppcls/engine/evaluation/retrieval.py @@ -25,32 +25,35 @@ from ppcls.utils import logger def retrieval_eval(engine, epoch_id=0): engine.model.eval() - # step1. build gallery + # step1. build query & gallery if engine.gallery_query_dataloader is not None: gallery_feas, gallery_img_id, gallery_unique_id = cal_feature( engine, name='gallery_query') - query_feas, query_img_id, query_query_id = gallery_feas, gallery_img_id, gallery_unique_id + query_feas, query_img_id, query_unique_id = gallery_feas, gallery_img_id, gallery_unique_id else: gallery_feas, gallery_img_id, gallery_unique_id = cal_feature( engine, name='gallery') - query_feas, query_img_id, query_query_id = cal_feature( + query_feas, query_img_id, query_unique_id = cal_feature( engine, name='query') - # step2. do evaluation + # step2. 
split data into blocks so as to save memory sim_block_size = engine.config["Global"].get("sim_block_size", 64) sections = [sim_block_size] * (len(query_feas) // sim_block_size) if len(query_feas) % sim_block_size: sections.append(len(query_feas) % sim_block_size) + fea_blocks = paddle.split(query_feas, num_or_sections=sections) - if query_query_id is not None: - query_id_blocks = paddle.split( - query_query_id, num_or_sections=sections) - image_id_blocks = paddle.split(query_img_id, num_or_sections=sections) + if query_unique_id is not None: + query_unique_id_blocks = paddle.split( + query_unique_id, num_or_sections=sections) + query_img_id_blocks = paddle.split(query_img_id, num_or_sections=sections) metric_key = None + # step3. do evaluation if engine.eval_loss_func is None: metric_dict = {metric_key: 0.} else: + # do evaluation with re-ranking(k-reciprocal) reranking_flag = engine.config['Global'].get('re_ranking', False) logger.info(f"re_ranking={reranking_flag}") metric_dict = dict() @@ -70,9 +73,9 @@ def retrieval_eval(engine, epoch_id=0): query_feas, gallery_feas, k1=20, k2=6, lambda_value=0.3) # compute keep mask - query_id_mask = (query_query_id != gallery_unique_id.t()) + unique_id_mask = (query_unique_id != gallery_unique_id.t()) image_id_mask = (query_img_id != gallery_img_id.t()) - keep_mask = paddle.logical_or(query_id_mask, image_id_mask) + keep_mask = paddle.logical_or(image_id_mask, unique_id_mask) # set inf(1e9) distance to those exist in gallery distmat = distmat * keep_mask.astype("float32") @@ -85,24 +88,27 @@ def retrieval_eval(engine, epoch_id=0): for key in metric_tmp: metric_dict[key] = metric_tmp[key] else: + # do evaluation without re-ranking for block_idx, block_fea in enumerate(fea_blocks): similarity_matrix = paddle.matmul( block_fea, gallery_feas, transpose_y=True) # [n,m] - if query_query_id is not None: - query_id_block = query_id_blocks[block_idx] - query_id_mask = (query_id_block != gallery_unique_id.t()) + if query_unique_id is not None: + query_unique_id_block = query_unique_id_blocks[block_idx] + unique_id_mask = ( + query_unique_id_block != gallery_unique_id.t()) - image_id_block = image_id_blocks[block_idx] - image_id_mask = (image_id_block != gallery_img_id.t()) + query_img_id_block = query_img_id_blocks[block_idx] + image_id_mask = (query_img_id_block != gallery_img_id.t()) - keep_mask = paddle.logical_or(query_id_mask, image_id_mask) + keep_mask = paddle.logical_or(image_id_mask, + unique_id_mask) similarity_matrix = similarity_matrix * keep_mask.astype( "float32") else: keep_mask = None metric_tmp = engine.eval_metric_func( - similarity_matrix, image_id_blocks[block_idx], + similarity_matrix, query_img_id_blocks[block_idx], gallery_img_id, keep_mask) for key in metric_tmp: diff --git a/ppcls/loss/__init__.py b/ppcls/loss/__init__.py index 489aea7fb1ee4b4a8ff9388f3984e64965a1eac7..019eff71949972d1f72e02948fcce65ff96298ac 100644 --- a/ppcls/loss/__init__.py +++ b/ppcls/loss/__init__.py @@ -12,10 +12,12 @@ from .msmloss import MSMLoss from .npairsloss import NpairsLoss from .trihardloss import TriHardLoss from .triplet import TripletLoss, TripletLossV2 +from .tripletangularmarginloss import TripletAngularMarginLoss from .supconloss import SupConLoss from .pairwisecosface import PairwiseCosface from .dmlloss import DMLLoss from .distanceloss import DistanceLoss +from .softtargetceloss import SoftTargetCrossEntropy from .distillationloss import DistillationCELoss from .distillationloss import DistillationGTCELoss @@ -24,6 +26,7 @@ from 
.distillationloss import DistillationDistanceLoss from .distillationloss import DistillationRKDLoss from .distillationloss import DistillationKLDivLoss from .distillationloss import DistillationDKDLoss +from .distillationloss import DistillationWSLLoss from .distillationloss import DistillationMultiLabelLoss from .distillationloss import DistillationDISTLoss from .distillationloss import DistillationPairLoss diff --git a/ppcls/loss/distillationloss.py b/ppcls/loss/distillationloss.py index 5a924afe72fd7c6627b9f3b3be8ce3553932b535..af24f83015928984ccb30ae864adf73928657e5e 100644 --- a/ppcls/loss/distillationloss.py +++ b/ppcls/loss/distillationloss.py @@ -22,6 +22,7 @@ from .distanceloss import DistanceLoss from .rkdloss import RKdAngle, RkdDistance from .kldivloss import KLDivLoss from .dkdloss import DKDLoss +from .wslloss import WSLLoss from .dist_loss import DISTLoss from .multilabelloss import MultiLabelLoss from .mgd_loss import MGDLoss @@ -262,6 +263,34 @@ class DistillationDKDLoss(DKDLoss): return loss_dict +class DistillationWSLLoss(WSLLoss): + """ + DistillationWSLLoss + """ + + def __init__(self, + model_name_pairs=[], + key=None, + temperature=2.0, + name="wsl_loss"): + super().__init__(temperature) + self.model_name_pairs = model_name_pairs + self.key = key + self.name = name + + def forward(self, predicts, batch): + loss_dict = dict() + for idx, pair in enumerate(self.model_name_pairs): + out1 = predicts[pair[0]] + out2 = predicts[pair[1]] + if self.key is not None: + out1 = out1[self.key] + out2 = out2[self.key] + loss = super().forward(out1, out2, batch) + loss_dict[f"{self.name}_{pair[0]}_{pair[1]}"] = loss + return loss_dict + + class DistillationMultiLabelLoss(MultiLabelLoss): """ DistillationMultiLabelLoss diff --git a/ppcls/loss/softtargetceloss.py b/ppcls/loss/softtargetceloss.py new file mode 100644 index 0000000000000000000000000000000000000000..351db50e3434d333814a9ea49f1b0463b80432a7 --- /dev/null +++ b/ppcls/loss/softtargetceloss.py @@ -0,0 +1,16 @@ +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +class SoftTargetCrossEntropy(nn.Layer): + def __init__(self): + super().__init__() + + def forward(self, x, target): + loss = paddle.sum(-target * F.log_softmax(x, axis=-1), axis=-1) + loss = loss.mean() + return {"SoftTargetCELoss": loss} + + def __str__(self, ): + return type(self).__name__ diff --git a/ppcls/loss/tripletangularmarginloss.py b/ppcls/loss/tripletangularmarginloss.py new file mode 100644 index 0000000000000000000000000000000000000000..3a91d2d499fa22aadc7ca15322f4048b978fb19d --- /dev/null +++ b/ppcls/loss/tripletangularmarginloss.py @@ -0,0 +1,115 @@ +# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
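# A minimal usage sketch (an illustration only, assuming a dict holding the
# embeddings under the "features" key and integer labels, as configured in
# GeneralRecognitionV2_PPLCNetV2_base.yaml):
#
#   loss_fn = TripletAngularMarginLoss(margin=0.5, normalize_feature=True,
#                                      add_absolute=True, absolute_loss_weight=0.1,
#                                      ap_value=0.8, an_value=0.4)
#   out = loss_fn({"features": paddle.rand([8, 512])},
#                 paddle.to_tensor([0, 0, 1, 1, 2, 2, 3, 3]))
#   # out["TripletAngularMarginLoss"] holds the scalar loss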
+ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import paddle +import paddle.nn as nn + + +class TripletAngularMarginLoss(nn.Layer): + """A more robust triplet loss with hard positive/negative mining on angular margin instead of relative distance between d(a,p) and d(a,n). + + Args: + margin (float, optional): angular margin. Defaults to 0.5. + normalize_feature (bool, optional): whether to apply L2-norm in feature before computing distance(cos-similarity). Defaults to True. + reduction (str, optional): reducing option within an batch . Defaults to "mean". + add_absolute (bool, optional): whether add absolute loss within d(a,p) or d(a,n). Defaults to False. + absolute_loss_weight (float, optional): weight for absolute loss. Defaults to 1.0. + ap_value (float, optional): weight for d(a, p). Defaults to 0.9. + an_value (float, optional): weight for d(a, n). Defaults to 0.5. + feature_from (str, optional): which key feature from. Defaults to "features". + """ + + def __init__(self, + margin=0.5, + normalize_feature=True, + reduction="mean", + add_absolute=False, + absolute_loss_weight=1.0, + ap_value=0.9, + an_value=0.5, + feature_from="features"): + super(TripletAngularMarginLoss, self).__init__() + self.margin = margin + self.feature_from = feature_from + self.ranking_loss = paddle.nn.loss.MarginRankingLoss( + margin=margin, reduction=reduction) + self.normalize_feature = normalize_feature + self.add_absolute = add_absolute + self.ap_value = ap_value + self.an_value = an_value + self.absolute_loss_weight = absolute_loss_weight + + def forward(self, input, target): + """ + Args: + inputs: feature matrix with shape (batch_size, feat_dim) + target: ground truth labels with shape (num_classes) + """ + inputs = input[self.feature_from] + + if self.normalize_feature: + inputs = paddle.divide( + inputs, paddle.norm( + inputs, p=2, axis=-1, keepdim=True)) + + bs = inputs.shape[0] + + # compute distance(cos-similarity) + dist = paddle.matmul(inputs, inputs.t()) + + # hard negative mining + is_pos = paddle.expand(target, ( + bs, bs)).equal(paddle.expand(target, (bs, bs)).t()) + is_neg = paddle.expand(target, ( + bs, bs)).not_equal(paddle.expand(target, (bs, bs)).t()) + + # `dist_ap` means distance(anchor, positive) + # both `dist_ap` and `relative_p_inds` with shape [N, 1] + dist_ap = paddle.min(paddle.reshape( + paddle.masked_select(dist, is_pos), (bs, -1)), + axis=1, + keepdim=True) + # `dist_an` means distance(anchor, negative) + # both `dist_an` and `relative_n_inds` with shape [N, 1] + dist_an = paddle.max(paddle.reshape( + paddle.masked_select(dist, is_neg), (bs, -1)), + axis=1, + keepdim=True) + # shape [N] + dist_ap = paddle.squeeze(dist_ap, axis=1) + dist_an = paddle.squeeze(dist_an, axis=1) + + # Compute ranking hinge loss + y = paddle.ones_like(dist_an) + loss = self.ranking_loss(dist_ap, dist_an, y) + + if self.add_absolute: + absolut_loss_ap = self.ap_value - dist_ap + absolut_loss_ap = paddle.where(absolut_loss_ap > 0, + absolut_loss_ap, + paddle.zeros_like(absolut_loss_ap)) + + absolut_loss_an = dist_an - self.an_value + absolut_loss_an = paddle.where(absolut_loss_an > 0, + absolut_loss_an, + paddle.ones_like(absolut_loss_an)) + + loss = (absolut_loss_an.mean() + absolut_loss_ap.mean() + ) * self.absolute_loss_weight + loss.mean() + + return {"TripletAngularMarginLoss": loss} diff --git a/ppcls/loss/wslloss.py b/ppcls/loss/wslloss.py new file mode 100644 index 
0000000000000000000000000000000000000000..8bdfaf8ccdacb475f209497bd96859b4c16dd760 --- /dev/null +++ b/ppcls/loss/wslloss.py @@ -0,0 +1,66 @@ +# copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + + +class WSLLoss(nn.Layer): + """ + Weighted Soft Labels Loss + paper: https://arxiv.org/pdf/2102.00650.pdf + code reference: https://github.com/bellymonster/Weighted-Soft-Label-Distillation + """ + + def __init__(self, temperature=2.0, use_target_as_gt=False): + super().__init__() + self.temperature = temperature + self.use_target_as_gt = use_target_as_gt + + def forward(self, logits_student, logits_teacher, target=None): + """Compute weighted soft labels loss. + Args: + logits_student: student's logits with shape (batch_size, num_classes) + logits_teacher: teacher's logits with shape (batch_size, num_classes) + target: ground truth labels with shape (batch_size) + """ + if target is None or self.use_target_as_gt: + target = logits_teacher.argmax(axis=-1) + + target = F.one_hot( + target.reshape([-1]), num_classes=logits_student[0].shape[0]) + + s_input_for_softmax = logits_student / self.temperature + t_input_for_softmax = logits_teacher / self.temperature + + ce_loss_s = -paddle.sum(target * + F.log_softmax(logits_student.detach()), + axis=1) + ce_loss_t = -paddle.sum(target * + F.log_softmax(logits_teacher.detach()), + axis=1) + + ratio = ce_loss_s / (ce_loss_t + 1e-7) + ratio = paddle.maximum(ratio, paddle.zeros_like(ratio)) + + kd_loss = -paddle.sum(F.softmax(t_input_for_softmax) * + F.log_softmax(s_input_for_softmax), + axis=1) + weight = 1 - paddle.exp(-ratio) + + weighted_kd_loss = (self.temperature**2) * paddle.mean(kd_loss * + weight) + + return weighted_kd_loss diff --git a/ppcls/metric/metrics.py b/ppcls/metric/metrics.py index 0c803ccfdbb29216381625ea3df4a4540c7b56c0..b6dc934f31c04b0df2a90e63fed48973dddff1ca 100644 --- a/ppcls/metric/metrics.py +++ b/ppcls/metric/metrics.py @@ -12,6 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
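The new WSLLoss can also be called on its own; the DistillationWSLLoss wrapper earlier in this patch only selects the student/teacher outputs from `predicts` and forwards them here. A standalone sketch (illustrative, not part of the patch):

```python
# Sketch only: direct use of the WSLLoss added in ppcls/loss/wslloss.py.
import paddle

from ppcls.loss.wslloss import WSLLoss

paddle.seed(0)
student_logits = paddle.randn([4, 10])   # [batch_size, num_classes]
teacher_logits = paddle.randn([4, 10])
labels = paddle.randint(0, 10, [4])      # ground-truth class ids, shape [batch_size]

criterion = WSLLoss(temperature=2.0)
loss = criterion(student_logits, teacher_logits, labels)  # scalar weighted KD loss
print(float(loss))

# If no labels are passed (or use_target_as_gt=True), the teacher's argmax is used
# as the ground truth, as handled at the top of WSLLoss.forward.
loss_no_gt = criterion(student_logits, teacher_logits)
```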
+from cmath import nan import numpy as np import paddle import paddle.nn as nn @@ -97,6 +98,11 @@ class mAP(nn.Layer): num_rel = paddle.greater_than(num_rel, paddle.to_tensor(0.)) num_rel_index = paddle.nonzero(num_rel.astype("int")) num_rel_index = paddle.reshape(num_rel_index, [num_rel_index.shape[0]]) + + if paddle.numel(num_rel_index).item() == 0: + metric_dict["mAP"] = np.nan + return metric_dict + equal_flag = paddle.index_select(equal_flag, num_rel_index, axis=0) acc_sum = paddle.cumsum(equal_flag, axis=1) diff --git a/ppcls/optimizer/optimizer.py b/ppcls/optimizer/optimizer.py index c0403cf95cdaf442b6fdaeea54d21a2382e3858b..c446fc1dec9e01fe85dfac68dee829082655826a 100644 --- a/ppcls/optimizer/optimizer.py +++ b/ppcls/optimizer/optimizer.py @@ -272,3 +272,145 @@ class AdamW(object): def _apply_decay_param_fun(self, name): return name not in self.no_weight_decay_param_name_list + + +class AdamWDL(object): + """ + The AdamWDL optimizer is implemented based on the AdamW Optimization with dynamic lr setting. + Generally it's used for transformer model. + """ + + def __init__(self, + learning_rate=0.001, + beta1=0.9, + beta2=0.999, + epsilon=1e-8, + weight_decay=None, + multi_precision=False, + grad_clip=None, + layerwise_decay=None, + filter_bias_and_bn=True, + **args): + self.learning_rate = learning_rate + self.beta1 = beta1 + self.beta2 = beta2 + self.epsilon = epsilon + self.grad_clip = grad_clip + self.weight_decay = weight_decay + self.multi_precision = multi_precision + self.layerwise_decay = layerwise_decay + self.filter_bias_and_bn = filter_bias_and_bn + + class AdamWDLImpl(optim.AdamW): + def __init__(self, + learning_rate=0.001, + beta1=0.9, + beta2=0.999, + epsilon=1e-8, + parameters=None, + weight_decay=0.01, + apply_decay_param_fun=None, + grad_clip=None, + lazy_mode=False, + multi_precision=False, + layerwise_decay=1.0, + n_layers=12, + name_dict=None, + name=None): + if not isinstance(layerwise_decay, float) and \ + not isinstance(layerwise_decay, fluid.framework.Variable): + raise TypeError("coeff should be float or Tensor.") + self.layerwise_decay = layerwise_decay + self.name_dict = name_dict + self.n_layers = n_layers + self.set_param_lr_fun = self._layerwise_lr_decay + super().__init__( + learning_rate=learning_rate, + parameters=parameters, + beta1=beta1, + beta2=beta2, + epsilon=epsilon, + grad_clip=grad_clip, + name=name, + apply_decay_param_fun=apply_decay_param_fun, + weight_decay=weight_decay, + lazy_mode=lazy_mode, + multi_precision=multi_precision) + + def _append_optimize_op(self, block, param_and_grad): + if self.set_param_lr_fun is None: + return super(AdamLW, self)._append_optimize_op(block, + param_and_grad) + + self._append_decoupled_weight_decay(block, param_and_grad) + prev_lr = param_and_grad[0].optimize_attr["learning_rate"] + self.set_param_lr_fun(self.layerwise_decay, self.name_dict, + self.n_layers, param_and_grad[0]) + # excute Adam op + res = super(optim.AdamW, self)._append_optimize_op(block, + param_and_grad) + param_and_grad[0].optimize_attr["learning_rate"] = prev_lr + return res + + # Layerwise decay + def _layerwise_lr_decay(self, decay_rate, name_dict, n_layers, param): + """ + Args: + decay_rate (float): + The layer-wise decay ratio. + name_dict (dict): + The keys of name_dict is dynamic name of model while the value + of name_dict is static name. + Use model.named_parameters() to get name_dict. + n_layers (int): + Total number of layers in the transformer encoder. 
+ """ + ratio = 1.0 + static_name = name_dict[param.name] + if "blocks" in static_name: + idx = static_name.find("blocks.") + layer = int(static_name[idx:].split(".")[1]) + ratio = decay_rate**(n_layers - layer) + elif "embed" in static_name: + ratio = decay_rate**(n_layers + 1) + param.optimize_attr["learning_rate"] *= ratio + + def __call__(self, model_list): + model = model_list[0] + if self.weight_decay and self.filter_bias_and_bn: + skip = {} + if hasattr(model, 'no_weight_decay'): + skip = model.no_weight_decay() + decay_dict = { + param.name: not (len(param.shape) == 1 or + name.endswith(".bias") or name in skip) + for name, param in model.named_parameters() + if not 'teacher' in name + } + parameters = [ + param for param in model.parameters() + if 'teacher' not in param.name + ] + weight_decay = 0. + else: + parameters = model.parameters() + + opt_args = dict( + learning_rate=self.learning_rate, weight_decay=self.weight_decay) + opt_args['parameters'] = parameters + if decay_dict is not None: + opt_args['apply_decay_param_fun'] = lambda n: decay_dict[n] + opt_args['epsilon'] = self.epsilon + opt_args['beta1'] = self.beta1 + opt_args['beta2'] = self.beta2 + if self.layerwise_decay and self.layerwise_decay < 1.0: + opt_args['layerwise_decay'] = self.layerwise_decay + name_dict = dict() + for n, p in model.named_parameters(): + name_dict[p.name] = n + opt_args['name_dict'] = name_dict + opt_args['n_layers'] = model.get_num_layers() + + optimizer = self.AdamWDLImpl(**opt_args) + + return optimizer diff --git a/test_tipc/README.md b/test_tipc/README.md index 5a3426bcbbb98f7f4f891cd0b72119939f4769a2..e110f475ade01c64c72d6bbbdd1df816e0cb5d13 100644 --- a/test_tipc/README.md +++ b/test_tipc/README.md @@ -110,7 +110,6 @@ bash test_tipc/test_train_inference_python.sh ./test_tipc/configs/MobileNetV3/Mo - [test_train_pact_inference_python 使用](docs/test_train_pact_inference_python.md):测试基于Python的模型PACT在线量化等基本功能。 - [test_train_ptq_inference_python 使用](docs/test_train_ptq_inference_python.md):测试基于Python的模型KL离线量化等基本功能。 - [test_inference_cpp 使用](docs/test_inference_cpp.md) :测试基于C++的模型推理。 -- [test_serving 使用](docs/test_serving.md) :测试基于Paddle Serving的服务化部署功能。 - [test_lite_arm_cpu_cpp 使用](docs/test_lite_arm_cpu_cpp.md): 测试基于Paddle-Lite的ARM CPU端c++预测部署功能. - [test_paddle2onnx 使用](docs/test_paddle2onnx.md):测试Paddle2ONNX的模型转化功能,并验证正确性。 - [test_serving_infer_python 使用](docs/test_serving_infer_python.md):测试python serving功能。 diff --git a/test_tipc/benchmark_train.sh b/test_tipc/benchmark_train.sh index 1ecae126f5b3298c74d4379c2eb78faf5e3798f7..ad952424c13b7bf9738304d759a6c049693e2e61 100644 --- a/test_tipc/benchmark_train.sh +++ b/test_tipc/benchmark_train.sh @@ -179,6 +179,11 @@ for batch_size in ${batch_size_list[*]}; do func_sed_params "$FILENAME" "${line_epoch}" "$epoch" gpu_id=$(set_gpu_id $device_num) + # It is needed that using dali, NHWC and 4 channels when training ResNet50 with AMPO2 + if [[ $model_name == "ResNet50" && $precision == "fp16" ]]; then + sed -i "s/ResNet50.yaml/ResNet50_amp_O2_ultra.yaml/g" $FILENAME + fi + # if bs is big, then copy train_list.txt to generate more train log # At least 25 log number would be good to calculate ips for benchmark system. 
# So the copy number for train_list is as follows: diff --git a/test_tipc/common_func.sh b/test_tipc/common_func.sh index 4aa3db6ca1d8091c99bf9a50a946417d94b7a791..0b8a660d5434937ac87ef08b076da61fa868e0a0 100644 --- a/test_tipc/common_func.sh +++ b/test_tipc/common_func.sh @@ -77,9 +77,10 @@ function status_check(){ run_command=$2 run_log=$3 model_name=$4 + log_path=$5 if [ $last_status -eq 0 ]; then - echo -e "\033[33m Run successfully with command - ${model_name} - ${run_command}! \033[0m" | tee -a ${run_log} + echo -e "\033[33m Run successfully with command - ${model_name} - ${run_command} - ${log_path} ! \033[0m" | tee -a ${run_log} else - echo -e "\033[33m Run failed with command - ${model_name} - ${run_command}! \033[0m" | tee -a ${run_log} + echo -e "\033[33m Run failed with command - ${model_name} - ${run_command} - ${log_path} ! \033[0m" | tee -a ${run_log} fi } diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_custom_sampler.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_custom_sampler.txt index c1dbc89617fba27a3abfc80d9eca461f4aee212c..f3c78bbd2b5321260969acc9ea1a79fa6f39b1cf 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_custom_sampler.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_custom_sampler.txt @@ -13,14 +13,14 @@ train_infer_img_dir:./dataset/ILSVRC2012/val null:null ## trainer:norm_train -norm_train:tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -o DataLoader.Train.sampler.name="PKSampler" -o DataLoader.Train.sampler.sample_per_id=2 +norm_train:tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml -o DataLoader.Train.sampler.name="PKSampler" -o DataLoader.Train.sampler.sample_per_id=2 pact_train:null fpgm_train:null distill_train:null null:null null:null ## -===========================eval_params=========================== +===========================eval_params=========================== eval:null null:null ## diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_infer_python.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_infer_python.txt index ff3bd088ddf22e20ca2b40aee1175c07961017f3..bdefd9476489bc61b870beb93a056dc2416d5213 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_infer_python.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_infer_python.txt @@ -20,7 +20,7 @@ distill_train:null null:null null:null ## -===========================eval_params=========================== +===========================eval_params=========================== eval:tools/eval.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml null:null ## @@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r infer_model:../inference/ infer_export:True infer_quant:Fasle -inference:python/predict_rec.py -c configs/inference_rec.yaml +inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" -o Global.use_gpu:True|False -o Global.enable_mkldnn:False -o Global.cpu_num_threads:1 diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt 
b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt index fdcc052e9bd53af443095586ab858af0049bd75b..2c7788ac6e31424061309e2e45116ea27fedf9a1 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r infer_model:../inference/ infer_export:True infer_quant:Fasle -inference:python/predict_rec.py -c configs/inference_rec.yaml +inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" -o Global.use_gpu:True|False -o Global.enable_mkldnn:False -o Global.cpu_num_threads:1 diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt index c433838faa65875fdb48679ae7fca47466a2f5c9..b2510e269d8af8018dac9745190f66d3a80615e7 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r infer_model:../inference/ infer_export:True infer_quant:Fasle -inference:python/predict_rec.py -c configs/inference_rec.yaml +inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" -o Global.use_gpu:True|False -o Global.enable_mkldnn:False -o Global.cpu_num_threads:6 diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_multicard_eval.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_multicard_eval.txt index 165cfa9fb60d9d578ccfe7a9bd0fba5548bed3be..a4ef01b886183e9c18f54d90ab01ad528c908116 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_multicard_eval.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_multicard_eval.txt @@ -20,7 +20,7 @@ distill_train:null null:null null:null ## -===========================eval_params=========================== +===========================eval_params=========================== eval:null null:null ## diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_no_eval.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_no_eval.txt index 1e167518433c4cfe42a931f4314361406941a87f..ba8ef39caaf7140aac059ae2b8a1aa9c962c301f 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_no_eval.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_no_eval.txt @@ -20,7 +20,7 @@ distill_train:null null:null null:null ## -===========================eval_params=========================== +===========================eval_params=========================== eval:null null:null ## diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_pact_infer_python.txt 
b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_pact_infer_python.txt index f557b258f42d7a793825e5a237de960a8c2e1fd8..5e299948b7876381371668f50ed1cbc9b47a3a93 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_pact_infer_python.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_pact_infer_python.txt @@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r infer_model:../inference/ infer_export:True infer_quant:Fasle -inference:python/predict_rec.py -c configs/inference_rec.yaml +inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" -o Global.use_gpu:True|False -o Global.enable_mkldnn:False -o Global.cpu_num_threads:1 diff --git a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_ptq_infer_python.txt b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_ptq_infer_python.txt index d5863d8c06a2647b2c52bee77fac02368023f52c..925ee91757acb4f48458933fa2b9163dc9c4557a 100644 --- a/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_ptq_infer_python.txt +++ b/test_tipc/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5_train_ptq_infer_python.txt @@ -20,7 +20,7 @@ distill_train:null null:null null:null ## -===========================eval_params=========================== +===========================eval_params=========================== eval:tools/eval.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml null:null ## @@ -37,7 +37,7 @@ pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/r infer_model:./general_PPLCNet_x2_5_lite_v1.0_infer/ infer_export:True infer_quant:Fasle -inference:python/predict_rec.py -c configs/inference_rec.yaml +inference:python/predict_rec.py -c configs/inference_rec.yaml -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" -o Global.use_gpu:True|False -o Global.enable_mkldnn:False -o Global.cpu_num_threads:1 diff --git a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f22f55a52f869fe7811c7722da5bc63c8111120f --- /dev/null +++ b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:GeneralRecognitionV2_PPLCNetV2_base +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/general_PPLCNetV2_base_pretrained_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/general_PPLCNetV2_base_pretrained_v1.0_infer/inference.onnx +--opset_version:10 +--enable_onnx_checker:True +inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +inference:./python/predict_rec.py +Global.use_onnx:True +Global.rec_inference_model_dir:./models/general_PPLCNetV2_base_pretrained_v1.0_infer +Global.use_gpu:False +-c:configs/inference_rec.yaml \ No newline at end of file diff --git 
a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_infer_python.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..81b59edbf4e18ac29579c4b45464d1f8a85a2024 --- /dev/null +++ b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_infer_python.txt @@ -0,0 +1,60 @@ +===========================train_params=========================== +model_name:GeneralRecognitionV2_PPLCNetV2_base +python:python3.7 +gpu_list:0|0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:norm_train +norm_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Global.eval_during_train=False -o Global.save_interval=2 -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8 +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:Fasle +inference:python/predict_rec.py -c configs/inference_rec.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:False +-o Global.cpu_num_threads:1 +-o Global.batch_size:1 +-o Global.use_tensorrt:False +-o Global.use_fp16:False +-o Global.rec_inference_model_dir:../inference +-o Global.infer_imgs:../dataset/Aliproduct/demo_test/ +-o Global.save_log_path:null +-o Global.benchmark:False +null:null +null:null +===========================train_benchmark_params========================== +batch_size:256 +fp_items:fp32|fp16 +epoch:1 +--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile +flags:FLAGS_eager_delete_tensor_gb=0.0;FLAGS_fraction_of_gpu_memory_to_use=0.98;FLAGS_conv_workspace_size_limit=4096 +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] \ No newline at end of file diff --git a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..f1f32751d6ee296c341c0e8382e5fdae2b9335bb --- /dev/null +++ 
b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_fleet_normal_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,54 @@ +===========================train_params=========================== +model_name:GeneralRecognition_PPLCNet_x2_5 +python:python3.7 +gpu_list:192.168.0.1,192.168.0.2;0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:norm_train +norm_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Global.eval_during_train=False -o Global.save_interval=2 -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8 +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:Fasle +inference:python/predict_rec.py -c configs/inference_rec.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:False +-o Global.cpu_num_threads:1 +-o Global.batch_size:1 +-o Global.use_tensorrt:False +-o Global.use_fp16:False +-o Global.rec_inference_model_dir:../inference +-o Global.infer_imgs:../dataset/Aliproduct/demo_test/ +-o Global.save_log_path:null +-o Global.benchmark:False +null:null +null:null +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] \ No newline at end of file diff --git a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..6e436de4063ac5e4917815b639ae5684fce2831b --- /dev/null +++ b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_linux_gpu_normal_amp_infer_python_linux_gpu_cpu.txt @@ -0,0 +1,52 @@ +===========================train_params=========================== +model_name:GeneralRecognitionV2_PPLCNetV2_base +python:python3.7 +gpu_list:0|0,1 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:amp_train +amp_train:tools/train.py -c 
ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o AMP.scale_loss=65536 -o AMP.use_dynamic_loss_scaling=True -o AMP.level=O2 -o Optimizer.multi_precision=True -o Global.eval_during_train=False -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o DataLoader.Train.loader.sampler.batch_size=8 +pact_train:null +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +quant_export:null +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:Fasle +inference:python/predict_rec.py -c configs/inference_rec.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:False +-o Global.cpu_num_threads:6 +-o Global.batch_size:1 +-o Global.use_tensorrt:False +-o Global.use_fp16:False +-o Global.rec_inference_model_dir:../inference +-o Global.infer_imgs:../dataset/Aliproduct/demo_test/ +-o Global.save_log_path:null +-o Global.benchmark:False +null:null +null:null diff --git a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_pact_infer_python.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_pact_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..68e4c4ac8ec00cb688fefa7d0a0d27bab9fedbe1 --- /dev/null +++ b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_pact_infer_python.txt @@ -0,0 +1,54 @@ +===========================train_params=========================== +model_name:GeneralRecognitionV2_PPLCNetV2_base +python:python3.7 +gpu_list:0 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:pact_train +norm_train:null +pact_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -o Slim.quant.name=pact -o Optimizer.lr.learning_rate=0.006 -o Global.pretrained_model="pretrained_model/general_PPLCNetV2_base_pretrained_v1.0" -o DataLoader.Train.dataset.cls_label_path=./dataset/train_reg_all_data.txt -o AMP=None -o DataLoader.Train.sampler.batch_size=8 +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Slim.quant.name=pact +null:null +## 
+===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:null +quant_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Slim.quant.name=pact +fpgm_export:null +distill_export:null +kl_quant:null +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams +infer_model:../inference/ +infer_export:True +infer_quant:Fasle +inference:python/predict_rec.py -c configs/inference_rec.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:False +-o Global.cpu_num_threads:1 +-o Global.batch_size:1 +-o Global.use_tensorrt:False +-o Global.use_fp16:False +-o Global.rec_inference_model_dir:../inference +-o Global.infer_imgs:../dataset/Aliproduct/demo_test/ +-o Global.save_log_path:null +-o Global.benchmark:True +null:null +null:null +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] \ No newline at end of file diff --git a/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_ptq_infer_python.txt b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_ptq_infer_python.txt new file mode 100644 index 0000000000000000000000000000000000000000..d82e871fb30aa70093b1f91aab3e0327fd16eedf --- /dev/null +++ b/test_tipc/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base_train_ptq_infer_python.txt @@ -0,0 +1,54 @@ +===========================train_params=========================== +model_name:GeneralRecognitionV2_PPLCNetV2_base +python:python3.7 +gpu_list:0 +-o Global.device:gpu +-o Global.auto_cast:null +-o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=100 +-o Global.output_dir:./output/ +-o DataLoader.Train.sampler.batch_size:8 +-o Global.pretrained_model:null +train_model_name:latest +train_infer_img_dir:./dataset/ILSVRC2012/val +null:null +## +trainer:pact_train +norm_train:null +pact_train:tools/train.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False +fpgm_train:null +distill_train:null +null:null +null:null +## +===========================eval_params=========================== +eval:tools/eval.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +null:null +## +===========================infer_params========================== +-o Global.save_inference_dir:./inference +-o Global.pretrained_model: +norm_export:null +quant_export:tools/export_model.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml +fpgm_export:null +distill_export:null +kl_quant:deploy/slim/quant_post_static.py -c ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml -o Global.save_inference_dir=./general_PPLCNetV2_base_pretrained_v1.0_infer +export2:null +pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +infer_model:./general_PPLCNetV2_base_pretrained_v1.0_infer +infer_export:True +infer_quant:Fasle +inference:python/predict_rec.py -c configs/inference_rec.yaml +-o Global.use_gpu:True|False +-o Global.enable_mkldnn:False +-o Global.cpu_num_threads:1 +-o 
Global.batch_size:1 +-o Global.use_tensorrt:False +-o Global.use_fp16:False +-o Global.rec_inference_model_dir:../inference +-o Global.infer_imgs:../dataset/Aliproduct/demo_test/ +-o Global.save_log_path:null +-o Global.benchmark:False +null:null +null:null +===========================infer_benchmark_params========================== +random_infer_input:[{float32,[3,224,224]}] \ No newline at end of file diff --git a/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..52a88a3183e1dde7e75177160e868c2617dc75cb --- /dev/null +++ b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_infer_cpp_linux_gpu_cpu.txt @@ -0,0 +1,19 @@ +===========================cpp_infer_params=========================== +model_name:PPShiTuV2 +cpp_infer_type:shitu +feature_inference_model_dir:./general_PPLCNetV2_base_pretrained_v1.0_infer/ +det_inference_model_dir:./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +cls_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +det_inference_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +infer_quant:False +inference_cmd:./deploy/cpp_shitu/build/pp_shitu -c inference_drink.yaml +use_gpu:True|False +enable_mkldnn:False +cpu_threads:1 +batch_size:1 +use_tensorrt:False +precision:fp32 +data_dir:./dataset/drink_dataset_v2.0 +benchmark:True +generate_yaml_cmd:python3.7 test_tipc/generate_cpp_yaml.py +transform_index_cmd:python3.7 deploy/cpp_shitu/tools/transform_id_map.py -c inference_drink.yaml diff --git a/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bc0c086b5c7c713392bbb09a34994a631035dc26 --- /dev/null +++ b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_cpp_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================serving_params=========================== +model_name:PPShiTuV2 +python:python3.7 +cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./models/general_PPLCNetV2_base_pretrained_v1.0_infer/ +--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./models/general_PPLCNetV2_base_pretrained_v1.0_serving/ +--serving_client:./models/general_PPLCNetV2_base_pretrained_v1.0_client/ +--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ +--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ +serving_dir:./paddleserving/recognition +web_service:null +--use_gpu:0|null +pipline:test_cpp_serving_client.py diff --git a/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt 
b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..1c08a4cf3b21a9edbb2847b6ca4589f50136b60f --- /dev/null +++ b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt @@ -0,0 +1,18 @@ +===========================serving_params=========================== +model_name:PPShiTuV2 +python:python3.7 +cls_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar +det_inference_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar +trans_model:-m paddle_serving_client.convert +--dirname:./models/general_PPLCNetV2_base_pretrained_v1.0_infer/ +--dirname:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--serving_server:./models/general_PPLCNetV2_base_pretrained_v1.0_serving/ +--serving_client:./models/general_PPLCNetV2_base_pretrained_v1.0_client/ +--serving_server:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ +--serving_client:./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/ +serving_dir:./paddleserving/recognition +web_service:recognition_web_service.py +--use_gpu:0|null +pipline:pipeline_http_client.py diff --git a/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt new file mode 100644 index 0000000000000000000000000000000000000000..bfd24bb4106245d7b279d0e7c07ffbc39f28fe83 --- /dev/null +++ b/test_tipc/configs/PP-ShiTuV2/PPShiTuV2_mainbody_det_linux_gpu_normal_normal_paddle2onnx_python_linux_cpu.txt @@ -0,0 +1,16 @@ +===========================paddle2onnx_params=========================== +model_name:PP-ShiTu_mainbody_det +python:python3.7 +2onnx: paddle2onnx +--model_dir:./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/ +--model_filename:inference.pdmodel +--params_filename:inference.pdiparams +--save_file:./deploy/models/picodet_lcnet_x2_5_640_mainbody_infer/inference.onnx +--opset_version:11 +--enable_onnx_checker:True +inference_model_url:https://paddledet.bj.bcebos.com/models/picodet_lcnet_x2_5_640_mainbody_infer.tar +inference:null +Global.use_onnx:null +Global.inference_model_dir:null +Global.use_gpu:null +-c:null \ No newline at end of file diff --git a/test_tipc/docs/test_inference_cpp.md b/test_tipc/docs/test_inference_cpp.md index 256e6a5f4d10d2e64e71778fe43f84c1784b3000..86a486ae93ca9f6664f9db708406e840609e2948 100644 --- a/test_tipc/docs/test_inference_cpp.md +++ b/test_tipc/docs/test_inference_cpp.md @@ -12,6 +12,7 @@ Linux GPU/CPU C++ 推理功能测试的主程序为`test_inference_cpp.sh`,可 | MobileNetV3 | MobileNetV3_large_x1_0_KL | 支持 | 支持 | | MobileNetV3 | MobileNetV3_large_x1_0_PACT | 支持 | 支持 | | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | +| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 | diff --git a/test_tipc/docs/test_serving_infer_cpp.md b/test_tipc/docs/test_serving_infer_cpp.md index 3ddd0c253b9d596697da8108e9cf563a21bf0cba..d1dc02ac375a46a888a66ed240304f241dee25cb 100644 --- 
a/test_tipc/docs/test_serving_infer_cpp.md +++ b/test_tipc/docs/test_serving_infer_cpp.md @@ -15,6 +15,7 @@ Linux GPU/CPU C++ 服务化部署测试的主程序为`test_serving_infer_cpp.sh | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | +| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 | | PPHGNet | PPHGNet_small_KL | 支持 | 支持 | | PPHGNet | PPHGNet_small_PACT | 支持 | 支持 | diff --git a/test_tipc/docs/test_serving_infer_python.md b/test_tipc/docs/test_serving_infer_python.md index 9bd9dc4c2d65f2d55a0335d012fdef37b7097f54..6314031c6963662a0df2fd81dca57fd86f1ae9b0 100644 --- a/test_tipc/docs/test_serving_infer_python.md +++ b/test_tipc/docs/test_serving_infer_python.md @@ -15,6 +15,7 @@ Linux GPU/CPU PYTHON 服务化部署测试的主程序为`test_serving_infer_pyt | PP-ShiTu | PPShiTu_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_KL | 支持 | 支持 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5_PACT | 支持 | 支持 | +| PP-ShiTuV2 | PPShiTuV2_general_rec、PPShiTu_mainbody_det | 支持 | 支持 | | PPHGNet | PPHGNet_small | 支持 | 支持 | | PPHGNet | PPHGNet_small_KL | 支持 | 支持 | | PPHGNet | PPHGNet_small_PACT | 支持 | 支持 | diff --git a/test_tipc/docs/test_train_amp_inference_python.md b/test_tipc/docs/test_train_amp_inference_python.md index a6a5897d3f1afeab333ad40d7b8b8f926b92c68e..b9fd6cb68981938120b2b2e610a517cc47592f61 100644 --- a/test_tipc/docs/test_train_amp_inference_python.md +++ b/test_tipc/docs/test_train_amp_inference_python.md @@ -10,6 +10,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference | :-------------: | :-------------------------------------: | :----------: | :----------: | | MobileNetV3 | MobileNetV3_large_x1_0 | 混合精度训练 | 混合精度训练 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 混合精度训练 | 混合精度训练 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 混合精度训练 | 混合精度训练 | | PPHGNet | PPHGNet_small | 混合精度训练 | 混合精度训练 | | PPHGNet | PPHGNet_tiny | 混合精度训练 | 混合精度训练 | | PPLCNet | PPLCNet_x0_25 | 混合精度训练 | 混合精度训练 | @@ -31,6 +32,7 @@ Linux GPU/CPU 混合精度训练推理测试的主程序为`test_train_inference | :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: | | MobileNetV3 | MobileNetV3_large_x1_0 | 支持 | 支持 | 1 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_small | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 | | PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 | diff --git a/test_tipc/docs/test_train_pact_inference_python.md b/test_tipc/docs/test_train_pact_inference_python.md index 6aeecad78a27172ea54aa2bbe318f68a5d0ee188..795c70c6447be933674b603130f93aed98c27e08 100644 --- a/test_tipc/docs/test_train_pact_inference_python.md +++ b/test_tipc/docs/test_train_pact_inference_python.md @@ -10,6 +10,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p | :-------------: | :-------------------------------------: | :----------: | | MobileNetV3 | MobileNetV3_large_x1_0 | PACT量化训练 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | PACT量化训练 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | PACT量化训练 | | PPHGNet | PPHGNet_small | PACT量化训练 | | PPHGNet | PPHGNet_tiny | PACT量化训练 | | PPLCNet | PPLCNet_x0_25 | PACT量化训练 | @@ -31,6 +32,7 @@ Linux GPU/CPU PACT量化训练推理测试的主程序为`test_train_inference_p | :-------------: | :-------------------------------------: | :--------: | :--------: | :-------: | | MobileNetV3 | 
MobileNetV3_large_x1_0 | 支持 | 支持 | 1 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | 支持 | 支持 | 1 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_small | 支持 | 支持 | 1 | | PPHGNet | PPHGNet_tiny | 支持 | 支持 | 1 | | PPLCNet | PPLCNet_x0_25 | 支持 | 支持 | 1 | diff --git a/test_tipc/docs/test_train_ptq_inference_python.md b/test_tipc/docs/test_train_ptq_inference_python.md index 29d5b9f59b31d96dd5fe2b325853fee911a87c97..18eddd127934fc63d82140c196913248572ae321 100644 --- a/test_tipc/docs/test_train_ptq_inference_python.md +++ b/test_tipc/docs/test_train_ptq_inference_python.md @@ -10,6 +10,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho | :-------------: | :-------------------------------------: | :----------: | | MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | KL离线量化 | | PPHGNet | PPHGNet_small | KL离线量化 | | PPHGNet | PPHGNet_tiny | KL离线量化 | | PPLCNet | PPLCNet_x0_25 | KL离线量化 | @@ -31,6 +32,7 @@ Linux GPU/CPU KL离线量化推理测试的主程序为`test_ptq_inference_pytho | :-------------: | :-------------------------------------: | :----------: | | MobileNetV3 | MobileNetV3_large_x1_0 | KL离线量化 | | PP-ShiTu | GeneralRecognition_PPLCNet_x2_5 | KL离线量化 | +| PP-ShiTuV2 | GeneralRecognitionV2_PPLCNetV2_base | KL离线量化 | | PPHGNet | PPHGNet_small | KL离线量化 | | PPHGNet | PPHGNet_tiny | KL离线量化 | | PPLCNet | PPLCNet_x0_25 | KL离线量化 | diff --git a/test_tipc/prepare.sh b/test_tipc/prepare.sh index 0ccb8a95878ab1f406586bc07b5bf8929601349f..81841b612d602b7d45768768ded4faa3ae6d781e 100644 --- a/test_tipc/prepare.sh +++ b/test_tipc/prepare.sh @@ -42,6 +42,10 @@ function func_get_url_file_name() { model_name=$(func_parser_value "${lines[1]}") +# install paddleclas whl +python_name=$(func_parser_value "${lines[2]}") +${python_name} setup.py install + if [[ ${MODE} = "cpp_infer" ]]; then if [ -d "./deploy/cpp/opencv-3.4.7/opencv3/" ] && [ $(md5sum ./deploy/cpp/opencv-3.4.7.tar.gz | awk -F ' ' '{print $1}') = "faa2b5950f8bee3f03118e600c74746a" ]; then echo "################### build opencv skipped ###################" @@ -139,6 +143,8 @@ if [[ ${MODE} = "cpp_infer" ]]; then cd dataset wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar tar -xf drink_dataset_v1.0.tar + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar + tar -xf drink_dataset_v2.0.tar else echo "Wrong cpp type in config file in line 3. 
only support cls, shitu" fi @@ -167,8 +173,9 @@ if [[ $model_name == *ShiTu* ]]; then ln -s demo_test.txt val_list.txt cd ../../ eval "wget -nc $model_url_value --no-check-certificate" - mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams - exit 0 + if [[ -d "./general_PPLCNet_x2_5_pretrained_v1.0.pdparams" ]]; then + mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams + fi fi if [[ $FILENAME == *use_dali* ]]; then @@ -240,12 +247,12 @@ elif [[ ${MODE} = "whole_infer" ]]; then cd ../../ fi # download inference or pretrained model - eval "wget -nc $model_url_value" + eval "wget -nc ${model_url_value}" if [[ ${model_url_value} =~ ".tar" ]]; then tar_name=$(func_get_url_file_name "${model_url_value}") - echo $tar_name - rm -rf {tar_name} - tar xf ${tar_name} + echo ${tar_name} + eval "tar -xf ${tar_name}" + rm -f ${tar_name} fi if [[ $model_name == "SwinTransformer_large_patch4_window7_224" || $model_name == "SwinTransformer_large_patch4_window12_384" ]]; then cmd="mv ${model_name}_22kto1k_pretrained.pdparams ${model_name}_pretrained.pdparams" @@ -275,7 +282,7 @@ fi if [[ ${MODE} = "serving_infer" ]]; then # prepare serving env python_name=$(func_parser_value "${lines[2]}") - if [[ ${model_name} = "PPShiTu" ]]; then + if [[ ${model_name} =~ "PPShiTu" ]]; then cls_inference_model_url=$(func_parser_value "${lines[3]}") cls_tar_name=$(func_get_url_file_name "${cls_inference_model_url}") det_inference_model_url=$(func_parser_value "${lines[4]}") @@ -283,6 +290,8 @@ if [[ ${MODE} = "serving_infer" ]]; then cd ./deploy wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar --no-check-certificate tar -xf drink_dataset_v1.0.tar + wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar --no-check-certificate + tar -xf drink_dataset_v2.0.tar mkdir models cd models wget -nc ${cls_inference_model_url} && tar xf ${cls_tar_name} @@ -314,8 +323,9 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then # prepare paddle2onnx env python_name=$(func_parser_value "${lines[2]}") inference_model_url=$(func_parser_value "${lines[10]}") - tar_name=${inference_model_url##*/} - + tar_name=$(func_get_url_file_name "$inference_model_url") + + ${python_name} -m pip install onnx ${python_name} -m pip install paddle2onnx ${python_name} -m pip install onnxruntime if [[ ${model_name} =~ "GeneralRecognition" ]]; then @@ -332,14 +342,12 @@ if [[ ${MODE} = "paddle2onnx_infer" ]]; then rm -rf val_list.txt ln -s demo_test.txt val_list.txt cd ../../ - eval "wget -nc $model_url_value --no-check-certificate" - mv general_PPLCNet_x2_5_pretrained_v1.0.pdparams GeneralRecognition_PPLCNet_x2_5_pretrained.pdparams fi cd deploy mkdir models cd models wget -nc ${inference_model_url} - tar xf ${tar_name} + eval "tar -xf ${tar_name}" cd ../../ fi diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_amp_fp16_DP.sh index fabbb9fe62d23aea28d9b09a76bf3ce9174c3fb3..59bb145171eb739bd8f07897d150fb955c02fa50 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_pure_fp16_DP.sh 
b/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_pure_fp16_DP.sh index 86242916f4fbf915f835bcc3f09178ed0fed5722..d4cc04fa451aef18dd856ea8cb8cd7ba7f67752c 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs128_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_amp_fp16_DP.sh index 878cd119e1b2b108d6d7329b4ee650c9f7e9c926..05b924d2eb95c6a1b67e731e59fd77694a23bec0 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_pure_fp16_DP.sh index 47e94fb4b8df66fcf14dcd9f0a9e644f47ed4f92..1eb86bbac5c76812c57b1d7d4f4f9131a9284938 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs256_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_amp_fp16_DP.sh index 2d6f17c87589963bbadfdad64d5a698b9258c03f..39d0aa9248260398f6423fd4336540d78c5b1639 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_pure_fp16_DP.sh index 52b8ccc5a3c18f45062d8c06b20fc8b523eec48d..285255a3deb2d8b5edaf83e4bc578d78d08ccba1 100644 --- a/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C1/ResNet50_bs64_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C1 max_epochs=1 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_amp_fp16_DP.sh index e12494ff5e3ddaed236835cc7d6d6cbae33b69af..b16379ca65295d46754df3d2167e7158a3b3f740 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_pure_fp16_DP.sh index 08503177f89e75365ba0af8c26d69452067b9146..1965cd48b47008c2e140c149956c0358d1af9350 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_pure_fp16_DP.sh +++ 
b/test_tipc/static/ResNet50/N1C8/ResNet50_bs128_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_amp_fp16_DP.sh index ecabf01177abc7781ad04690f6dfdb26aa1fdc3e..f3f95f6987641b67d087fa2ac8cdd6d92889b0ca 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_pure_fp16_DP.sh index 0296cb7b200b959795840359bc734653471cb788..3f02f44576b5cf18e1932342bdc50e8b74b4d7d1 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C8/ResNet50_bs256_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_amp_fp16_DP.sh index ba666114be5fd7053efafc0ab8a6bcdc43513457..b5b8e4c685d2e1032b1f9c5d9a505b896862f5cd 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_pure_fp16_DP.sh index a7cd97ea80d30165729d19fa9f0b0a89cdf7e093..50b6fd7f886beb55174b254ef0af0871842ece73 100644 --- a/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N1C8/ResNet50_bs64_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N1C8 max_epochs=8 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_amp_fp16_DP.sh index c96d5603d03ebacafa02c3fb49e0c8bb6b89f531..e5a1d7e5f8b4363c2d936723b1b14f4cd44d6058 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_pure_fp16_DP.sh index ef3876f1d3348c21df1de3526e162cc2ef34d8d1..7beb72e877a12e17f5c0eaacec15262cfa88ab3b 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs128_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git 
a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh index f9f2f76665cb030fbf11414d4f313345a539b12c..a0ebe2d0e42d79938b0aa36f3a0995e3d24f2278 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh index bef8186ea5e10feda6d62e1df01a41303b5c3469..054693b123df98993d0dac490e8832be9a1a74fc 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs256_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_amp_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_amp_fp16_DP.sh index 58f68d234d91bd98f4027d739070b045cd33a235..cd318ff71d3c58c3cb138baa595098cfbf3b9f2f 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_amp_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_amp_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=amp_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_pure_fp16_DP.sh b/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_pure_fp16_DP.sh index c764a79e574bfaa5356d180321537dcb6d1fb5bb..75ca5d05270b09e39363a2a372909a43106a3a4e 100644 --- a/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_pure_fp16_DP.sh +++ b/test_tipc/static/ResNet50/N4C32/ResNet50_bs64_pure_fp16_DP.sh @@ -4,7 +4,7 @@ fp_item=pure_fp16 run_mode=DP device_num=N4C32 max_epochs=32 -num_workers=8 +num_workers=4 # get data bash test_tipc/static/${model_item}/benchmark_common/prepare.sh diff --git a/test_tipc/test_inference_cpp.sh b/test_tipc/test_inference_cpp.sh index 24d406b8f06e6fa38385b2252d37d35a87ecffbb..c56bc69602819cbbe3e7eeced13841cec0156085 100644 --- a/test_tipc/test_inference_cpp.sh +++ b/test_tipc/test_inference_cpp.sh @@ -37,7 +37,8 @@ cpp_benchmark_value=$(func_parser_value "${lines[16]}") generate_yaml_cmd=$(func_parser_value "${lines[17]}") transform_index_cmd=$(func_parser_value "${lines[18]}") -LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_cpp.log" # generate_yaml_cmd="python3 test_tipc/generate_cpp_yaml.py" @@ -70,7 +71,7 @@ function func_shitu_cpp_inference(){ command="${_script} > ${_save_log_path} 2>&1" eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" done done done @@ -94,7 +95,7 @@ function func_shitu_cpp_inference(){ command="${_script} > ${_save_log_path} 2>&1" eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" done done done @@ 
-126,13 +127,12 @@ function func_cls_cpp_inference(){ precison="int8" fi _save_log_path="${_log_path}/cpp_infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log" - command="${generate_yaml_cmd} --type cls --batch_size ${batch_size} --mkldnn ${use_mkldnn} --gpu ${use_gpu} --cpu_thread ${threads} --tensorrt False --precision ${precision} --data_dir ${_img_dir} --benchmark True --cls_model_dir ${cpp_infer_model_dir} --gpu_id ${GPUID}" eval $command command1="${_script} > ${_save_log_path} 2>&1" eval ${command1} last_status=${PIPESTATUS[0]} - status_check $last_status "${command1}" "${status_log}" "${model_name}" + status_check $last_status "${command1}" "${status_log}" "${model_name}" "${_save_log_path}" done done done @@ -155,7 +155,7 @@ function func_cls_cpp_inference(){ command="${_script} > ${_save_log_path} 2>&1" eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" done done done diff --git a/test_tipc/test_inference_jeston.sh b/test_tipc/test_inference_jeston.sh index 56845003908c1a9cc8ac1b76e40ec108d33e8478..7fc8adf5b772264237bbd0070b7d98d1aee13027 100644 --- a/test_tipc/test_inference_jeston.sh +++ b/test_tipc/test_inference_jeston.sh @@ -42,7 +42,8 @@ infer_key1=$(func_parser_key "${lines[17]}") infer_value1=$(func_parser_value "${lines[17]}") -LOG_PATH="./test_tipc/output" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_python.log" @@ -71,7 +72,7 @@ if [ ${MODE} = "whole_infer" ]; then echo $export_cmd eval $export_cmd status_export=$? - status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" + status_check $status_export "${export_cmd}" "${status_log}" "${model_name}" "" else save_infer_dir=${infer_model} fi diff --git a/test_tipc/test_lite_arm_cpu_cpp.sh b/test_tipc/test_lite_arm_cpu_cpp.sh index 919226eea5ce38b82fad6c2130a7c6467b6ee041..07fcbe209de9d8eb88580130e4077e8c0fa6063d 100644 --- a/test_tipc/test_lite_arm_cpu_cpp.sh +++ b/test_tipc/test_lite_arm_cpu_cpp.sh @@ -1,6 +1,5 @@ #!/bin/bash source test_tipc/common_func.sh -current_path=$PWD IFS=$'\n' @@ -33,7 +32,8 @@ num_threads_list=$(func_parser_value_lite "${tipc_lines[5]}" ":") batch_size_list=$(func_parser_value_lite "${tipc_lines[6]}" ":") precision_list=$(func_parser_value_lite "${tipc_lines[7]}" ":") -LOG_PATH=${current_path}"/output" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/output" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results.log" @@ -65,9 +65,9 @@ function func_test_tipc(){ real_inference_cmd=$(echo ${inference_cmd} | awk -F " " '{print path $1" "path $2" "path $3}' path="$lite_arm_work_path") command1="adb push ${_basic_config} ${lite_arm_work_path}" eval ${command1} - command2="adb shell 'export LD_LIBRARY_PATH=${lite_arm_work_path}; ${real_inference_cmd}' > ${_save_log_path} 2>&1" + command2="adb shell 'export LD_LIBRARY_PATH=${lite_arm_work_path}; ${real_inference_cmd}' > ${_save_log_path} 2>&1" eval ${command2} - status_check $? "${command2}" "${status_log}" "${model_name}" + status_check $? 
"${command2}" "${status_log}" "${model_name}" "${_save_log_path}" done done done diff --git a/test_tipc/test_paddle2onnx.sh b/test_tipc/test_paddle2onnx.sh index d025fb2efd672baab42e4617a13dd127d90a73bc..d5687e1546a94bce1304575a7bcf50f718cfdada 100644 --- a/test_tipc/test_paddle2onnx.sh +++ b/test_tipc/test_paddle2onnx.sh @@ -2,7 +2,7 @@ source test_tipc/common_func.sh FILENAME=$1 -MODE=$2 +MODE="paddle2onnx_infer" # parser params dataline=$(awk 'NR==1, NR==16{print}' $FILENAME) @@ -36,7 +36,8 @@ inference_hardware_value=$(func_parser_value "${lines[14]}") inference_config_key=$(func_parser_key "${lines[15]}") inference_config_value=$(func_parser_value "${lines[15]}") -LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_paddle2onnx.log" @@ -46,27 +47,29 @@ function func_paddle2onnx(){ _script=$1 # paddle2onnx - _save_log_path=".${LOG_PATH}/paddle2onnx_infer_cpu.log" set_dirname=$(func_set_params "${infer_model_dir_key}" "${infer_model_dir_value}") set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_save_model=$(func_set_params "${save_file_key}" "${save_file_value}") set_opset_version=$(func_set_params "${opset_version_key}" "${opset_version_value}") set_enable_onnx_checker=$(func_set_params "${enable_onnx_checker_key}" "${enable_onnx_checker_value}") - trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker}" + trans_log="${LOG_PATH}/trans_model.log" + trans_model_cmd="${padlle2onnx_cmd} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_save_model} ${set_opset_version} ${set_enable_onnx_checker} --enable_dev_version=False > ${trans_log} 2>&1" eval $trans_model_cmd last_status=${PIPESTATUS[0]} - status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}" # python inference if [[ ${inference_py} != "null" ]]; then + _save_log_path="${LOG_PATH}/paddle2onnx_infer_cpu.log" set_model_dir=$(func_set_params "${inference_model_dir_key}" "${inference_model_dir_value}") set_use_onnx=$(func_set_params "${use_onnx_key}" "${use_onnx_value}") set_hardware=$(func_set_params "${inference_hardware_key}" "${inference_hardware_value}") set_inference_config=$(func_set_params "${inference_config_key}" "${inference_config_value}") + infer_model_cmd="cd deploy && ${python} ${inference_py} -o ${set_model_dir} -o ${set_use_onnx} -o ${set_hardware} ${set_inference_config} > ${_save_log_path} 2>&1 && cd ../" eval $infer_model_cmd - status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${infer_model_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" fi } @@ -75,4 +78,4 @@ echo "################### run test ###################" export Count=0 IFS="|" -func_paddle2onnx \ No newline at end of file +func_paddle2onnx diff --git a/test_tipc/test_ptq_inference_python.sh b/test_tipc/test_ptq_inference_python.sh index 82c9816478f9ea993b2e53f8a685766e8dbf81d7..e6801f640b07df5f4694ef943404974f4fbf07fc 100644 --- a/test_tipc/test_ptq_inference_python.sh +++ b/test_tipc/test_ptq_inference_python.sh @@ -94,7 +94,8 @@ if [[ $MODE = 'benchmark_train' ]]; then epoch_num=1 
fi -LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} status_log="${LOG_PATH}/results_python.log" @@ -123,7 +124,7 @@ function func_inference() { eval $command last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" done done done @@ -145,7 +146,7 @@ function func_inference() { eval $command last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" done done done @@ -168,6 +169,6 @@ if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ]; ln -s __params__ inference.pdiparams cd ../../deploy is_quant=True - func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "../${LOG_PATH}" "${infer_img_dir}" ${is_quant} + func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "${LOG_PATH}" "${infer_img_dir}" ${is_quant} cd .. fi diff --git a/test_tipc/test_serving_infer_cpp.sh b/test_tipc/test_serving_infer_cpp.sh index fdb7ef186bafd9dcd879150188e1f0450ca87211..01e5601cf10f2cf86c9396883f9baef011ca6ba0 100644 --- a/test_tipc/test_serving_infer_cpp.sh +++ b/test_tipc/test_serving_infer_cpp.sh @@ -38,10 +38,10 @@ pipeline_py=$(func_parser_value "${lines[13]}") function func_serving_cls(){ - LOG_PATH="test_tipc/output/${model_name}" + CLS_ROOT_PATH=$(pwd) + LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/serving_infer" mkdir -p ${LOG_PATH} - LOG_PATH="../../${LOG_PATH}" - status_log="${LOG_PATH}/results_serving.log" + status_log="${LOG_PATH}/results_cpp_serving.log" IFS='|' # pdserving @@ -53,8 +53,11 @@ function func_serving_cls(){ for python_ in ${python[*]}; do if [[ ${python_} =~ "python" ]]; then - trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + trans_log="${LOG_PATH}/cpp_trans_model.log" + trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_log} 2>&1" eval ${trans_model_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}" break fi done @@ -102,32 +105,34 @@ function func_serving_cls(){ for use_gpu in ${web_use_gpu_list[*]}; do if [[ ${use_gpu} = "null" ]]; then - web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 &" + server_log_path="${LOG_PATH}/cpp_server_cpu.log" + web_service_cpp_cmd="nohup ${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 > ${server_log_path} 2>&1 &" eval ${web_service_cpp_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 5s - _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_pipeline_batchsize_1.log" + _save_log_path="${LOG_PATH}/cpp_client_cpu.log" pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 " eval ${pipeline_cmd} 
last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" eval "${python_} -m paddle_serving_server.serve stop" sleep 5s else - web_service_cpp_cmd="${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 --gpu_id=${use_gpu} &" + server_log_path="${LOG_PATH}/cpp_server_gpu.log" + web_service_cpp_cmd="nohup ${python_} -m paddle_serving_server.serve --model ${serving_server_dir_name} --op GeneralClasOp --port 9292 --gpu_id=${use_gpu} > ${server_log_path} 2>&1 &" eval ${web_service_cpp_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 8s - _save_log_path="${LOG_PATH}/server_infer_cpp_gpu_pipeline_batchsize_1.log" + _save_log_path="${LOG_PATH}/cpp_client_gpu.log" pipeline_cmd="${python_} test_cpp_serving_client.py > ${_save_log_path} 2>&1 " eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" sleep 5s eval "${python_} -m paddle_serving_server.serve stop" fi @@ -136,10 +141,11 @@ function func_serving_cls(){ function func_serving_rec(){ - LOG_PATH="test_tipc/output/${model_name}" + CLS_ROOT_PATH=$(pwd) + LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/serving_infer" mkdir -p ${LOG_PATH} - LOG_PATH="../../../${LOG_PATH}" - status_log="${LOG_PATH}/results_serving.log" + status_log="${LOG_PATH}/results_cpp_serving.log" + trans_model_py=$(func_parser_value "${lines[5]}") cls_infer_model_dir_key=$(func_parser_key "${lines[6]}") cls_infer_model_dir_value=$(func_parser_value "${lines[6]}") @@ -181,20 +187,36 @@ function func_serving_rec(){ set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}") set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}") - cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + trans_cls_log="${LOG_PATH}/cpp_trans_model_cls.log" + cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_cls_log} 2>&1" eval ${cls_trans_model_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${cls_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_cls_log}" set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}") set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}") set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}") - det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} 
${set_serving_client}" + trans_det_log="${LOG_PATH}/cpp_trans_model_det.log" + det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_det_log} 2>&1" eval ${det_trans_model_cmd} - - cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ${cls_serving_server_value}" + last_status=${PIPESTATUS[0]} + status_check $last_status "${det_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_det_log}" + + OLD_IFS="${IFS}" + IFS='/' + tmp_arr=($cls_serving_server_value) + lastIndex=$((${#tmp_arr[@]}-1)) + cls_serving_server_dirname="${tmp_arr[lastIndex]}" + tmp_arr=($cls_serving_client_value) + lastIndex=$((${#tmp_arr[@]}-1)) + cls_serving_client_dirname="${tmp_arr[lastIndex]}" + IFS="${OLD_IFS}" + + cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/${cls_serving_server_dirname}/*.prototxt ${cls_serving_server_value}" eval ${cp_prototxt_cmd} - cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ${cls_serving_client_value}" + cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/${cls_serving_client_dirname}/*.prototxt ${cls_serving_client_value}" eval ${cp_prototxt_cmd} cp_prototxt_cmd="cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ${det_serving_client_value}" eval ${cp_prototxt_cmd} @@ -215,32 +237,34 @@ function func_serving_rec(){ for use_gpu in ${web_use_gpu_list[*]}; do if [ ${use_gpu} = "null" ]; then det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value") - web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 &" + server_log_path="${LOG_PATH}/cpp_server_cpu.log" + web_service_cpp_cmd="nohup ${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 > ${server_log_path} 2>&1 &" eval ${web_service_cpp_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 5s - _save_log_path="${LOG_PATH}/server_infer_cpp_cpu_batchsize_1.log" + _save_log_path="${LOG_PATH}/cpp_client_cpu.log" pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 " eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" eval "${python_} -m paddle_serving_server.serve stop" sleep 5s else det_serving_server_dir_name=$(func_get_url_file_name "$det_serving_server_value") + server_log_path="${LOG_PATH}/cpp_server_gpu.log" web_service_cpp_cmd="${python_interp} -m paddle_serving_server.serve --model ../../${det_serving_server_value} ../../${cls_serving_server_value} --op GeneralPicodetOp GeneralFeatureExtractOp --port 9400 --gpu_id=${use_gpu} &" eval ${web_service_cpp_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cpp_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cpp_cmd}" "${status_log}" 
"${model_name}" ${server_log_path} sleep 5s - _save_log_path="${LOG_PATH}/server_infer_cpp_gpu_batchsize_1.log" + _save_log_path="${LOG_PATH}/cpp_client_gpu.log" pipeline_cmd="${python_interp} ${pipeline_py} > ${_save_log_path} 2>&1 " eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check ${last_status} "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" eval "${python_} -m paddle_serving_server.serve stop" sleep 5s fi diff --git a/test_tipc/test_serving_infer_python.sh b/test_tipc/test_serving_infer_python.sh index 050c3c89c9a454eb7f973f405ded37a3f1df042a..2c5a15e0592c9668a004aef2bf828ddc7df80935 100644 --- a/test_tipc/test_serving_infer_python.sh +++ b/test_tipc/test_serving_infer_python.sh @@ -36,13 +36,16 @@ web_service_py=$(func_parser_value "${lines[11]}") web_use_gpu_key=$(func_parser_key "${lines[12]}") web_use_gpu_list=$(func_parser_value "${lines[12]}") pipeline_py=$(func_parser_value "${lines[13]}") +use_mkldnn="False" +threads="1" function func_serving_cls(){ - LOG_PATH="test_tipc/output/${model_name}/${MODE}" + CLS_ROOT_PATH=$(pwd) + LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} - LOG_PATH="../../${LOG_PATH}" status_log="${LOG_PATH}/results_serving.log" + IFS='|' # pdserving @@ -54,8 +57,11 @@ function func_serving_cls(){ for python_ in ${python[*]}; do if [[ ${python_} =~ "python" ]]; then - trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + trans_log="${LOG_PATH}/python_trans_model.log" + trans_model_cmd="${python_} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_log} 2>&1" eval ${trans_model_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${trans_model_cmd}" "${status_log}" "${model_name}" "${trans_log}" break fi done @@ -96,19 +102,19 @@ function func_serving_cls(){ devices_line=27 set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml" eval ${set_devices_cmd} - - web_service_cmd="${python_} ${web_service_py} &" + server_log_path="${LOG_PATH}/python_server_cpu.log" + web_service_cmd="nohup ${python_} ${web_service_py} > ${server_log_path} 2>&1 &" eval ${web_service_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 5s for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log" + _save_log_path="${LOG_PATH}/python_client_cpu_${pipeline%_client*}_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log" pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1 " eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" sleep 5s done eval "${python_} -m paddle_serving_server.serve stop" @@ -130,19 +136,19 @@ function func_serving_cls(){ devices_line=27 set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml" eval ${set_devices_cmd} - - web_service_cmd="${python_} ${web_service_py} & " + 
server_log_path="${LOG_PATH}/python_server_gpu_usetrt_${use_trt}_precision_${precision}.log" + web_service_cmd="nohup ${python_} ${web_service_py} > ${server_log_path} 2>&1 &" eval ${web_service_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 5s for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log" + _save_log_path="${LOG_PATH}/python_client_gpu_${pipeline%_client*}_usetrt_${use_trt}_precision_${precision}_batchsize_1.log" pipeline_cmd="${python_} ${pipeline} > ${_save_log_path} 2>&1" eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" sleep 5s done eval "${python_} -m paddle_serving_server.serve stop" @@ -154,10 +160,11 @@ function func_serving_cls(){ function func_serving_rec(){ - LOG_PATH="test_tipc/output/${model_name}/${MODE}" + CLS_ROOT_PATH=$(pwd) + LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} - LOG_PATH="../../../${LOG_PATH}" status_log="${LOG_PATH}/results_serving.log" + trans_model_py=$(func_parser_value "${lines[5]}") cls_infer_model_dir_key=$(func_parser_key "${lines[6]}") cls_infer_model_dir_value=$(func_parser_value "${lines[6]}") @@ -199,16 +206,22 @@ function func_serving_rec(){ set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_serving_server=$(func_set_params "${cls_serving_server_key}" "${cls_serving_server_value}") set_serving_client=$(func_set_params "${cls_serving_client_key}" "${cls_serving_client_value}") - cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + trans_cls_log="${LOG_PATH}/python_trans_model_cls.log" + cls_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_cls_log} 2>&1" eval ${cls_trans_model_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${cls_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_cls_log}" set_dirname=$(func_set_params "${det_infer_model_dir_key}" "${det_infer_model_dir_value}") set_model_filename=$(func_set_params "${model_filename_key}" "${model_filename_value}") set_params_filename=$(func_set_params "${params_filename_key}" "${params_filename_value}") set_serving_server=$(func_set_params "${det_serving_server_key}" "${det_serving_server_value}") set_serving_client=$(func_set_params "${det_serving_client_key}" "${det_serving_client_value}") - det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client}" + trans_det_log="${LOG_PATH}/python_trans_model_det.log" + det_trans_model_cmd="${python_interp} ${trans_model_py} ${set_dirname} ${set_model_filename} ${set_params_filename} ${set_serving_server} ${set_serving_client} > ${trans_det_log} 2>&1" eval ${det_trans_model_cmd} + last_status=${PIPESTATUS[0]} + status_check $last_status "${det_trans_model_cmd}" "${status_log}" "${model_name}" "${trans_det_log}" # modify the alias_name of fetch_var to "outputs" 
server_fetch_var_line_cmd="sed -i '/fetch_var/,/is_lod_tensor/s/alias_name: .*/alias_name: \"features\"/' $cls_serving_server_value/serving_server_conf.prototxt" @@ -239,19 +252,19 @@ function func_serving_rec(){ devices_line=27 set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"\"/' config.yml" eval ${set_devices_cmd} - - web_service_cmd="${python} ${web_service_py} &" + server_log_path="${LOG_PATH}/python_server_cpu.log" + web_service_cmd="nohup ${python} ${web_service_py} > ${server_log_path} 2>&1 &" eval ${web_service_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 5s for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_cpu_${pipeline%_client*}_batchsize_1.log" - pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1 " + _save_log_path="${LOG_PATH}/python_client_cpu_${pipeline%_client*}_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_1.log" + pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1" eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" sleep 5s done eval "${python_} -m paddle_serving_server.serve stop" @@ -273,19 +286,19 @@ function func_serving_rec(){ devices_line=27 set_devices_cmd="sed -i '${devices_line}s/devices: .*/devices: \"${use_gpu}\"/' config.yml" eval ${set_devices_cmd} - - web_service_cmd="${python} ${web_service_py} & " + server_log_path="${LOG_PATH}/python_server_gpu_usetrt_${use_trt}_precision_${precision}.log" + web_service_cmd="nohup ${python} ${web_service_py} > ${server_log_path} 2>&1 &" eval ${web_service_cmd} last_status=${PIPESTATUS[0]} - status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${web_service_cmd}" "${status_log}" "${model_name}" "${server_log_path}" sleep 10s for pipeline in ${pipeline_py[*]}; do - _save_log_path="${LOG_PATH}/server_infer_gpu_${pipeline%_client*}_batchsize_1.log" + _save_log_path="${LOG_PATH}/python_client_gpu_${pipeline%_client*}_usetrt_${use_trt}_precision_${precision}_batchsize_1.log" pipeline_cmd="${python} ${pipeline} > ${_save_log_path} 2>&1" eval ${pipeline_cmd} last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" + status_check $last_status "${pipeline_cmd}" "${status_log}" "${model_name}" "${_save_log_path}" sleep 10s done eval "${python_} -m paddle_serving_server.serve stop" @@ -311,7 +324,7 @@ echo "################### run test ###################" export Count=0 IFS="|" -if [[ ${model_name} = "PPShiTu" ]]; then +if [[ ${model_name} =~ "PPShiTu" ]]; then func_serving_rec else func_serving_cls diff --git a/test_tipc/test_train_inference_python.sh b/test_tipc/test_train_inference_python.sh index 9ec79bb29ce69b908fb7c003086e013c55de2517..88274731aabb60a55bbfd0d0103269fd523e2224 100644 --- a/test_tipc/test_train_inference_python.sh +++ b/test_tipc/test_train_inference_python.sh @@ -95,7 +95,8 @@ if [[ $MODE = 'benchmark_train' ]]; then epoch_num=1 fi -LOG_PATH="./test_tipc/output/${model_name}/${MODE}" +CLS_ROOT_PATH=$(pwd) +LOG_PATH="${CLS_ROOT_PATH}/test_tipc/output/${model_name}/${MODE}" mkdir -p ${LOG_PATH} 
status_log="${LOG_PATH}/results_python.log" @@ -107,24 +108,27 @@ function func_inference() { _log_path=$4 _img_dir=$5 _flag_quant=$6 + _gpu=$7 # inference for use_gpu in ${use_gpu_list[*]}; do if [ ${use_gpu} = "False" ] || [ ${use_gpu} = "cpu" ]; then for use_mkldnn in ${use_mkldnn_list[*]}; do for threads in ${cpu_threads_list[*]}; do for batch_size in ${batch_size_list[*]}; do - _save_log_path="${_log_path}/infer_cpu_usemkldnn_${use_mkldnn}_threads_${threads}_batchsize_${batch_size}.log" - set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}") - set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}") - set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") - set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}") - set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}") - set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}") - command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " - eval $command - last_status=${PIPESTATUS[0]} - eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" "${model_name}" + for precision in ${precision_list[*]}; do + _save_log_path="${_log_path}/python_infer_cpu_gpus_${_gpu}_usemkldnn_${use_mkldnn}_threads_${threads}_precision_${precision}_batchsize_${batch_size}.log" + set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}") + set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}") + set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") + set_cpu_threads=$(func_set_params "${cpu_threads_key}" "${threads}") + set_model_dir=$(func_set_params "${infer_model_key}" "${_model_dir}") + set_infer_params1=$(func_set_params "${infer_key1}" "${infer_value1}") + command="${_python} ${_script} ${use_gpu_key}=${use_gpu} ${use_mkldnn_key}=${use_mkldnn} ${set_cpu_threads} ${set_model_dir} ${set_batchsize} ${set_infer_data} ${set_benchmark} ${set_infer_params1} > ${_save_log_path} 2>&1 " + eval $command + last_status=${PIPESTATUS[0]} + eval "cat ${_save_log_path}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" + done done done done @@ -135,7 +139,7 @@ function func_inference() { continue fi for batch_size in ${batch_size_list[*]}; do - _save_log_path="${_log_path}/infer_gpu_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log" + _save_log_path="${_log_path}/python_infer_gpu_gpus_${_gpu}_usetrt_${use_trt}_precision_${precision}_batchsize_${batch_size}.log" set_infer_data=$(func_set_params "${image_dir_key}" "${_img_dir}") set_benchmark=$(func_set_params "${benchmark_key}" "${benchmark_value}") set_batchsize=$(func_set_params "${batch_size_key}" "${batch_size}") @@ -146,7 +150,7 @@ function func_inference() { eval $command last_status=${PIPESTATUS[0]} eval "cat ${_save_log_path}" - status_check $last_status "${command}" "../${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${_save_log_path}" done done done @@ -161,17 +165,19 @@ if [[ ${MODE} = "whole_infer" ]]; then # for kl_quant if [ ${kl_quant_cmd_value} != "null" ] && [ ${kl_quant_cmd_value} != "False" ]; then echo "kl_quant" - command="${python} ${kl_quant_cmd_value}" + log_path="${LOG_PATH}/export.log" + command="${python} ${kl_quant_cmd_value} > ${log_path} 2>&1" echo ${command} 
eval $command last_status=${PIPESTATUS[0]} - status_check $last_status "${command}" "${status_log}" "${model_name}" + status_check $last_status "${command}" "${status_log}" "${model_name}" "${log_path}" cd ${infer_model_dir_list}/quant_post_static_model - ln -s __model__ inference.pdmodel - ln -s __params__ inference.pdiparams + ln -s model.pdmodel inference.pdmodel + ln -s model.pdiparams inference.pdiparams cd ../../deploy is_quant=True - func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "../${LOG_PATH}" "${infer_img_dir}" ${is_quant} + gpu=0 + func_inference "${python}" "${inference_py}" "../${infer_model_dir_list}/quant_post_static_model" "${LOG_PATH}" "${infer_img_dir}" "${is_quant}" "${gpu}" cd .. fi else @@ -240,7 +246,7 @@ else if [ ${#ips} -le 15 ]; then # if length of ips >= 15, then it is seen as multi-machine # 15 is the min length of ips info for multi-machine: 0.0.0.0,0.0.0.0 - save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}" + save_log="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_1" nodes=1 else IFS="," @@ -257,18 +263,23 @@ else set_save_model=$(func_set_params "${save_model_key}" "${save_log}") if [ ${#gpu} -le 2 ]; then # train with cpu or single gpu - cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} " + cmd="${python} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1} " elif [ ${#ips} -le 15 ]; then # train with multi-gpu - cmd="${python} -m paddle.distributed.launch --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}" + cmd="${python} -m paddle.distributed.launch --devices=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_epoch} ${set_pretrain} ${set_autocast} ${set_batchsize} ${set_train_params1}" else # train with multi-machine - cmd="${python} -m paddle.distributed.launch --ips=${ips} --gpus=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}" + cmd="${python} -m paddle.distributed.launch --ips=${ips} --devices=${gpu} ${run_train} ${set_use_gpu} ${set_save_model} ${set_pretrain} ${set_epoch} ${set_autocast} ${set_batchsize} ${set_train_params1}" fi # run train eval "unset CUDA_VISIBLE_DEVICES" # export FLAGS_cudnn_deterministic=True sleep 5 eval $cmd - status_check $? "${cmd}" "${status_log}" "${model_name}" + if [[ $FILENAME == *GeneralRecognition* ]]; then + eval "cat ${save_log}/RecModel/train.log >> ${save_log}.log" + else + eval "cat ${save_log}/${model_name}/train.log >> ${save_log}.log" + fi + status_check $? "${cmd}" "${status_log}" "${model_name}" "${save_log}.log" sleep 5 if [[ $FILENAME == *GeneralRecognition* ]]; then @@ -283,9 +294,10 @@ else # run eval if [ ${eval_py} != "null" ]; then set_eval_params1=$(func_set_params "${eval_key1}" "${eval_value1}") - eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1}" + eval_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_eval.log" + eval_cmd="${python} ${eval_py} ${set_eval_pretrain} ${set_use_gpu} ${set_eval_params1} > ${eval_log_path} 2>&1" eval $eval_cmd - status_check $? "${eval_cmd}" "${status_log}" "${model_name}" + status_check $? 
"${eval_cmd}" "${status_log}" "${model_name}" "${eval_log_path}" sleep 5 fi # run export model @@ -298,15 +310,16 @@ else set_export_weight=$(func_set_params "${export_weight}" "${save_log}/${model_name}/${train_model_name}") fi set_save_infer_key=$(func_set_params "${save_infer_key}" "${save_infer_path}") - export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key}" + export_log_path="${LOG_PATH}/${trainer}_gpus_${gpu}_autocast_${autocast}_nodes_${nodes}_export.log" + export_cmd="${python} ${run_export} ${set_export_weight} ${set_save_infer_key} > ${export_log_path} 2>&1" eval $export_cmd - status_check $? "${export_cmd}" "${status_log}" "${model_name}" + status_check $? "${export_cmd}" "${status_log}" "${model_name}" "${export_log_path}" - #run inference + # run inference eval $env save_infer_path="${save_log}" cd deploy - func_inference "${python}" "${inference_py}" "../${save_infer_path}" "../${LOG_PATH}" "${infer_img_dir}" "${flag_quant}" + func_inference "${python}" "${inference_py}" "${save_infer_path}" "${LOG_PATH}" "${infer_img_dir}" "${flag_quant}" "${gpu}" cd .. fi eval "unset CUDA_VISIBLE_DEVICES" diff --git a/test_tipc/test_train_inference_python_npu.sh b/test_tipc/test_train_inference_python_npu.sh new file mode 100644 index 0000000000000000000000000000000000000000..e933eff5b3518e717de503932a901efdfcfd17c9 --- /dev/null +++ b/test_tipc/test_train_inference_python_npu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +source test_tipc/common_func.sh + +function readlinkf() { + perl -MCwd -e 'print Cwd::abs_path shift' "$1"; +} + +function func_parser_config() { + strs=$1 + IFS=" " + array=(${strs}) + tmp=${array[2]} + echo ${tmp} +} + +BASEDIR=$(dirname "$0") +REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../) + +FILENAME=$1 + +# change gpu to npu in tipc txt configs +sed -i "s/Global.device:gpu/Global.device:npu/g" $FILENAME +sed -i "s/Global.use_gpu/Global.use_npu/g" $FILENAME +dataline=`cat $FILENAME` + +# parser params +IFS=$'\n' +lines=(${dataline}) + +# replace inference config file +inference_py=$(func_parser_value "${lines[39]}") +inference_config=$(func_parser_config ${inference_py}) +sed -i 's/use_gpu: True/use_npu: True/g' "$REPO_ROOT_PATH/deploy/$inference_config" + +# replace training config file +grep -n 'tools/.*yaml' $FILENAME | cut -d ":" -f 1 \ +| while read line_num ; do + train_cmd=$(func_parser_value "${lines[line_num-1]}") + trainer_config=$(func_parser_config ${train_cmd}) + sed -i 's/device: gpu/device: npu/g' "$REPO_ROOT_PATH/$trainer_config" +done + +# change gpu to npu in execution script +sed -i "s/\"gpu\"/\"npu\"/g" test_tipc/test_train_inference_python.sh + +# pass parameters to test_train_inference_python.sh +cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2" +echo $cmd +eval $cmd diff --git a/test_tipc/test_train_inference_python_xpu.sh b/test_tipc/test_train_inference_python_xpu.sh new file mode 100644 index 0000000000000000000000000000000000000000..124d39935c9cf618e8952ac46cb4568614f5cedd --- /dev/null +++ b/test_tipc/test_train_inference_python_xpu.sh @@ -0,0 +1,49 @@ +#!/bin/bash +source test_tipc/common_func.sh + +function readlinkf() { + perl -MCwd -e 'print Cwd::abs_path shift' "$1"; +} + +function func_parser_config() { + strs=$1 + IFS=" " + array=(${strs}) + tmp=${array[2]} + echo ${tmp} +} + +BASEDIR=$(dirname "$0") +REPO_ROOT_PATH=$(readlinkf ${BASEDIR}/../) + +FILENAME=$1 + +# change gpu to xpu in tipc txt configs +sed -i "s/Global.device:gpu/Global.device:xpu/g" $FILENAME +sed -i "s/Global.use_gpu/Global.use_xpu/g" 
$FILENAME +dataline=`cat $FILENAME` + +# parser params +IFS=$'\n' +lines=(${dataline}) + +# replace inference config file +inference_py=$(func_parser_value "${lines[39]}") +inference_config=$(func_parser_config ${inference_py}) +sed -i 's/use_gpu: True/use_xpu: True/g' "$REPO_ROOT_PATH/deploy/$inference_config" + +# replace training config file +grep -n 'tools/.*yaml' $FILENAME | cut -d ":" -f 1 \ +| while read line_num ; do + train_cmd=$(func_parser_value "${lines[line_num-1]}") + trainer_config=$(func_parser_config ${train_cmd}) + sed -i 's/device: gpu/device: xpu/g' "$REPO_ROOT_PATH/$trainer_config" +done + +# change gpu to xpu in execution script +sed -i "s/\"gpu\"/\"xpu\"/g" test_tipc/test_train_inference_python.sh + +# pass parameters to test_train_inference_python.sh +cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} $2" +echo $cmd +eval $cmd
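The two wrappers added above (`test_train_inference_python_npu.sh`, `test_train_inference_python_xpu.sh`) only rewrite the device settings in the TIPC config txt and in the training/inference YAML it references, then delegate to `test_tipc/test_train_inference_python.sh` with the same two positional arguments. A minimal invocation sketch follows, assuming the standard TIPC two-argument calling convention; the config path and mode below are illustrative examples, not taken from this patch:

```bash
# Sketch only: CONFIG and MODE are assumed example values, not part of this patch.
CONFIG=test_tipc/configs/ResNet50/ResNet50_train_infer_python.txt  # hypothetical config path
MODE=lite_train_lite_infer

# download data / pretrained weights for the chosen mode
bash test_tipc/prepare.sh ${CONFIG} ${MODE}

# run the train + inference chain on NPU; the wrapper rewrites the device fields
# in ${CONFIG} and the YAML it references, then calls
#   bash test_tipc/test_train_inference_python.sh ${CONFIG} ${MODE}
bash test_tipc/test_train_inference_python_npu.sh ${CONFIG} ${MODE}
```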