PaddleClas is a toolset for image classification tasks prepared for the industry and academia. It helps users train better computer vision models and apply them in real scenarios.
PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
**Recent update**
**Recent updates**
- 2021.06.16 PaddleClas release/2.2.
- 2021.06.16 PaddleClas release/2.2.
- Add metric learning and vector search module.
- Add metric learning and vector search modules.
- Add product recognition, cartoon character recognition, car recognition and logo recognition.
- Add product recognition, animation character recognition, vehicle recognition and logo recognition.
- Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
- Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.
- 2021.05.14
- 2021.05.14
...
@@ -21,467 +21,97 @@ PaddleClas is a toolset for image classification tasks prepared for the industry
...
@@ -21,467 +21,97 @@ PaddleClas is a toolset for image classification tasks prepared for the industry
-[more](./docs/en/update_history_en.md)
-[more](./docs/en/update_history_en.md)
## Features
## Features
- Rich model zoo. Based on the ImageNet-1k classification dataset, PaddleClas provides 29 series of classification network structures and training configurations, 134 models' pretrained weights and their evaluation metrics.
- A practical image recognition system consist of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
- SSLD Knowledge Distillation. Based on this SSLD distillation strategy, the top-1 acc of the distilled model is generally increased by more than 3%.
-Data augmentation: PaddleClas provides detailed introduction of 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, code reproduction and effect evaluation in a unified experimental environment.
-Rich library of pre-trained models: Provide a total of 164 ImageNet pre-trained models in 34 series, among which 6 selected series of models support fast structural modification.
-Pretrained model with 100,000 categories: Based on `ResNet50_vd` model, Baidu open sourced the `ResNet50_vd` pretrained model trained on a 100,000-category dataset. In some practical scenarios, the accuracy based on the pretrained weights can be increased by up to 30%.
-Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
-A variety of training modes, including multi-machine training, mixed precision training, etc.
-SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the Image-Net-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
-A variety of inference and deployment solutions, including TensorRT inference, Paddle-Lite inference, model service deployment, model quantification, Paddle Hub, etc.
-Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment.
- Support Linux, Windows, macOS and other systems.
*Scan the QR code below with your Wechat and send the message `分类` out, then you will be invited into the official technical exchange group.
*You can also scan the QR code below to join the PaddleClas WeChat group to get more efficient answers to your questions and to communicate with developers from all walks of life. We look forward to hearing from you.
Based on the ImageNet-1k classification dataset, the 24 classification network structures supported by PaddleClas and the corresponding 122 image classification pretrained models are shown below. Training trick, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters. The evaluation environment is as follows.
* CPU evaluation environment is based on Snapdragon 855 (SD855).
* The GPU evaluation speed is measured by running 500 times under the FP32+TensorRT configuration (excluding the warmup time of the first 10 times).
Curves of accuracy to the inference time of common server-side models are shown as follows.
Curves of accuracy to the inference time and storage size of common mobile-side models are shown as follows.


<aname="SSLD_pretrained_series"></a>
### SSLD pretrained models
Accuracy and inference time of the prtrained models based on SSLD distillation are as follows. More detailed information can be refered to [SSLD distillation tutorial](./docs/en/advanced_tutorials/distillation/distillation_en.md).
* Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.
<aname="ResNet_and_Vd_series"></a>
### ResNet and Vd series
Accuracy and inference time metrics of ResNet and Vd series models are shown as follows. More detailed information can be refered to [ResNet and Vd series tutorial](./docs/en/models/ResNet_and_vd_en.md).
Accuracy and inference time metrics of Mobile series models are shown as follows. More detailed information can be refered to [Mobile series tutorial](./docs/en/models/Mobile_en.md).
Accuracy and inference time metrics of SEResNeXt and Res2Net series models are shown as follows. More detailed information can be refered to [SEResNext and_Res2Net series tutorial](./docs/en/models/SEResNext_and_Res2Net_en.md).
Accuracy and inference time metrics of DPN and DenseNet series models are shown as follows. More detailed information can be refered to [DPN and DenseNet series tutorial](./docs/en/models/DPN_DenseNet_en.md).
Accuracy and inference time metrics of HRNet series models are shown as follows. More detailed information can be refered to [Mobile series tutorial](./docs/en/models/HRNet_en.md).
Accuracy and inference time metrics of Inception series models are shown as follows. More detailed information can be refered to [Inception series tutorial](./docs/en/models/Inception_en.md).
Accuracy and inference time metrics of EfficientNet and ResNeXt101_wsl series models are shown as follows. More detailed information can be refered to [EfficientNet and ResNeXt101_wsl series tutorial](./docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md).
Accuracy and inference time metrics of ResNeSt and RegNet series models are shown as follows. More detailed information can be refered to [ResNeSt and RegNet series tutorial](./docs/en/models/ResNeSt_RegNet_en.md).
Accuracy and inference time metrics of ViT and DeiT series models are shown as follows. More detailed information can be refered to [ViT and DeiT series tutorial](./docs/en/models/ViT_and_DeiT_en.md).
Accuracy and inference time metrics of RepVGG series models are shown as follows. More detailed information can be refered to [RepVGG series tutorial](./docs/en/models/RepVGG_en.md).
Accuracy and inference time metrics of MixNet series models are shown as follows. More detailed information can be refered to [MixNet series tutorial](./docs/en/models/MixNet_en.md).
Accuracy and inference time metrics of ReXNet series models are shown as follows. More detailed information can be refered to [ReXNet series tutorial](./docs/en/models/ReXNet_en.md).
Accuracy and inference time metrics of SwinTransformer series models are shown as follows. More detailed information can be refered to [SwinTransformer series tutorial](./docs/en/models/SwinTransformer_en.md).
[1]:Based on imagenet22k dataset pre-training, and then in imagenet1k dataset transfer learning.
<aname="Others"></a>
### Others
Accuracy and inference time metrics of AlexNet, SqueezeNet series, VGG series and DarkNet53 models are shown as follows. More detailed information can be refered to [Others](./docs/en/models/Others_en.md).
<aname="Introduction to Image Recognition Systems"></a>
For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
<aname="License"></a>
<aname="License"></a>
## License
PaddleClas is released under the <ahref="https://github.com/PaddlePaddle/PaddleClas/blob/master/LICENSE">Apache 2.0 license</a>
## License
PaddleClas is released under the Apache 2.0 license <ahref="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>
<aname="Contribution"></a>
<aname="Contribution"></a>
## Contribution
## Contribution
Contributions are highly welcomed and we would really appreciate your feedback!!
Contributions are highly welcomed and we would really appreciate your feedback!!
- Thank [nblib](https://github.com/nblib) to fix bug of RandErasing.
- Thank [nblib](https://github.com/nblib) to fix bug of RandErasing.
- Thank [chenpy228](https://github.com/chenpy228) to fix some typos PaddleClas.
- Thank [chenpy228](https://github.com/chenpy228) to fix some typos PaddleClas.
- Thank [jm12138](https://github.com/jm12138) to add ViT, DeiT models and RepVGG models into PaddleClas.
- Thank [jm12138](https://github.com/jm12138) to add ViT, DeiT models and RepVGG models into PaddleClas.
- Thank [FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563) to parse and summarize the PaddleClas code.
The mainbody detection technology is currently a very widely used detection technology, which refers to the detect one or some mainbody objects in the picture, crop the corresponding area in the image and carry out recognition, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task, which can effectively improve the recognition accuracy.
This tutorial will introduce the dataset and model training for mainbody detection in PaddleClas.
## 1. Dataset
The datasets we used for mainbody detection task are shown in the following table.
| Dataset | Image number | Image number used in <<br>>mainbody detection | Scenarios | Dataset link |
In the actual training process, all datasets are mixed together. Categories of all the labeled boxes are modified to the category `foreground`, and the detection model we trained just contains one category (`foreground`).
## 2. Model Training
There are many types of object detection methods such as the commonly used two-stage detectors (FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.), anchor-free detectors (FCOS, etc.) and so on.
PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the yolov3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function, and post-processing. Finally, it reached the state of the art in terms of "speed-precision". Specifically, the optimization strategy is as follows.
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs and mini-batch size as 24 on each GPU
For more information about PP-YOLO, you can refer to [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md)
In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) used for the model training, in which the dagtaset path is modified to the mainbody detection dataset.
The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
### 2.1 Download and Unzip the Inference Model and Demo Data
### 2.1 Download and Unzip the Inference Model and Demo Data
Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands.
```shell
```shell
mkdir models
mkdir models
...
@@ -73,20 +74,20 @@ cd models
...
@@ -73,20 +74,20 @@ cd models
# Download the generic detection inference model and unzip it
# Download the generic detection inference model and unzip it
Once unpacked, the `dataset` folder should have the following file structure.
Once unpacked, the `dataset` folder should have the following file structure.
```
```
├── logo_demo_data_v1.0
├── product_demo_data_v1.0
│ ├── data_file.txt
│ ├── data_file.txt
│ ├── gallery
│ ├── gallery
│ ├── index
│ ├── index
...
@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
...
@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
The `models` folder should have the following file structure.
The `models` folder should have the following file structure.
```
```
├── logo_rec_ResNet50_Logo3K_v1.0_infer
├── product_ResNet50_vd_aliproduct_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
│ └── inference.pdmodel
...
@@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
...
@@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
│ └── inference.pdmodel
│ └── inference.pdmodel
```
```
<aname="Logo_recognition_and_retrival"></a>
<aname="Product_recognition_and_retrival"></a>
### 2.2 Logo Recognition and Retrival
### 2.2 Product Recognition and Retrival
Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
<aname="recognition_of_single_image"></a>
<aname="recognition_of_single_image"></a>
#### 2.2.1 Single Image Recognition
#### 2.2.1 Single Image Recognition
Run the following command to identify and retrieve the image `. /dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg` for recognition and retrieval
Run the following command to identify and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg` for recognition and retrieval
where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity.
where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity.
There are 4 `旺仔牛奶` results in 5, the recognition result is correct.
The detection result is also saved in the folder `output`, which is shown as follows.
@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
...
@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field.
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field.
...
@@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
...
@@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
### 3.1 Build the Base Library Based on Your Own Dataset
### 3.1 Prepare for the new images and labels
First, you need to copy the images which are similar with the image to retrieval to the original images for the index database. The command is as follows.
Then you need to create a new label file which records the image path and label information. Use the following command to create a new file based on the original one.
Then add some new lines into the new label file, which is shown as follows.
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
Each line can be splited into two fields. The first field denotes the relative image path, and the second field denotes its label. The `delimiter` is `space` here.
First, you need to obtain the original images to store in the database (store in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label infomation,recording the category of the original images and the label information)store in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`
<aname="build_a_new_index_library"></a>
### 3.2 Build a new Index Base Library
Then use the following command to build the index to accelerate the retrieval process after recognition.
Use the following command to build the index to accelerate the retrieval process after recognition.
Finally, the new index information is stored in the folder`./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the above index.
Finally, the new index information is stored in the folder`./dataset/product_demo_data_v1.0/index_update`. Use the new index database for the above index.