PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
- 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models run inference within 3ms on CPU devices, with accuracy comparable to SwinTransformer. We also release 9 practical models covering pedestrian, vehicle and OCR.
- 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
- 2021.09.17 Add the PP-LCNet series of models developed by PaddleClas. These models show strong competitiveness on Intel CPUs.
...
...
@@ -19,24 +33,12 @@ For the introduction of PP-LCNet, please refer to [paper](https://arxiv.org/pdf/
## Features
- A practical image recognition system consisting of detection, feature learning and retrieval modules, widely applicable to all types of image recognition tasks.
Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition and animation character recognition.
- Rich library of pre-trained models: Provide a total of 164 ImageNet pre-trained models in 35 series, among which 6 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the ImageNet-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
PaddleClas releases the PP-HGNet, PP-LCNetv2, PP-LCNet and **S**imple **S**emi-supervised **L**abel **D**istillation algorithms, and supports plenty of
image classification and image recognition algorithms.
Based on the algorithms above, PaddleClas releases the PP-ShiTu image recognition system and [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](docs/en/PULC/PULC_quickstart_en.md).
- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with detailed introduction, code replication and evaluation of effectiveness in a unified experimental environment.
The PULC solution consists of the PP-LCNet light-weight backbone, SSLD pretrained models, an Ensemble of Data Augmentation strategy and SKL-UGI knowledge distillation.
PULC models run inference within 3ms on CPU devices, with accuracy comparable to SwinTransformer. We also release 9 practical models covering pedestrian, vehicle and OCR.
@@ -97,8 +109,13 @@ Image recognition can be divided into three steps:
For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
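The idea of adding a category by updating only the retrieval database can be illustrated with a toy sketch. This is not the PP-ShiTu implementation: the `Gallery` class, the hand-written 3-dimensional feature vectors and the labels are all hypothetical, and cosine similarity is assumed as the matching score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class Gallery:
    """Toy retrieval database holding (feature, label) pairs (hypothetical)."""

    def __init__(self):
        self.entries = []

    def add(self, feature, label):
        # Registering a category only appends to the index; no model is retrained.
        self.entries.append((feature, label))

    def query(self, feature):
        # Return the label of the most similar stored feature.
        best = max(self.entries, key=lambda e: cosine_similarity(feature, e[0]))
        return best[1]


gallery = Gallery()
gallery.add([1.0, 0.0, 0.0], "cola")
gallery.add([0.0, 1.0, 0.0], "juice")

# A new, previously unknown category: extract its feature and update the index.
gallery.add([0.0, 0.0, 1.0], "milk")

print(gallery.query([0.1, 0.1, 0.9]))
```

In a real system the features would come from the feature-learning model rather than being written by hand, and the linear scan would normally be replaced by a vector index.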
The PULC model zoo is provided here, mainly providing indicators, model storage size, and download links of the model. The pre-trained model can be used for fine-tuning training, and the inference model can be directly used for prediction and deployment.
* The backbone of all the above models is PPLCNet_x1_0. The different sizes of some models are caused by the different output sizes of the classification layer. The inference time is tested on the Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. During the test process, the MKLDNN acceleration strategy is turned on, and the number of threads is 10. There will be slight fluctuations during the speed test process.
* The evaluation indicators of person_exists, safety_helmet, and car_exists are TprAtFpr. The evaluation indicators of person_attribute and vehicle_attribute are ma. The evaluation indicators of traffic_sign, text_image_orientation, textline_orientation and language_classification are Top-1 Acc.
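The text does not define TprAtFpr, so the following is only a sketch of the usual meaning of such a metric: the highest true positive rate reachable at a decision threshold whose false positive rate stays within a given bound. The function name `tpr_at_fpr`, the toy scores and the FPR bounds are assumptions for illustration, not values used by PaddleClas.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR over all thresholds whose FPR is at most max_fpr.

    scores: predicted probability of the positive class per sample.
    labels: 1 for positive samples, 0 for negative samples.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_tpr = 0.0
    for threshold in sorted(set(scores)):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]

print(tpr_at_fpr(scores, labels, max_fpr=0.0))
print(tpr_at_fpr(scores, labels, max_fpr=0.34))
```

Tightening the FPR bound forces a higher threshold, which is why the first call reports a lower TPR than the second.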
Please refer to [PaddlePaddle Installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html) for more information about installation, for example, for other versions.
<a name="1.2"></a>
### 1.2 PaddleClas wheel Installation
```bash
pip3 install paddleclas
```
<a name="2"></a>
## 2. Quick Start
PaddleClas provides a series of test cases, which contain demos of different scenes about people, cars, OCR, etc. Click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download the data.
`Nobody` means there is no one in the image, `someone` means there is someone in the image. Therefore, the prediction result indicates that there is no one in the figure.
**Note**: The `--infer_imgs` argument specifies the image(s) to be predicted; you can also specify a directory containing images. To use another model, specify the `--model_name` argument. Please refer to [2.3 Supported Model List](#2.3) for the supported model list.
**Note**: `model.predict()` is a generator, so `next()` or `for` is needed to call it. It predicts in batches of length `batch_size`, which defaults to 1. You can specify the arguments `batch_size` and `model_name` when instantiating the PaddleClas object, for example: `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`. Please refer to [2.3 Supported Model List](#2.3) for the supported model list.
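The generator behaviour described in the note can be illustrated without installing anything. The function `predict_in_batches` below is a hypothetical stand-in, not the PaddleClas API; it only mimics the pattern of yielding one list of results per batch, and the placeholder result dictionaries are invented for the example.

```python
def predict_in_batches(image_paths, batch_size=1):
    """Hypothetical stand-in for a generator-style predict().

    Yields one list of per-image results for every batch of `batch_size` inputs.
    """
    for start in range(0, len(image_paths), batch_size):
        batch = image_paths[start:start + batch_size]
        # A real model would run inference here; placeholders are returned instead.
        yield [{"filename": path} for path in batch]


paths = ["a.jpg", "b.jpg", "c.jpg"]
results = predict_in_batches(paths, batch_size=2)

first_batch = next(results)                # results for a.jpg and b.jpg
remaining = [batch for batch in results]   # the generator resumes at c.jpg

print(first_batch)
print(remaining)
```

Nothing is computed until `next()` or a `for` loop pulls a batch, which is why calling the function alone produces no output.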
| language_classification | Language Classification |
<a name="3"></a>
## 3. Summary
The PULC series models have been verified to be effective in different scenarios about people, vehicles, OCR, etc. The ultra-lightweight models can achieve accuracy close to that of the SwinTransformer model, with inference more than 40 times faster. PULC also provides the whole process of dataset acquisition, model training, model compression and deployment. Please refer to [Human Exists Classification](PULC_person_exists_en.md), [Pedestrian Attribute Classification](PULC_person_attribute_en.md), [Classification of Whether Wearing Safety Helmet](PULC_safety_helmet_en.md), [Traffic Sign Classification](PULC_traffic_sign_en.md), [Vehicle Attribute Classification](PULC_vehicle_attribute_en.md), [Car Exists Classification](PULC_car_exists_en.md), [Text Image Orientation Classification](PULC_text_image_orientation_en.md), [Text-line Orientation Classification](PULC_textline_orientation_en.md) and [Language Classification](PULC_language_classification_en.md) for more information about different scenarios.
- [3. Training with standard classification configuration](#3)
    - [3.1 PP-LCNet as backbone](#3.1)
    - [3.2 SSLD pretrained model](#3.2)
    - [3.3 EDA strategy](#3.3)
    - [3.4 SKL-UGI knowledge distillation](#3.4)
    - [3.5 Summary](#3.5)
- [4. Hyperparameter Search](#4)
    - [4.1 Search based on default configuration](#4.1)
    - [4.2 Custom search configuration](#4.2)
<a name="1"></a>
### 1. Introduction of PULC solution
Image classification is one of the basic algorithms of computer vision, the most common algorithm in enterprise applications, and an important part of many CV applications. In recent years, backbone network models have developed rapidly, and the accuracy record on ImageNet has been continuously refreshed. However, the performance of these models in practical scenarios is sometimes unsatisfactory. On the one hand, high-accuracy models tend to have large storage footprints and slow inference, which makes it difficult to meet actual deployment requirements; on the other hand, after a suitable model is selected, experienced engineers are often required to tune parameters, which is time-consuming and labor-intensive. To solve these problems of enterprise application and make the training and parameter tuning of classification models easier, PaddleClas summarized and launched the Practical Ultra Lightweight Classification (PULC) solution. PULC integrates various state-of-the-art techniques, such as the backbone network, data augmentation and distillation, and can automatically obtain a lightweight and high-accuracy image classification model.
The PULC solution has been verified to be effective in many scenarios, such as human-related scenarios, car-related scenarios, and OCR-related scenarios. With an ultra-lightweight model, the accuracy close to SwinTransformer can be achieved, and the inference speed can be 40+ times faster.
The solution mainly includes 4 parts, namely: PP-LCNet lightweight backbone network, SSLD pre-trained model, Ensemble Data Augmentation (EDA) and SKL-UGI knowledge distillation algorithm. In addition, we also adopt the method of hyperparameter search to efficiently optimize the hyperparameters in training. Below, we take the person exists or not scene as an example to illustrate the solution.
**Note**: For some specific scenarios, we provide basic training documents for reference, such as [person exists or not classification model](PULC_person_exists_en.md), etc. You can find these documents [here](./PULC_model_list_en.md). If the methods in these documents do not meet your needs, or if you need a custom training task, you can refer to this document.
<a name="2"></a>
### 2. Data preparation
<a name="2.1"></a>
#### 2.1 Dataset format description
PaddleClas uses `txt` files to specify the training set and validation set. Taking the person exists or not scene as an example, you need to specify `train_list.txt` and `val_list.txt` as the data labels of the training set and validation set. The format is as follows:
```
# Each line uses "space" to separate the image path and label
train/1.jpg 0
train/10.jpg 1
...
```
If you want to get more information about common classification datasets, you can refer to the document [PaddleClas Classification Dataset Format Description](../data_preparation/classification_dataset_en.md).
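To make the label file format concrete, the following sketch reads lines of the form "image path, space, label" into Python tuples. It is an illustration only, not the PaddleClas data loader; the function names are hypothetical, and splitting on the last space is an assumption made so that paths containing spaces would still parse.

```python
def parse_label_line(line):
    """Split 'path label' into (path, int label), splitting on the last space."""
    path, label = line.strip().rsplit(" ", 1)
    return path, int(label)


def parse_label_lines(lines):
    """Parse every non-empty line of a train_list.txt / val_list.txt style file."""
    samples = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        samples.append(parse_label_line(line))
    return samples


example = ["train/1.jpg 0\n", "train/10.jpg 1\n", "\n"]
samples = parse_label_lines(example)
print(samples)
```

The same function can be applied to a real file by passing the open file object, since iterating over it yields one line at a time.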
<a name="2.2"></a>
#### 2.2 Annotation file generation method
If you already have the data in the actual scene, you can label it according to the format in the previous section. Here, we provide a script to quickly generate annotation files. You only need to put different categories of data in folders and run the script to generate annotation files.
First, assume that the data is stored in `./train`. `train/` contains the data of each category, category numbers start from 0, and each category folder contains the image data.
```shell
train
├── 0
│   ├── 0.jpg
│   ├── 1.jpg
│   └── ...
└── 1
    ├── 0.jpg
    ├── 1.jpg
    └── ...
└── ...
```
```shell
tree -r -i -f train | grep -E "jpg|JPG|jpeg|JPEG|png|PNG" | awk -F "/" '{print $0" "$2}' > train_list.txt
```
If other image file suffixes are involved, add them to the pattern after `grep -E`. The `2` in `$2` is the level of the category number folder in the path.
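For environments where `tree` is not available, the same annotation lines can be produced with a short Python sketch. This is an illustrative alternative rather than a script shipped with PaddleClas: the function name `build_annotation_lines` is hypothetical, suffixes are matched case-insensitively at the end of the file name (slightly stricter than the `grep` pattern), and lines are emitted in ascending order rather than the reverse order given by `tree -r`.

```python
import os
import tempfile

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def build_annotation_lines(root):
    """Return 'path label' lines for images stored as <root>/<category>/<image>."""
    prefix = os.path.basename(os.path.normpath(root))
    lines = []
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue
        for name in sorted(os.listdir(category_dir)):
            # Case-insensitive suffix check, so .JPG and .PNG are accepted too.
            if name.lower().endswith(IMAGE_SUFFIXES):
                lines.append(f"{prefix}/{category}/{name} {category}")
    return lines


# Demo on a throwaway directory that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "train")
    for category in ("0", "1"):
        os.makedirs(os.path.join(root, category))
        open(os.path.join(root, category, "0.jpg"), "w").close()
    open(os.path.join(root, "1", "notes.txt"), "w").close()  # ignored: not an image
    demo_lines = build_annotation_lines(root)

print("\n".join(demo_lines))
```

Writing `"\n".join(demo_lines)` to `train_list.txt` gives a file in the format described in section 2.1.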
**Note:** The above is an introduction to the method of dataset acquisition and generation. Here you can directly download the person exists or not scene data to quickly start the experience.
Go to the PaddleClas directory.
```
cd path_to_PaddleClas
```
Go to the `dataset/` directory, download and unzip the data.
### 3. Training with standard classification configuration
<a name="3.1"></a>
#### 3.1 PP-LCNet as backbone
PULC adopts the lightweight backbone network PP-LCNet, which is 50% faster than other networks with the same accuracy. You can view the detailed introduction of the backbone network in [PP-LCNet Introduction](../models/PP-LCNet_en.md).
For performance comparison, we also provide configuration files for the large model SwinTransformer_tiny and the lightweight model MobileNetV3_small_x0_35, which you can train with the command:
| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | Use ImageNet pretrained model |
| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | Use ImageNet pretrained model |
It can be seen that PP-LCNet is much faster than SwinTransformer, but the accuracy is also slightly lower. Below we improve the accuracy of the PP-LCNet model through a series of optimizations.
<a name="3.2"></a>
#### 3.2 SSLD pretrained model
SSLD is a semi-supervised distillation algorithm developed by Baidu. On the ImageNet dataset, the model accuracy can be improved by 3-7 points. You can find a detailed introduction in [SSLD introduction](../advanced_tutorials/distillation/distillation_en.md). We found that using SSLD pre-trained weights can effectively improve the accuracy of the applied classification model. In addition, using a smaller resolution in training can effectively improve model accuracy. At the same time, we also optimize the learning rate.
Based on the above three improvements, the accuracy of our trained model is 92.1%, an increase of 2.6%.
<a name="3.3"></a>
#### 3.3 EDA strategy
Data augmentation is a commonly used optimization strategy in vision algorithms, which can significantly improve the accuracy of the model. In addition to traditional methods such as RandomCrop and RandomFlip, we also apply RandAugment and RandomErasing. You can find a detailed introduction at [Data Augmentation Introduction](../advanced_tutorials/DataAugmentation_en.md).
Since these two kinds of data augmentation greatly modify the picture, making the classification task more difficult, it may lead to under-fitting of the model on some datasets. We will set the probability of enabling these two methods in advance.
Based on the above improvements, we obtained a model accuracy of 93.43%, an increase of 1.3%.
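Enabling a strong augmentation only with a preset probability can be sketched as a small wrapper. This is a minimal illustration, not the PaddleClas implementation: `apply_with_probability` and `toy_erase` are hypothetical names, the "image" is a plain list, and the probability of 0.5 is an arbitrary example value.

```python
import random


def apply_with_probability(transform, prob, rng):
    """Wrap `transform` so it runs with probability `prob`, else the input is returned as-is."""
    def wrapped(image):
        if rng.random() < prob:
            return transform(image)
        return image
    return wrapped


def toy_erase(image):
    # Placeholder for a strong augmentation: zero out the first half of the "image".
    half = len(image) // 2
    return [0] * half + image[half:]


rng = random.Random(0)  # fixed seed so the demo is repeatable
maybe_erase = apply_with_probability(toy_erase, prob=0.5, rng=rng)

image = [1, 2, 3, 4]
outputs = [maybe_erase(image) for _ in range(1000)]
erased_fraction = sum(out != image for out in outputs) / len(outputs)
print(erased_fraction)
```

Lowering `prob` keeps more samples unmodified, which is one way to limit the under-fitting risk mentioned above.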
<a name="3.4"></a>
#### 3.4 SKL-UGI knowledge distillation
Knowledge distillation is a method that can effectively improve the accuracy of small models. You can find a detailed introduction in [Introduction to Knowledge Distillation](../advanced_tutorials/distillation/distillation_en.md). We choose ResNet101_vd as the teacher model for distillation. In order to adapt to the distillation process, we also adjust the learning rate of different stages of the network here. Based on the above improvements, we trained the model to get a model accuracy of 95.6%, an increase of 1.4%.
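The text does not spell out the distillation loss. Assuming, as the name suggests, that the SKL in SKL-UGI refers to a symmetric KL divergence between student and teacher predictions, a minimal pure-Python sketch looks like this. The equal 0.5 weighting of the two directions and the example logits are choices made for the illustration, not details confirmed by the source.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(student_logits, teacher_logits):
    """Average of KL(student || teacher) and KL(teacher || student)."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    return 0.5 * (kl_divergence(s, t) + kl_divergence(t, s))


student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]
loss = symmetric_kl(student, teacher)
print(loss)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pulls the student's predictions toward the teacher's.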
<a name="3.5"></a>
#### 3.5 Summary
After the optimization of the above methods, the final accuracy of PP-LCNet reaches 95.6%, reaching the accuracy level of the large model. We summarize the experimental results in the following table:
It can be seen from the results that the PULC scheme can improve the model accuracy in multiple application scenarios. Using the PULC scheme can greatly reduce the workload of model optimization and quickly obtain models with higher accuracy.
<a name="4"></a>
### 4. Hyperparameter Search
In the above training process, we adjusted parameters such as learning rate, data augmentation probability, and stage learning rate mult list. The optimal values of these parameters may not be the same in different scenarios. We provide a quick hyperparameter search script to automate the process of hyperparameter tuning. This script traverses the parameters in the search value list to replace the parameters in the default configuration, then trains in sequence, and finally selects the parameters corresponding to the model with the highest accuracy as the search result.
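The search procedure described above (substitute each candidate value into the default configuration, train, keep the best) can be sketched as follows. This is not the PaddleClas search script: the function names, the flat dictionary configuration and the fake scoring function are hypothetical stand-ins for a real training run.

```python
import copy


def search_one_parameter(default_config, key, search_values, train_and_evaluate):
    """Try each value for `key` and return (best_value, best_accuracy)."""
    best_value, best_accuracy = None, float("-inf")
    for value in search_values:
        config = copy.deepcopy(default_config)  # keep the default untouched
        config[key] = value
        accuracy = train_and_evaluate(config)   # one full training run per value
        if accuracy > best_accuracy:
            best_value, best_accuracy = value, accuracy
    return best_value, best_accuracy


def fake_train_and_evaluate(config):
    # Toy stand-in for training: accuracy peaks at a learning rate of 0.01.
    return 1.0 - abs(config["learning_rate"] - 0.01) * 10


default_config = {"learning_rate": 0.1, "resolution": 224}
best_lr, best_acc = search_one_parameter(
    default_config, "learning_rate", [0.005, 0.01, 0.02], fake_train_and_evaluate
)
print(best_lr, best_acc)
```

Searching one parameter at a time keeps the number of training runs linear in the number of candidate values, at the cost of ignoring interactions between parameters.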
<a name="4.1"></a>
#### 4.1 Search based on default configuration
The configuration file [search.yaml](../../../ppcls/configs/PULC/person_exists/search.yaml) defines the configuration of hyperparameter search in person exists or not scenarios. Use the following commands to complete hyperparameter search.
**Note**: Regarding the search part, we are also constantly improving, so stay tuned.
<a name="4.2"></a>
#### 4.2 Custom search configuration
You can also modify the configuration of hyperparameter search based on training results or your parameter tuning experience.
- Modify the `search_values` field in `lrs` to modify the list of learning rate search values;
- Modify the `search_values` field in `resolutions` to modify the search value list of resolutions;
- Modify the `search_values` field in `ra_probs` to modify the search value list of the RandAugment activation probability;
- Modify the `search_values` field in `re_probs` to modify the search value list of the RandomErasing activation probability;
- Modify the `search_values` field in `lr_mult_list` to modify the lr_mult search value list;
- Modify the `search_values` field in `teacher` to modify the search list of the teacher model.
After the search is completed, the final results are generated in `output/search_person_exists`. Except for `search_res`, each directory in `output/search_person_exists` holds the weights and training log files of one search training run with the corresponding hyperparameters. `search_res` corresponds to the result of knowledge distillation, that is, the final model. The weights of the model are stored in `output/output_dir/search_person_exists/DistillationModel/best_model_student.pdparams`.
Paddleclas supports Python WHL package for prediction. At present, WHL package only supports image classification, but does not support subject detection, feature extraction and vector retrieval.
PaddleClas supports prediction through a Python wheel package. At present, the PaddleClas wheel supports image classification, including ImageNet1k models and PULC models, but does not support mainbody detection, feature extraction and vector retrieval.
---
...
...
@@ -9,7 +9,7 @@ Paddleclas supports Python WHL package for prediction. At present, WHL package o
- [1. Installation](#1)
- [2. Quick Start](#2)
- [3. Definition of Parameters](#3)
- [4. Usage](#4)
- [4. More usage](#4)
    - [4.1 View help information](#4.1)
    - [4.2 Prediction using inference model provided by PaddleClas](#4.2)
    - [4.3 Prediction using local model files](#4.3)
...
...
@@ -20,6 +20,7 @@ Paddleclas supports Python WHL package for prediction. At present, WHL package o
    - [4.8 Specify the mapping between class id and label name](#4.8)
<a name="1"></a>
## 1. Installation
* installing from pypi
...
...
@@ -36,8 +37,14 @@ pip3 install dist/*
```
<a name="2"></a>
## 2. Quick Start
* Using the `ResNet50` model provided by PaddleClas, the following image(`'docs/images/inference_deployment/whl_demo.jpg'`) as an example.
<a name="2.1"></a>
### 2.1 ImageNet1k models
The following example uses the `ResNet50` model provided by PaddleClas and the image `'docs/images/inference_deployment/whl_demo.jpg'`.
PULC integrates various state-of-the-art algorithms such as backbone network, data augmentation and distillation, etc., and finally can automatically obtain a lightweight and high-precision image classification model.
PaddleClas provides a series of test cases, which contain demos of different scenes about people, cars, OCR, etc. Click [here](https://paddleclas.bj.bcebos.com/data/PULC/pulc_demo_imgs.zip) to download the data.
Prediction using the PULC "Human Exists Classification" model provided by PaddleClas:
`Nobody` means there is no one in the image, `someone` means there is someone in the image. Therefore, the prediction result indicates that there is no one in the figure.
**Note**: `model.predict()` is a generator, so `next()` or `for` is needed to call it. It predicts in batches of length `batch_size`, which defaults to 1. You can specify the arguments `batch_size` and `model_name` when instantiating the PaddleClas object, for example: `model = paddleclas.PaddleClas(model_name="person_exists", batch_size=2)`. Please refer to [Supported Model List](#PULC_Models) for the supported model list.
**Note**: The `--infer_imgs` argument specifies the image(s) to be predicted; you can also specify a directory containing images. To use another model, specify the `--model_name` argument. Please refer to [Supported Model List](#PULC_Models) for the supported model list.
| language_classification | Language Classification |
Please refer to [Human Exists Classification](../PULC/PULC_person_exists_en.md), [Pedestrian Attribute Classification](../PULC/PULC_person_attribute_en.md), [Classification of Whether Wearing Safety Helmet](../PULC/PULC_safety_helmet_en.md), [Traffic Sign Classification](../PULC/PULC_traffic_sign_en.md), [Vehicle Attribute Classification](../PULC/PULC_vehicle_attribute_en.md), [Car Exists Classification](../PULC/PULC_car_exists_en.md), [Text Image Orientation Classification](../PULC/PULC_text_image_orientation_en.md), [Text-line Orientation Classification](../PULC/PULC_textline_orientation_en.md) and [Language Classification](../PULC/PULC_language_classification_en.md) for more information about different scenarios.
<a name="3"></a>
## 3. Definition of Parameters
The following parameters can be specified in Command Line or used as parameters of the constructor when instantiating the PaddleClas object in Python.
* model_name(str): If using inference model based on ImageNet1k provided by Paddle, please specify the model's name by the parameter.
* inference_model_dir(str): Local model files directory, which is valid when `model_name` is not specified. The directory should contain `inference.pdmodel` and `inference.pdiparams`.
* infer_imgs(str): The path of image to be predicted, or the directory containing the image files, or the URL of the image from Internet.
* use_gpu(bool): Whether to use GPU or not, defaults to `True`.
* gpu_mem(int): GPU memory usage, defaults to `8000`.
* use_tensorrt(bool): Whether to enable TensorRT or not. Enabling it can greatly improve prediction performance, defaults to `False`.
* enable_mkldnn(bool): Whether to enable MKLDNN or not, defaults to `False`.
* cpu_num_threads(int): Number of CPU threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`, defaults to `10`.
* batch_size(int): Batch size, defaults to `1`.
* resize_short(int): Resize the shorter side of the image (the minimum of height and width) to `resize_short`, defaults to `256`.
* crop_size(int): Center crop the image to `crop_size`, defaults to `224`.
* topk(int): Print (return) the `topk` prediction results, defaults to `5`.
* class_id_map_file(str): The mapping file between class ID and label, defaults to the `ImageNet1K` dataset's mapping.
* pre_label_image(bool): Whether to pre-label or not, defaults to `False`.
* save_dir(str): The directory to save the prediction results that can be used as pre-labels, defaults to `None`, that is, not to save.
* use_gpu(bool): Whether to use GPU or not.
* gpu_mem(int): GPU memory usage.
* use_tensorrt(bool): Whether to enable TensorRT or not. Enabling it can greatly improve prediction performance.
* enable_mkldnn(bool): Whether to enable MKLDNN or not.
* cpu_num_threads(int): Number of CPU threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`.
* batch_size(int): Batch size.
* resize_short(int): Resize the shorter side of the image (the minimum of height and width) to `resize_short`.
* crop_size(int): Center crop the image to `crop_size`.
* topk(int): Print (return) the `topk` prediction results when the Topk postprocess is used.
* threshold(float): The threshold used when the ThreshOutput postprocess is used.
* class_id_map_file(str): The mapping file between class ID and label.
* save_dir(str): The directory to save the prediction results that can be used as pre-labels.
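The effect of `resize_short` and `crop_size` on image geometry can be worked through with plain arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the rounding and integer-division conventions are assumptions that may differ from the actual preprocessing code.

```python
def resize_short_size(width, height, resize_short):
    """Scale (width, height) so that the shorter side equals `resize_short`."""
    scale = resize_short / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop_size):
    """Return (left, top, right, bottom) of a centered crop_size x crop_size box."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size


# A 640x480 input with the values 256 and 224 from the parameter list.
resized = resize_short_size(640, 480, 256)
box = center_crop_box(resized[0], resized[1], 224)
print(resized, box)
```

The aspect ratio is preserved by the resize step, and the center crop then produces the square input that the model expects.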
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model; you need to set `resize_short=384` and `crop_size=384`. The following is a demo.
@@ -110,6 +182,7 @@ PaddleClas provides two ways to use:
2. Bash command line programming.
<a name="4.1"></a>
### 4.1 View help information
* CLI
...
...
@@ -118,6 +191,7 @@ paddleclas -h
```
<a name="4.2"></a>
### 4.2 Prediction using inference model provided by PaddleClas
You can use the inference model provided by PaddleClas for prediction; you only need to specify `model_name`. In this case, PaddleClas automatically downloads the files of the specified model and saves them in the directory `~/.paddleclas/`.
You can use local model files that you trained yourself for prediction; you only need to specify `inference_model_dir`. Note that the directory must contain `inference.pdmodel` and `inference.pdiparams`.
You can predict on an image from the Internet; you only need to specify the URL of the image with `infer_imgs`. In this case, the image file is downloaded and saved in the directory `~/.paddleclas/images/`.
In Python code, you can predict on an image in `NumPy.array` format; you only need to pass the image data through `infer_imgs`. Note that the models in PaddleClas only support 3-channel image data, and the channel order is `RGB`.
...
...
@@ -205,6 +283,7 @@ print(next(result))
```
<a name="4.7"></a>
### 4.7 Save the prediction result(s)
You can save the prediction result(s) as pre-labels; you only need to use `pre_label_out_dir` to specify the directory to save them in.
### 4.8 Specify the mapping between class id and label name
You can specify the mapping between class id and label name; you only need to use `class_id_map_file` to specify the mapping file. PaddleClas uses ImageNet1K's mapping by default.