The mainbody detection technology is currently a very widely used detection technology, which refers to the detect one or some mainbody objects in the picture, crop the corresponding area in the image and carry out recognition, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task, which can effectively improve the recognition accuracy.
This tutorial will introduce the dataset and model training for mainbody detection in PaddleClas.
## 1. Dataset
The datasets we used for mainbody detection task are shown in the following table.
| Dataset | Image number | Image number used in <<br>>mainbody detection | Scenarios | Dataset link |
In the actual training process, all datasets are mixed together. Categories of all the labeled boxes are modified to the category `foreground`, and the detection model we trained just contains one category (`foreground`).
## 2. Model Training
There are many types of object detection methods such as the commonly used two-stage detectors (FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.), anchor-free detectors (FCOS, etc.) and so on.
PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the yolov3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function, and post-processing. Finally, it reached the state of the art in terms of "speed-precision". Specifically, the optimization strategy is as follows.
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs and mini-batch size as 24 on each GPU
For more information about PP-YOLO, you can refer to [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md)
In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) used for the model training, in which the dagtaset path is modified to the mainbody detection dataset.
The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
### 2.1 Download and Unzip the Inference Model and Demo Data
Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
Take the product recognition as an example, download the detection model, recognition model and product recognition demo data with the following commands.
```shell
mkdir models
...
...
@@ -73,20 +74,20 @@ cd models
# Download the generic detection inference model and unzip it
Once unpacked, the `dataset` folder should have the following file structure.
```
├── logo_demo_data_v1.0
├── product_demo_data_v1.0
│ ├── data_file.txt
│ ├── gallery
│ ├── index
...
...
@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
The `models` folder should have the following file structure.
```
├── logo_rec_ResNet50_Logo3K_v1.0_infer
├── product_ResNet50_vd_aliproduct_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
...
...
@@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
│ └── inference.pdmodel
```
<aname="Logo_recognition_and_retrival"></a>
### 2.2 Logo Recognition and Retrival
<aname="Product_recognition_and_retrival"></a>
### 2.2 Product Recognition and Retrival
Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
Take the product recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
<aname="recognition_of_single_image"></a>
#### 2.2.1 Single Image Recognition
Run the following command to identify and retrieve the image `. /dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg` for recognition and retrieval
Run the following command to identify and retrieve the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg` for recognition and retrieval
where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity.
There are 4 `旺仔牛奶` results in 5, the recognition result is correct.
The detection result is also saved in the folder `output`, which is shown as follows.
@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field.
...
...
@@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
### 3.1 Build the Base Library Based on Your Own Dataset
<aname="3.1"></a>
### 3.1 Prepare for the new images and labels
First, you need to copy the images which are similar with the image to retrieval to the original images for the index database. The command is as follows.
Then you need to create a new label file which records the image path and label information. Use the following command to create a new file based on the original one.
Then add some new lines into the new label file, which is shown as follows.
```
gallery/anmuxi/001.jpg 安慕希酸奶
gallery/anmuxi/002.jpg 安慕希酸奶
gallery/anmuxi/003.jpg 安慕希酸奶
gallery/anmuxi/004.jpg 安慕希酸奶
gallery/anmuxi/005.jpg 安慕希酸奶
gallery/anmuxi/006.jpg 安慕希酸奶
```
Each line can be splited into two fields. The first field denotes the relative image path, and the second field denotes its label. The `delimiter` is `space` here.
First, you need to obtain the original images to store in the database (store in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label infomation,recording the category of the original images and the label information)store in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`
<aname="build_a_new_index_library"></a>
### 3.2 Build a new Index Base Library
Then use the following command to build the index to accelerate the retrieval process after recognition.
Use the following command to build the index to accelerate the retrieval process after recognition.
Finally, the new index information is stored in the folder`./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the above index.
Finally, the new index information is stored in the folder`./dataset/product_demo_data_v1.0/index_update`. Use the new index database for the above index.