Mainbody detection is a widely used detection technique. It locates the coordinates of one or more objects in an image, and the corresponding regions are then cropped out for recognition. As the first step of the recognition task, mainbody detection can effectively improve recognition accuracy.
This tutorial introduces the technique from three aspects: the datasets, model selection, and model training.
## Dataset
The datasets we used for mainbody detection tasks are shown in the following table.
| Dataset | Image Number | Image Number Used in Mainbody Detection | Scenarios | Dataset Link |
In the actual training process, all datasets are mixed together. The categories of all labeled boxes are changed to `foreground`, so the trained detection model contains only one category (`foreground`).
## Model Selection
There is a wide variety of object detection methods, such as the commonly used two-stage detectors (FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.), and anchor-free detectors (FCOS, etc.). PaddleDetection provides its self-developed PP-YOLO models for server-side scenarios and PicoDet models for end-side scenarios (CPU and mobile), both of which are among the leading models in their respective scenarios.
Building on the work above, PaddleClas provides a lightweight mainbody detection model for end-side scenarios and a server-side mainbody detection model for server-side scenarios. The table below compares their mAP (averaged over the 5 datasets), model size, and inference speed.
| Model | Model Structure | Download Link of Pre-trained Model | Download Link of Inference Model | mAP | Size of Inference Model (MB) | Inference Time per Image (preprocessing excluded)(ms) |
- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`. Speed is measured with mkldnn enabled and the number of threads set to 10.
- Mainbody detection has a time-consuming preprocessing step, which takes about 40 to 55 ms per image on average on the above machine. This preprocessing time is therefore not included in the inference time.
### Lightweight Mainbody Detection Model
PicoDet, introduced by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), is an object detection algorithm designed for CPU and mobile scenarios. It integrates several optimization techniques.
For details of the optimized PicoDet and its benchmarks, refer to [Tutorials of PicoDet Models](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md).
To balance detection speed and accuracy in the lightweight mainbody detection task, we adopt PPLCNet_x2_5 as the backbone of the model and change the image scale for training and inference to 640x640. The rest of the configuration is the same as [picodet_m_shufflenetv2_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml). The final detection model is obtained by training on the customized mainbody detection datasets.
### Server-side Mainbody Detection Model
PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It greatly optimizes the YOLOv3 model from multiple perspectives, such as backbone, data augmentation, regularization strategy, loss function, and post-processing, and reaches the state of the art in terms of the trade-off between speed and precision. The optimization strategies are as follows.
- Better backbone: ResNet50vd-DCN
- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU, with the learning rate and the number of iterations adjusted accordingly.
For more information about PP-YOLO, you can refer to [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).
In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to the customized mainbody detection dataset. The final detection model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
## Model Training
This section describes how to train your own mainbody detection model on your own datasets using PaddleDetection.
### Prepare the Environment
Download PaddleDetection and install requirements.
For more installation tutorials, please refer to [Installation Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md).
### Prepare the Dataset
A customized dataset should be converted to COCO format. Please refer to [Customized Dataset Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md) to build your own dataset in COCO format.
In the mainbody detection task, all objects belong to the foreground. Therefore, the `category_id` of every object in the annotation file should be set to 1, and the `categories` map should be modified so that it contains only the class `foreground`.
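The relabeling described above can be sketched in a few lines of Python. This is a minimal illustration rather than a tool shipped with PaddleDetection: the helper name `to_foreground` and the `supercategory` value are assumptions, and the sample annotation is made up for the example.

```python
import copy

# Assumption: the single class is named "foreground" with id 1, as the text describes.
# The "supercategory" value is illustrative.
FOREGROUND_CATEGORY = {"id": 1, "name": "foreground", "supercategory": "foreground"}


def to_foreground(coco: dict) -> dict:
    """Return a copy of a COCO-style annotation dict with a single `foreground` class."""
    result = copy.deepcopy(coco)
    for annotation in result.get("annotations", []):
        annotation["category_id"] = FOREGROUND_CATEGORY["id"]
    result["categories"] = [dict(FOREGROUND_CATEGORY)]
    return result


# Tiny in-memory example; a real annotation file would be read with json.load
# and written back with json.dump.
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 7, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 1, "category_id": 3, "bbox": [200, 40, 80, 120]},
    ],
    "categories": [{"id": 3, "name": "shoe"}, {"id": 7, "name": "bag"}],
}
converted = to_foreground(example)
print(converted["categories"])
```

Working on a deep copy keeps the original annotations intact, which makes it easy to compare the two versions before overwriting any file.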
`ppyolov2_r50vd_dcn_365e_coco.yml` depends on several other configuration files, which are described below.
```
coco_detection.yml: paths of the train/eval/test datasets.
runtime.yml: common runtime parameters, including whether to use GPU, how often (in epochs) checkpoints are saved, etc.
optimizer_365e.yml: learning rate and optimizer.
ppyolov2_r50vd_dcn.yml: model architecture and backbone.
ppyolov2_reader.yml: train/eval/test readers, e.g. batch size and the number of concurrent data-loading subprocesses, as well as post-read preprocessing operations such as resize and data augmentation.
```
In the mainbody detection task, you need to set `num_classes` in `datasets/coco_detection.yml` to 1 (only `foreground` is included) and change the paths of the training and testing datasets to those of your customized datasets.
In addition, the above files can be modified to fit your actual situation. For example, if GPU memory is insufficient, the batch size and the learning rate can be reduced by the same factor.
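As a small worked example of reducing the batch size and learning rate in equal proportion, the sketch below scales a learning rate by the same factor as the batch size. The function name and the starting values (batch size 24, learning rate 0.005) are assumptions chosen for illustration; use the values from your own configuration files.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale the learning rate by the same factor as the batch size."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * (new_batch_size / base_batch_size)


# Hypothetical starting point: batch size 24 per GPU with a learning rate of 0.005.
# Halving the batch size to 12 halves the learning rate as well.
new_lr = scale_learning_rate(base_lr=0.005, base_batch_size=24, new_batch_size=12)
print(new_lr)
```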
### Begin the Training Process
PaddleDetection supports several ways of training.
`--draw_threshold` is an optional parameter. Depending on the NMS computation, different thresholds produce different results. `keep_top_k` sets the maximum number of output objects; its default value is 100 and can be changed to fit your actual situation.
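To make the interaction between a score threshold and `keep_top_k` concrete, here is a simplified sketch. It is not PaddleDetection's implementation: the function name, the detection tuple layout, and the default threshold of 0.5 are assumptions made for this illustration only.

```python
from typing import List, Tuple

# (label, score, box) with box as (x_min, y_min, x_max, y_max); this layout is hypothetical.
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def select_detections(
    detections: List[Detection], draw_threshold: float = 0.5, keep_top_k: int = 100
) -> List[Detection]:
    """Keep at most keep_top_k highest-scoring boxes, then drop those below the threshold."""
    ranked = sorted(detections, key=lambda det: det[1], reverse=True)
    top = ranked[:keep_top_k]
    return [det for det in top if det[1] >= draw_threshold]


dets = [
    ("foreground", 0.92, (10, 10, 200, 220)),
    ("foreground", 0.48, (30, 40, 120, 160)),
    ("foreground", 0.15, (300, 80, 380, 140)),
]
# A low threshold keeps more boxes than a high one for the same detections.
loose = select_detections(dets, draw_threshold=0.1)
strict = select_detections(dets, draw_threshold=0.5)
print(len(loose), len(strict))
```

A lower threshold shows more candidate boxes, which can help when inspecting missed objects, while a higher threshold keeps only confident ones.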
The inference model will be saved under the directory `inference/ppyolov2_r50vd_dcn_365e_coco`, which contains `infer_cfg.yml` (optional for mainbody detection), `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
Note: The inference model exported by `PaddleDetection` is named `model.xxx`. To keep it consistent with PaddleClas, you can rename `model.xxx` to `inference.xxx` for the subsequent inference deployment of mainbody detection.
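The renaming step can be done by hand or with a short script. The following is a minimal sketch using only the Python standard library; the helper name `rename_exported_model` is hypothetical and not part of PaddleDetection or PaddleClas.

```python
from pathlib import Path
from typing import List


def rename_exported_model(model_dir: str) -> List[str]:
    """Rename model.* files in model_dir to inference.* and return the new file names."""
    renamed = []
    # sorted() materializes the listing first, so renaming does not disturb the iteration.
    for path in sorted(Path(model_dir).iterdir()):
        if path.is_file() and path.name.startswith("model."):
            target = path.with_name("inference." + path.name[len("model."):])
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

For the export directory mentioned above, the call would be `rename_exported_model("inference/ppyolov2_r50vd_dcn_365e_coco")`. Files that do not start with `model.`, such as `infer_cfg.yml`, are left untouched.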
For more model export tutorials, please refer to [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md).
In the end, the directory `inference/ppyolov2_r50vd_dcn_365e_coco` contains `inference.pdiparams`, `inference.pdiparams.info`, and `inference.pdmodel`, where `inference.pdiparams` stores the weights of the inference model and `inference.pdmodel` stores its structure.
After exporting the model, change the path of the detection model to the path of the exported inference model to run prediction.
Taking product recognition as an example, you can set the field `Global.det_inference_model_dir` in its config file [inference_product.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/configs/inference_product.yaml) to the directory of the exported inference model, and then run detection and recognition of products by following [Quick Start for Image Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN_tmp/tutorials/quick_start_recognition.md).
## FAQ
#### Q: Is it compatible with other mainbody detection models?
- A: Yes, but the current preprocessing only supports PicoDet and YOLO models, so it is recommended to use these two for training. If you want to use other models such as Faster RCNN, you need to revise the preprocessing logic to match that of PaddleDetection. Feel free to open a GitHub issue or ask in the WeChat group if you have any needs or questions.
#### Q: Can I modify the prediction scale of mainbody detection?
- A: Yes, but there are 2 things that require attention:
  - The mainbody detection model provided in PaddleClas is trained at `640x640` resolution, so this is also the default resolution for prediction. Accuracy will be reduced if other resolutions are used.
  - When exporting the model, it is recommended to change the resolution of the exported model so that it is consistent with the one used for prediction.
Vector search is widely used in image recognition and image retrieval. Given a query vector, it computes the similarity or distance between the query features and all vectors in an established vector library, and returns a similarity ranking. The image recognition system uses [Faiss](https://github.com/facebookresearch/faiss) for this purpose; please check [the official website of Faiss](https://github.com/facebookresearch/faiss) for more details. The main advantages of `Faiss` are as follows:
- Great adaptability: supports Windows, Linux, and macOS
- Easy installation: provides a `python` interface and can be installed directly with `pip`
- Rich algorithms: supports a variety of search algorithms to cover different scenarios
- CPU and GPU support: both can be used, which accelerates the search process
Note that the current version of `PaddleClas` **only uses CPU for vector retrieval**, in pursuit of better adaptability.
As shown in the figure above, two parts constitute the vector search in the whole `PP-ShiTu` system.
- The green part: building the search library used by search queries, together with functions such as adding and deleting images.
- The blue part: the search function, i.e., given the feature vector of an image, return the labels of similar images in the library.
This document mainly introduces the installation of the search module in PaddleClas, the adopted search algorithms, the library building process, and the parameters in the relevant configuration files.
------
## Contents
- [1. Installation of the Search Library](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#1)
- [2. Search Algorithms](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#2)
- [3. Introduction of Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3)
  - [3.1 Parameters of Library Building and Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3.1)
  - [3.2 Parameters of Search Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3.2)
## 1. Installation of the Search Library
`Faiss` can be installed as follows:
```
pip install faiss-cpu==1.7.1post2
```
If the installed package does not work properly, please uninstall it and then install it again, especially if you are using Windows.
## 2. Search Algorithms
Currently, the search module in `PaddleClas` supports the following three search algorithms:
- **HNSW32**: A graph indexing method that offers high retrieval accuracy and fast speed. However, the feature library only supports adding image features, not deleting them. (Default method)
- **IVF**: An inverted index search method with fast speed but slightly lower precision. The feature library supports both adding and deleting image features.
- **FLAT**: A brute-force search algorithm with the highest precision, but slower retrieval when the data volume is large. The feature library supports both adding and deleting image features.
Each search algorithm suits different scenarios. `HNSW32`, the default method, strikes a balance between accuracy and speed. For a detailed introduction, see the [official document](https://github.com/facebookresearch/faiss/wiki).
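To illustrate what a brute-force (`FLAT`-style) search does conceptually, the sketch below compares a query vector with every vector in a small gallery and ranks the results. It is plain Python written for illustration and does not use or reproduce Faiss; the function name, the sample vectors, and the labels are made up, and the similarity measures shown are inner product (`IP`) and Euclidean distance (`L2`).

```python
import math
from typing import List, Sequence, Tuple


def flat_search(
    query: Sequence[float],
    gallery: Sequence[Sequence[float]],
    labels: Sequence[str],
    top_k: int = 1,
    dist_type: str = "IP",
) -> List[Tuple[str, float]]:
    """Compare the query with every gallery vector and return the top_k (label, score) pairs."""
    if dist_type == "IP":
        # Inner product: a larger value means more similar.
        scores = [sum(q * g for q, g in zip(query, vec)) for vec in gallery]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    elif dist_type == "L2":
        # Euclidean distance: a smaller value means more similar.
        scores = [math.dist(query, vec) for vec in gallery]
        order = sorted(range(len(scores)), key=lambda i: scores[i])
    else:
        raise ValueError(f"unsupported dist_type: {dist_type}")
    return [(labels[i], scores[i]) for i in order[:top_k]]


# Hypothetical 2-dimensional features and labels.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
labels = ["cola", "water", "juice"]
print(flat_search([0.8, 0.6], gallery, labels, top_k=2, dist_type="IP"))
```

Because every gallery vector is visited, the cost grows linearly with the library size, which is why the exhaustive approach is the most precise but becomes slow for large libraries; graph-based and inverted-index methods trade some precision for speed.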
## 3. Introduction of Configuration Files
Configuration files involving the search module are under `deploy/configs/`, where `build_*.yaml` is related to building the feature library, and `inference_*.yaml` is the inference file for retrieval or classification.
### 3.1 Parameters of Library Building and Configuration Files
The library is built as follows:
```
# Enter deploy directory
cd deploy
# Change the yaml file to the specific one you need
```
The `yaml` file for library building is configured as described below; please adjust it to fit your actual setup. The build process extracts the features of the images under `image_root` according to the image list in `data_file` and stores them under `index_dir` for subsequent search.
The `data_file` stores the paths and labels of the image files, one per line, in the format `image_path label`. The two fields are separated by the delimiter given by the `delimiter` parameter in the `yaml` file.
The specific model parameters for feature extraction can be found in the `yaml` file.
- **index_method**: the search algorithm. Three are currently supported: HNSW32, IVF, and Flat.
- **index_dir**: the folder where the built feature library is stored.
- **image_root**: the folder that stores the annotated images needed to build the feature library.
- **data_file**: the list of annotated images needed to build the feature library. The format of each line is `relative_path label`.
- **index_operation**: the operation performed on the library: `new` creates a new library, `append` adds the image features of `data_file` to the feature library, and `remove` deletes the images of `data_file` from the feature library.
- **delimiter**: the delimiter used in each line of **data_file**.
- **dist_type**: the similarity measure used in feature matching, for example inner product (`IP`) or Euclidean distance (`L2`).
- **embedding_size**: the feature dimensionality.
### 3.2 Parameters of Search Configuration Files
To integrate the search into the overall `PP-ShiTu` process, please refer to `The Introduction of PP-ShiTu Image Recognition System` in [README](https://github.com/PaddlePaddle/PaddleClas/blob/develop/README_ch.md). Please check the [Quick Start for Image Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_recognition.md) for the specific operation of the search.
The search part is configured as follows. Please refer to `deploy/configs/inference_*.yaml` for the complete version.