diff --git a/docs/en/application/mainbody_detection_en.md b/docs/en/application/mainbody_detection_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..09276fbfbd57894e98e8d84456dfd6edefc27725
--- /dev/null
+++ b/docs/en/application/mainbody_detection_en.md
@@ -0,0 +1,47 @@
+# Mainbody Detection
+
+Mainbody detection is a widely used detection technique. It detects one or more mainbody objects in an image and crops the corresponding regions for recognition, thereby completing the entire recognition process. Mainbody detection is the first step of the recognition task and can effectively improve recognition accuracy.
+
+
+This tutorial introduces the dataset and model training for mainbody detection in PaddleClas.
+
+
+## 1. Dataset
+
+The datasets used for the mainbody detection task are shown in the following table.
+
+
+| Dataset | Number of images | Number of images used in mainbody detection | Scenarios | Dataset link |
+| ------------ | ------------- | -------| ------- | -------- |
+| Objects365 | 1.7M | 6k | General Scenarios | [link](https://www.objects365.org/overview.html) |
+| COCO2017 | 120k | 5k | General Scenarios | [link](https://cocodataset.org/) |
+| iCartoonFace | 2k | 2k | Cartoon Face | [link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
+| LogoDet-3k | 3k | 2k | Logo | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
+| RPC | 3k | 3k | Product | [link](https://rpc-dataset.github.io/) |
+
+
+In the actual training process, all datasets are mixed together. The categories of all labeled boxes are changed to `foreground`, so the trained detection model contains only one category (`foreground`).
+
+## 2. Model Training
+
+
+There are many types of object detection methods, such as the commonly used two-stage detectors (FasterRCNN series, etc.), single-stage detectors (YOLO, SSD, etc.), and anchor-free detectors (FCOS, etc.).
+
+PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It deeply optimizes the YOLOv3 model from multiple perspectives, such as the backbone, data augmentation, regularization strategy, loss function, and post-processing, and reaches the state of the art in terms of the speed-precision trade-off. Specifically, the optimization strategies are as follows.
+
+- Better backbone: ResNet50vd-DCN
+- Larger training batch size: 8 GPUs with a mini-batch size of 24 per GPU
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- Better ImageNet pretrain weights
+
+For more information about PP-YOLO, refer to the [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).
+
+
+In the mainbody detection task, we use `ResNet50vd-DCN` as the backbone for better performance. The config file used for model training is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), with the dataset path changed to the mainbody detection dataset.
+The final inference model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
diff --git a/docs/en/tutorials/quick_start_recognition_en.md b/docs/en/tutorials/quick_start_recognition_en.md
index 6bdfb644b5c58c4918c277050ea66d7564a09f57..fdd14589c5325f277fbc492f5406402a9d0d36be 100644
--- a/docs/en/tutorials/quick_start_recognition_en.md
+++ b/docs/en/tutorials/quick_start_recognition_en.md
@@ -9,18 +9,19 @@ If the image category already exists in the image index database, then you can t
 * [1. Enviroment Preparation](#enviroment_preperation )
 * [2. Image Recognition Experience](#image_recognition_experience)
     * [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data)
-    * [2.2 Logo Recognition and Retrieval](#Logo_recognition_and_retrival)
+    * [2.2 Product Recognition and Retrieval](#Product_recognition_and_retrival)
         * [2.2.1 Single Image Recognition](#recognition_of_single_image)
         * [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition)
 * [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience)
-    * [3.1 Build the Base Library Based on Our Own Dataset](#build_the_base_library_based_on_your_own_dataset)
-    * [3.2 ecognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
+    * [3.1 Prepare for the new images and labels](#3.1)
+    * [3.2 Build a new Index Library](#build_a_new_index_library)
+    * [3.3 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
 ## 1. Enviroment Preparation
-* Installation:Please take a reference to [Quick Installation ](./installation.md)to configure the PaddleClas environment.
+* Installation: Please refer to [Quick Installation](./install_en.md) to configure the PaddleClas environment.
 * Using the following command to enter Folder `deploy`. All content and commands in this section need to be run in folder `deploy`.
@@ -65,7 +66,7 @@ cd ..
 ### 2.1 Download and Unzip the Inference Model and Demo Data
-Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
+Taking product recognition as an example, download the detection model, recognition model, and product recognition demo data with the following commands.
 ```shell
 mkdir models
@@ -73,20 +74,20 @@ cd models
 # Download the generic detection inference model and unzip it
 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
 # Download and unpack the inference model
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
 cd ..
 mkdir dataset
 cd dataset
 # Download the demo data and unzip it
-wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
 cd ..
 ```
 Once unpacked, the `dataset` folder should have the following file structure.
 ```
-├── logo_demo_data_v1.0
+├── product_demo_data_v1.0
 │   ├── data_file.txt
 │   ├── gallery
 │   ├── index
@@ -99,7 +100,7 @@ The `data_file.txt` is images list used to build the index database, the `galler
 The `models` folder should have the following file structure.
 ```
-├── logo_rec_ResNet50_Logo3K_v1.0_infer
+├── product_ResNet50_vd_aliproduct_v1.0_infer
 │   ├── inference.pdiparams
 │   ├── inference.pdiparams.info
 │   └── inference.pdmodel
@@ -109,35 +110,44 @@ The `models` folder should have the following file structure.
 │   └── inference.pdmodel
 ```
-
-### 2.2 Logo Recognition and Retrival
+
+### 2.2 Product Recognition and Retrieval
-Take the Logo recognition demo as an example to show the recognition and retrieval process (if you wish to try other scenarios of recognition and retrieval, replace the corresponding configuration file after downloading and unzipping the corresponding demo data and model to complete the prediction)。
+Take the product recognition demo as an example to show the recognition and retrieval process. To try other recognition and retrieval scenarios, download and unzip the corresponding demo data and model, then replace the configuration file accordingly to complete the prediction.
 #### 2.2.1 Single Image Recognition
-Run the following command to identify and retrieve the image `. /dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg` for recognition and retrieval
+Run the following command to perform recognition and retrieval on the image `./dataset/product_demo_data_v1.0/query/wangzai.jpg`.
 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml
+python3.7 python/predict_system.py -c configs/inference_product.yaml
 ```
 The image to be retrieved is shown below.
- +
 The final output is shown below.
 ```
-[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
+[{'bbox': [305, 226, 776, 930], 'rec_docs': ['旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '旺仔牛奶', '康师傅方便面'], 'rec_scores': array([1328.1072998 , 1185.92248535, 846.88220215, 746.28546143, 622.2668457 ])}]
 ```
+
 where bbox indicates the location of the detected subject, rec_docs indicates the labels corresponding to a number of images in the index dabase that are most similar to the detected subject, and rec_scores indicates the corresponding similarity.
+Four of the five results are `旺仔牛奶`, so the recognition result is correct.
+
+The detection result is also saved in the `output` folder, as shown below.
+
+
+ +
+
 #### 2.2.2 Folder-based Batch Recognition
@@ -145,7 +155,7 @@ where bbox indicates the location of the detected subject, rec_docs indicates th
 If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
+python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/"
 ```
 Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index to the index databass can be changed by modifying the `IndexProcess.index_path` field.
@@ -154,56 +164,83 @@ Furthermore, the recognition inference model path can be changed by modifying th
 ## 3. Recognize Images of Unknown Category
-To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows:
+To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows:
 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
+python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg"
 ```
 The image to be retrieved is shown below.
- +
 The output is as follows:
 ```
-[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
+[{'bbox': [243, 80, 523, 522], 'rec_docs': ['娃哈哈AD钙奶', '旺仔牛奶', '娃哈哈AD钙奶', '农夫山泉矿泉水', '红牛'], 'rec_scores': array([548.33282471, 411.85687256, 408.39770508, 400.89404297, 360.41540527])}]
 ```
 Since the index infomation is not included in the corresponding index databse, the recognition results are not proper. At this time, we can complete the image recognition of unknown categories by constructing a new index database.
 When the index database cannot cover the scenes we actually recognise, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index databasey, thus completing the recognition of images of unknown categories ,which does not require retraining.
-
-### 3.1 Build the Base Library Based on Your Own Dataset
+
+### 3.1 Prepare for the new images and labels
+
+First, copy the images that are similar to the query image into the gallery folder that holds the original images for the index database. The command is as follows.
+
+```shell
+cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./dataset/product_demo_data_v1.0/gallery/
+```
+
+Then create a new label file that records the image paths and label information. Use the following command to create the new file from the original one.
+
+```shell
+# copy the file
+cp dataset/product_demo_data_v1.0/data_file.txt dataset/product_demo_data_v1.0/data_file_update.txt
+```
+
+Then add the following lines to the new label file.
+
+```
+gallery/anmuxi/001.jpg 安慕希酸奶
+gallery/anmuxi/002.jpg 安慕希酸奶
+gallery/anmuxi/003.jpg 安慕希酸奶
+gallery/anmuxi/004.jpg 安慕希酸奶
+gallery/anmuxi/005.jpg 安慕希酸奶
+gallery/anmuxi/006.jpg 安慕希酸奶
+```
+
+Each line can be split into two fields. The first field is the relative image path, and the second field is its label. The delimiter here is a space.
-First, you need to obtain the original images to store in the database (store in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label infomation,recording the category of the original images and the label information)store in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`
+
+### 3.2 Build a new Index Library
-Then use the following command to build the index to accelerate the retrieval process after recognition.
+Use the following command to build the index to accelerate the retrieval process after recognition.
 ```shell
-python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
+python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./dataset/product_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
 ```
-Finally, the new index information is stored in the folder`./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the above index.
+Finally, the new index information is stored in the folder `./dataset/product_demo_data_v1.0/index_update`. Use this new index database in place of the previous index.
 ### 3.2 Recognize the Unknown Category Images
-To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows.
+To recognize the image `./dataset/product_demo_data_v1.0/query/anmuxi.jpg`, run the command as follows.
 ```shell
-python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
+python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./dataset/product_demo_data_v1.0/query/anmuxi.jpg" -o IndexProcess.index_path="./dataset/product_demo_data_v1.0/index_update"
 ```
 The output is as follows:
 ```
-[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
+[{'bbox': [243, 80, 523, 522], 'rec_docs': ['安慕希酸奶', '娃哈哈AD钙奶', '安慕希酸奶', '安慕希酸奶', '安慕希酸奶'], 'rec_scores': array([1214.9597168 , 548.33282471, 547.82104492, 535.13201904, 471.52706909])}]
 ```
-The recognition result is correct.
+Four of the five results are `安慕希酸奶`, so the recognition result is correct.
diff --git a/docs/zh_CN/application/mainbody_detection.md b/docs/zh_CN/application/mainbody_detection.md
index a6aba2ae3c3bfeaa2ccb64c9985f9aecfe4f1f1c..5c58afaaec879194d7153f8755059a46fb70bdb1 100644
--- a/docs/zh_CN/application/mainbody_detection.md
+++ b/docs/zh_CN/application/mainbody_detection.md
@@ -1,6 +1,6 @@
 # 主体检测
-主体检测技术是目前应用非常广泛的一种检测技术，它指的是检测出图片中最突出的主体坐标位置，然后将图像中的对应区域裁剪下来，进行识别，从而完成整个识别过程。主体检测是识别任务的前序步骤，可以有效提升识别精度。
+主体检测技术是目前应用非常广泛的一种检测技术，它指的是检测出图片中一个或者多个主体的坐标位置，然后将图像中的对应区域裁剪下来，进行识别，从而完成整个识别过程。主体检测是识别任务的前序步骤，可以有效提升识别精度。
 本部分主要从数据集、模型训练2个方面对该部分内容进行介绍。