From a1481422c38f8e4a5b4847d30ed41784a3c94372 Mon Sep 17 00:00:00 2001 From: dongshuilong Date: Tue, 7 Dec 2021 14:24:55 +0800 Subject: [PATCH] add English doc for ImageClassification, ImageRecognition, VectorSearch, MainBodyDetection --- .../mainbody_detection_en.md | 206 +++++++++++ .../vector_search_en.md | 114 +++++++ .../image_classification_en.md | 318 +++++++++++++++++ .../models_training/image_recognition_en.md | 319 ++++++++++++++++++ 4 files changed, 957 insertions(+) create mode 100644 docs/en/image_recognition_pipeline/mainbody_detection_en.md create mode 100644 docs/en/image_recognition_pipeline/vector_search_en.md create mode 100644 docs/en/models_training/image_classification_en.md create mode 100644 docs/en/models_training/image_recognition_en.md diff --git a/docs/en/image_recognition_pipeline/mainbody_detection_en.md b/docs/en/image_recognition_pipeline/mainbody_detection_en.md new file mode 100644 index 00000000..b89bc375 --- /dev/null +++ b/docs/en/image_recognition_pipeline/mainbody_detection_en.md @@ -0,0 +1,206 @@
+# Mainbody Detection
+
+Mainbody detection is a widely used detection technique: given a whole image, it identifies the coordinates of one or more objects and crops the corresponding regions for subsequent recognition. Mainbody detection is the first step of the recognition task and can effectively improve recognition accuracy.
+
+This tutorial introduces the technology from three aspects, namely, the datasets, model selection, and model training.
+
+## Dataset
+
+The datasets we used for mainbody detection tasks are shown in the following table.
+
+| Dataset | Image Number | Image Number Used in Mainbody Detection | Scenarios | Dataset Link |
+| ------------ | ------------ | --------------------------------------- | ----------------- | ---------------------------------------------------------- |
+| Objects365 | 1.7M | 6k | General Scenarios | [Link](https://www.objects365.org/overview.html) |
+| COCO2017 | 120k | 5k | General Scenarios | [Link](https://cocodataset.org/) |
+| iCartoonFace | 2k | 2k | Cartoon Face | [Link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
+| LogoDet-3k | 3k | 2k | Logo | [Link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
+| RPC | 3k | 3k | Product | [Link](https://rpc-dataset.github.io/) |
+
+In the actual training process, all datasets are mixed together. The categories of all labeled boxes are modified to `foreground`, so the detection model we trained contains only one category (`foreground`).
+
+## Model Selection
+
+There is a wide variety of object detection methods, such as the commonly used two-stage detectors (the Faster R-CNN series, etc.), single-stage detectors (YOLO, SSD, etc.), and anchor-free detectors (FCOS, etc.). PaddleDetection provides self-developed PP-YOLO models for server-side scenarios and PicoDet models for end-side scenarios (CPU and mobile), both of which are leading in their areas.
+
+Building on the studies above, PaddleClas provides a lightweight mainbody detection model for end-side scenarios and a server-side mainbody detection model. The table below presents the average mAP over the 5 datasets above and a comparison of model sizes and inference speed.
+
+| Model | Model Structure | Download Link of Pre-trained Model | Download Link of Inference Model | mAP | Size of Inference Model (MB) | Inference Time per Image (preprocessing excluded) (ms) |
+| ------------------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------------- | ----------------------------------------------------- |
+| Lightweight Mainbody Detection Model | PicoDet | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | 40.1% | 30.1 | 29.8 |
+| Server-side Mainbody Detection Model | PP-YOLOv2 | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | 42.5% | 210.5 | 466.6 |
+
+Notes:
+
+- Detailed information of the CPU used for speed evaluation: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`. The speed metric is the test result with MKL-DNN enabled and the number of threads set to 10.
+- Mainbody detection has a time-consuming preprocessing procedure, which takes about 40 to 55 ms per image on the above machine. Therefore, it is not included in the inference time.
+
+### Lightweight Mainbody Detection Model
+
+PicoDet, introduced by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), is an object detection algorithm for CPU or mobile-side scenarios. It integrates the following optimization algorithms:
+
+- [ATSS](https://arxiv.org/abs/1912.02424)
+- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388)
+- Cosine learning rate decay
+- Cycle-EMA
+- Lightweight detection head
+
+For more details and benchmarks of the optimized PicoDet, please refer to the [Tutorials of PicoDet Models](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md).
+
+To balance detection speed and accuracy in lightweight mainbody detection tasks, we adopt PPLCNet_x2_5 as the backbone of the model and set the image scale for training and inference to 640x640, with the rest configured the same as [picodet_m_shufflenetv2_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml). The final detection model is obtained by training on the customized mainbody detection dataset.
+
+### Server-side Mainbody Detection Model
+
+PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It greatly optimizes the YOLOv3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function, and post-processing, and achieves state-of-the-art performance in the speed-accuracy trade-off. The optimization strategies are as follows:
+
+- Better backbone: ResNet50vd-DCN
+- Larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU, with the learning rate and the number of iterations adjusted accordingly.
+- [Drop Block](https://arxiv.org/abs/1810.12890)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
+- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
+- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
+- [CoordConv](https://arxiv.org/abs/1807.03247)
+- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
+- Better Pre-trained Model
+
+For more information about PP-YOLO, you can refer to the [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).
+
+In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to the customized mainbody detection dataset. The final detection model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
+
+## Model Training
+
+This section mainly explains how to train your own mainbody detection model with PaddleDetection on your own dataset.
+
+### Prepare the Environment
+
+Download PaddleDetection and install the requirements.
+
+```shell
+cd
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+cd PaddleDetection
+# install requirements
+pip install -r requirements.txt
+```
+
+For more installation tutorials, please refer to the [Installation Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md).
+
+### Prepare the Dataset
+
+For a customized dataset, you should convert it to COCO format. Please refer to the [Customized Dataset Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md) to build your own dataset in COCO format.
+
+In the mainbody detection task, all objects belong to the foreground. Therefore, the `category_id` of all objects in the annotation file should be modified to 1, and the `categories` map should be modified as follows, in which only the class `foreground` is included:
+
+```
+[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
+```
+
+### Configuration Files
+
+We use `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` to train the model; more details are as follows.
+
+ [![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/det/PaddleDetection_config.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/det/PaddleDetection_config.png)
+
+`ppyolov2_r50vd_dcn_365e_coco.yml` depends on other configuration files, whose meanings are as follows:
+
+```
+coco_detection.yml: path of train/eval/test dataset.
+
+runtime.yml: public runtime parameters, including whether to use GPU, epoch number for checkpoint saving, etc.
+
+optimizer_365e.yml: learning rate and optimizer.
+
+ppyolov2_r50vd_dcn.yml: model architecture and backbone.
+
+ppyolov2_reader.yml: train/eval/test reader, such as batch size, the number of concurrently loaded sub-processes, etc., as well as post-read preprocessing operations, such as resize, data augmentation, etc.
+```
+
+In the mainbody detection task, you need to modify `num_classes` in `datasets/coco_detection.yml` to 1 (only `foreground` is included) and modify the paths of the training and testing datasets to those of the customized dataset, as sketched below.
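+
+For illustration, the relevant fields of `datasets/coco_detection.yml` would look roughly like the following sketch. This is an assumed example: the directory and annotation paths are placeholders for your own customized dataset, and the remaining fields keep the defaults shipped with PaddleDetection.
+
+```yaml
+metric: COCO
+num_classes: 1  # only the `foreground` category
+
+TrainDataset:
+  !COCODataSet
+    image_dir: images
+    anno_path: annotations/train.json   # annotations with category_id = 1 only
+    dataset_dir: dataset/mainbody       # placeholder: your customized dataset
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: images
+    anno_path: annotations/val.json
+    dataset_dir: dataset/mainbody
+```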
+
+In addition, the above files can also be modified according to the actual situation. For example, if GPU memory overflows, the batch size and learning rate can be reduced in equal proportion.
+
+### Begin the Training Process
+
+PaddleDetection supports multiple training modes.
+
+- Training using a single GPU
+
+```
+# not needed for Windows and macOS
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
+```
+
+- Training using multiple GPUs
+
+```
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval
+```
+
+`--eval`: perform evaluation while training
+
+- (**Recommended**) Model finetuning. If you want to finetune the trained model on your own dataset, you can run the following command.
+
+```
+export CUDA_VISIBLE_DEVICES=0
+# assign pretrain_weights, load the general mainbody-detection pretrained model
+python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams
+```
+
+- Resume training
+
+ You can use `-r` to load checkpoints and resume training.
+
+```
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000
+```
+
+Note: If an `Out of memory error` occurs, you can try to decrease `batch_size` in `ppyolov2_reader.yml` while reducing the learning rate in equal proportion.
+
+### Model Prediction
+
+Use the following command to finish the prediction process.
+
+```
+export CUDA_VISIBLE_DEVICES=0
+python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final
+```
+
+`--draw_threshold` is an optional parameter; according to the NMS calculation, different thresholds will produce different results. `keep_top_k` indicates the maximum number of output targets, with a default value of 100 that can be modified according to your actual situation.
+
+### Model Export and Inference Deployment
+
+Use the following command to export the inference model:
+
+```
+python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams
+```
+
+The inference model will be saved under the directory `inference/ppyolov2_r50vd_dcn_365e_coco`, which contains `infer_cfg.yml` (optional for mainbody detection), `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
+
+Note: The inference model that `PaddleDetection` exports is named `model.xxx`. If you want to keep it consistent with PaddleClas, you can rename `model.xxx` to `inference.xxx` for the subsequent inference deployment of mainbody detection.
+
+For more model export tutorials, please refer to [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md).
+
+The final directory `inference/ppyolov2_r50vd_dcn_365e_coco` then contains `inference.pdiparams`, `inference.pdiparams.info`, and `inference.pdmodel`, among which `inference.pdiparams` is the saved weights file of the inference model and `inference.pdmodel` is the structure file.
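+
+The renaming mentioned in the note above can be done with a few shell commands, for example:
+
+```shell
+cd inference/ppyolov2_r50vd_dcn_365e_coco
+mv model.pdiparams      inference.pdiparams
+mv model.pdiparams.info inference.pdiparams.info
+mv model.pdmodel        inference.pdmodel
+cd ../..
+```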
+
+After exporting the model, the path of the detection model can be changed to the inference model path to complete the prediction task.
+
+Take product recognition as an example: you can modify the field `Global.det_inference_model_dir` in its config file [inference_product.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/configs/inference_product.yaml) to the directory of the exported inference model, and then finish the detection and recognition of the product with reference to the [Quick Start for Image Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN_tmp/tutorials/quick_start_recognition.md).
+
+## FAQ
+
+#### Q: Is it compatible with other mainbody detection models?
+
+- A: Yes, but the current preprocessing only supports PicoDet and YOLO models, so it is recommended to use these two for training. If you want to use other models such as Faster R-CNN, you need to revise the preprocessing logic in accordance with that of PaddleDetection. You are welcome to open a GitHub issue or ask in the WeChat group for any needs or questions.
+
+#### Q: Can I modify the prediction scale of mainbody detection?
+
+- A: Yes, but there are 2 things that require attention
+  - The mainbody detection model provided in PaddleClas is trained at `640x640` resolution, so this is also the default value of the prediction process. The accuracy will be reduced if other resolutions are used.
+  - When exporting the model, it is recommended to modify the resolution of the exported model to keep it consistent with the prediction process.
 diff --git a/docs/en/image_recognition_pipeline/vector_search_en.md b/docs/en/image_recognition_pipeline/vector_search_en.md new file mode 100644 index 00000000..c642a907 --- /dev/null +++ b/docs/en/image_recognition_pipeline/vector_search_en.md @@ -0,0 +1,114 @@
+# Vector Search
+
+Vector search finds wide applications in image recognition and image retrieval. It aims to obtain the similarity ranking for a given query vector by computing the similarity or distance between the query's feature vector and all vectors in an established vector library. In the image recognition system, [Faiss](https://github.com/facebookresearch/faiss) is adopted for this purpose; please check the [official website of Faiss](https://github.com/facebookresearch/faiss) for more details. The main advantages of `Faiss` can be summarized as follows:
+
+- Great adaptability: supports Windows, Linux, and macOS
+- Easy installation: supports a `python` interface and direct installation with `pip`
+- Rich algorithms: supports a variety of search algorithms to cover different scenarios
+- Supports both CPU and GPU, which accelerates the search process
+
+It is worth noting that the current version of `PaddleClas` **only uses CPU for vector retrieval** for the moment, in pursuit of better adaptability.
+
+[![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/structure.jpg)
+
+As shown in the figure above, two parts constitute the vector search in the whole `PP-ShiTu` system.
+
+- The green part: builds the search library (gallery) for retrieval, and provides functions such as adding and deleting images.
+- The blue part: the search function, i.e., given the feature vector of an image, it returns the labels of similar images in the library, as illustrated in the sketch below.
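+
+The core of the blue part can be illustrated with a minimal `Faiss` sketch. This is a toy example with random features, independent of the PP-ShiTu wrappers; the sizes and names are illustrative only.
+
+```python
+import faiss
+import numpy as np
+
+d = 512                                               # feature dimensionality
+gallery = np.random.rand(1000, d).astype("float32")   # library feature vectors
+query = np.random.rand(1, d).astype("float32")        # query feature vector
+
+index = faiss.IndexFlatIP(d)           # brute-force index, inner-product (IP) metric
+index.add(gallery)                     # build the library
+scores, ids = index.search(query, 5)   # top-5 most similar gallery vectors
+print(ids[0], scores[0])               # ids map back to image labels in the library
+```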
+
+This document mainly introduces the installation of the search module in PaddleClas, the adopted search algorithms, the library building process, and the parameters in the relevant configuration files.
+
+------
+
+## Contents
+
+- [1. Installation of the Search Library](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#1)
+- [2. Search Algorithms](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#2)
+- [3. Introduction of Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3)
+  - [3.1 Parameters of Library Building and Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3.1)
+  - [3.2 Parameters of Search Configuration Files](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/vector_search.md#3.2)
+
+## 1. Installation of the Search Library
+
+`Faiss` can be installed as follows:
+
+```
+pip install faiss-cpu==1.7.1post2
+```
+
+If the above package cannot be used properly, please uninstall and then install it again, especially on `Windows`.
+
+## 2. Search Algorithms
+
+Currently, the search module in `PaddleClas` supports the following three search algorithms:
+
+- **HNSW32**: A graph-based indexing method with high retrieval accuracy and fast speed. However, the feature library only supports adding images, not deleting image features. (Default method)
+- **IVF**: An inverted index search method with fast speed but slightly lower precision. The feature library supports adding and deleting image features.
+- **FLAT**: A brute-force search algorithm with the highest precision, but slower retrieval speed on large data volumes. The feature library supports adding and deleting image features.
+
+Each search algorithm fits different scenarios. `HNSW32`, as the default method, strikes a balance between accuracy and speed; see its detailed introduction in the [official document](https://github.com/facebookresearch/faiss/wiki).
+
+## 3. Introduction of Configuration Files
+
+Configuration files involving the search module are under `deploy/configs/`, where `build_*.yaml` relates to building the feature library and `inference_*.yaml` is the inference file for retrieval or classification.
+
+### 3.1 Parameters of Library Building and Configuration Files
+
+The library is built as follows:
+
+```
+# Enter deploy directory
+cd deploy
+# Change the yaml file to the specific one you need
+python python/build_gallery.py -c configs/build_***.yaml
+```
+
+The `yaml` file for library building is configured as follows; please make the necessary corrections to fit your actual setup. The construction will extract the features of the images under `image_root` according to the image list in `data_file` and store them under `index_dir` for subsequent search.
+
+The `data_file` stores the path and label of each image file, with each line in the format `image_path label`. The columns are separated by the `delimiter` parameter in the `yaml` file.
+
+The specific model parameters for feature extraction can be found in the `yaml` file.
+
+```
+# indexing engine config
+IndexProcess:
+  index_method: "HNSW32" # supported: HNSW32, IVF, Flat
+  index_dir: "./recognition_demo_data_v1.1/gallery_product/index"
+  image_root: "./recognition_demo_data_v1.1/gallery_product/"
+  data_file: "./recognition_demo_data_v1.1/gallery_product/data_file.txt"
+  index_operation: "new" # supported: "append", "remove", "new"
+  delimiter: "\t"
+  dist_type: "IP"
+  embedding_size: 512
+```
+
+- **index_method**: the search algorithm. Three are currently supported: HNSW32, IVF, and Flat.
+- **index_dir**: the folder where the built feature library is stored.
+- **image_root**: the location of the folder where the annotated images needed to build the feature library are stored.
+- **data_file**: the data list of the annotated images needed to build the feature library; the format of each line is: relative_path label.
+- **index_operation**: the operation to build the library: `new` for creating a new library, `append` for adding the image features of data_file to the feature library, `remove` for deleting the images of data_file from the feature library.
+- **delimiter**: the delimiter for each line in **data_file**
+- **dist_type**: the similarity calculation method adopted in feature matching, e.g. inner product (`IP`) or Euclidean distance (`L2`).
+- **embedding_size**: feature dimensionality
+
+### 3.2 Parameters of Search Configuration Files
+
+To integrate the search into the overall `PP-ShiTu` process, please refer to `The Introduction of PP-ShiTu Image Recognition System` in the [README](https://github.com/PaddlePaddle/PaddleClas/blob/develop/README_ch.md). Please check the [Quick Start for Image Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_recognition.md) for the specific operation of the search.
+
+The search part is configured as follows. Please refer to `deploy/configs/inference_*.yaml` for the complete version.
+
+```
+IndexProcess:
+  index_dir: "./recognition_demo_data_v1.1/gallery_logo/index/"
+  return_k: 5
+  score_thres: 0.5
+```
+
+The following parameters are new compared with the library building configuration file:
+
+- `return_k`: the top `k` results are returned
+- `score_thres`: the similarity score threshold for a match
 diff --git a/docs/en/models_training/image_classification_en.md b/docs/en/models_training/image_classification_en.md new file mode 100644 index 00000000..84829164 --- /dev/null +++ b/docs/en/models_training/image_classification_en.md @@ -0,0 +1,318 @@
+# Image Classification
+
+------
+
+Image classification is a fundamental task that classifies an image by its semantic information and assigns it to a specific label. Image classification is the foundation of computer vision tasks such as object detection, image segmentation, object tracking, and behavior analysis. It has comprehensive applications, including face recognition and smart video analysis in the security field, traffic scenario recognition in the transportation field, image retrieval and electronic photo album classification in the internet industry, and image recognition in the medical industry.
+
+Generally speaking, image classification attempts to comprehend an entire image as a whole through feature engineering and assigns labels with a classifier. Hence, how to extract the image features is the essential part. Before deep learning, the most commonly used classification method was the Bag of Words model.
However, image classification based on deep learning can learn hierarchical feature descriptions through supervised and unsupervised learning, replacing manual image feature selection. In recent years, Convolutional Neural Networks (CNNs) have shown outstanding performance in the image field: a CNN takes raw pixel information as input, extracts features by convolution, and directly outputs the classification result. This end-to-end approach achieves ideal performance and is widely applied.
+
+Image classification is a very basic but important field of computer vision. Its research results have always influenced the development of computer vision and even deep learning. Image classification has many sub-fields, such as multi-label image classification and fine-grained image classification. Here we only give a brief description of single-label image classification.
+
+See [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/algorithm_introduction/image_classification.md) for a detailed introduction of image classification algorithms.
+
+## Contents
+
+- [1. Dataset Introduction](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#1)
+  - [1.1 ImageNet-1k](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#1.1)
+  - [1.2 CIFAR-10/CIFAR-100](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#1.2)
+- [2. Image Classification Process](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#2)
+  - [2.1 Data and Its Preprocessing](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#2.1)
+  - [2.2 Prepare the Model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#2.2)
+  - [2.3 Train the Model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#2.3)
+  - [2.4 Evaluate the Model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#2.4)
+- [3. Application Methods](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3)
+  - [3.1 Training and Evaluation on CPU or Single GPU](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1)
+    - [3.1.1 Model Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.1)
+    - [3.1.2 Model Finetuning](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.2)
+    - [3.1.3 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.3)
+    - [3.1.4 Model Evaluation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.4)
+  - [3.2 Training and Evaluation on Linux + Multi-GPU](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.2)
+    - [3.2.1 Model Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.2.1)
+    - [3.2.2 Model Finetuning](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.2.2)
+    - [3.2.3 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.2.3)
+    - [3.2.4 Model Evaluation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.2.4)
+  - [3.3 Use the Pre-trained Model to Predict](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.3)
+  - [3.4 Use the Inference Model to Predict](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.4)
+
+## 1. Dataset Introduction
+
+### 1.1 ImageNet-1k
+
+ImageNet is a large-scale visual database for visual object recognition research. In this project, more than 14 million images have been manually annotated to indicate the objects they contain, and bounding boxes are provided for at least 1 million of them. ImageNet-1k is a subset of the ImageNet dataset that contains 1,000 categories; its training set contains 1,281,167 images, and its validation set contains 50,000 images. Since 2010, the ImageNet project has held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), whose dataset is ImageNet-1k. So far, ImageNet-1k has become one of the most important datasets for the development of computer vision and has promoted the field as a whole. Many downstream computer vision tasks initialize their models with weights trained on this dataset.
+
+### 1.2 CIFAR-10/CIFAR-100
+
+The CIFAR-10 dataset consists of 60,000 color images in 10 categories at a resolution of 32x32. Each category has 6,000 images: 5,000 in the training set and 1,000 in the validation set. The 10 classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10. It consists of 60,000 color images in 100 classes at a resolution of 32x32. Each class has 600 images: 500 in the training set and 100 in the validation set. Researchers can quickly try different algorithms because these two datasets are small in scale.
These two datasets are also commonly used for testing the quality of models in the image classification field.
+
+## 2. Image Classification Process
+
+The prepared training data is preprocessed and then passed through the image classification model. The output of the model and the ground-truth labels are fed into a cross-entropy loss function, which describes the convergence direction of the model. The gradients of the loss are then computed and propagated back to the model, whose weights are updated by an optimizer. Finally, an image classification model is obtained.
+
+### 2.1 Data Preprocessing
+
+The quality and quantity of data often determine the performance of a model. In the field of image classification, data includes images and labels. In most cases, labeled data is scarce, so the amount of data rarely saturates the model. In order to enable the model to learn more image features, many image transformations or data augmentations are applied before an image enters the model, ensuring the diversity of the input data and giving the model better generalization. PaddleClas provides the standard image transformations for training on ImageNet-1k, as well as 8 data augmentation methods. For the related code, please refer to [data preprocess](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/data/preprocess), and for the configuration files, refer to [Data Augmentation Configuration File](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/configs/ImageNet/DataAugment).
+
+### 2.2 Prepare the Model
+
+Once the data is fixed, the model often determines the upper limit of the final accuracy. In the field of image classification, classic models keep emerging. PaddleClas provides 35 series totaling 164 ImageNet pre-trained models. For specific accuracy, speed, and other metrics, please refer to the [Backbone Network Introduction](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/models).
+
+### 2.3 Train
+
+After preparing the data and model, you can start training the model and updating its parameters. After many iterations, a trained model is finally obtained for image classification tasks. The training process of image classification requires a lot of experience and involves setting many hyperparameters. PaddleClas provides a series of [training tuning methods](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/models/Tricks_en.md) that can quickly help you obtain a high-precision model.
+
+### 2.4 Evaluation
+
+After a model is trained, its evaluation results on the validation set determine its performance. The evaluation metric is generally Top1-Acc or Top5-Acc: the higher the metric, the better the model performance.
+
+## 3. Application Methods
+
+Please refer to [Installation](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/install_en.md) to set up the environment first, and prepare the flowers102 dataset by following the instructions in the [Quick Start](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/quick_start_en.md).
+
+So far, PaddleClas supports the following training/evaluation environments:
+
+```
+└── CPU/Single GPU
+    ├── Linux
+    └── Windows
+
+└── Multi-card GPU
+    └── Linux
+```
+
+### 3.1 Training and Evaluation on CPU or Single GPU
+
+If training and evaluation are performed on a CPU or a single GPU, it is recommended to use the `tools/train.py` and `tools/eval.py` scripts. For training and evaluation in a multi-GPU environment on Linux, please refer to [3.2 Training and evaluation on Linux+GPU](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_en.md#2-training-and-evaluation-on-linuxgpu).
+
+#### 3.1.1 Model Training
+
+After preparing the configuration file, the training process can be started in the following way.
+
+```shell
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=False \
+    -o Global.device=gpu
+```
+
+Among them, `-c` is used to specify the path of the configuration file, and `-o` is used to specify the parameters to be modified or added. `-o Arch.pretrained=False` means not to use a pre-trained model, and `-o Global.device=gpu` means to use the GPU for training. If you want to use the CPU for training, you need to set `Global.device` to `cpu`.
+
+Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to the [Configuration Document](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/config_description_en.md).
+
+Output log examples are as follows:
+
+- If mixup or cutmix is used in training, top-1 and top-k (5 by default) will not be printed in the log:
+
+  ```
+  ...
+  [Train][Epoch 3/20][Avg]CELoss: 6.46287, loss: 6.46287
+  ...
+  [Eval][Epoch 3][Avg]CELoss: 5.94309, loss: 5.94309, top1: 0.01961, top5: 0.07941
+  ...
+  ```
+
+- If mixup or cutmix is not used during training, in addition to the above information, top-1 and top-k (5 by default) will also be printed in the log:
+
+  ```
+  ...
+  [Train][Epoch 3/20][Avg]CELoss: 6.12570, loss: 6.12570, top1: 0.01765, top5: 0.06961
+  ...
+  [Eval][Epoch 3][Avg]CELoss: 5.40727, loss: 5.40727, top1: 0.07549, top5: 0.20980
+  ...
+  ```
+
+During training, you can view loss changes in real time through `VisualDL`; see [VisualDL](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/extension/VisualDL_en.md) for details.
+
+#### 3.1.2 Model Finetuning
+
+After setting up the config file, you can load the pre-trained model weights for finetuning. The command is as follows:
+
+```shell
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=True \
+    -o Global.device=gpu
+```
+
+Among them, `Arch.pretrained` is used to set the address from which to load the pre-trained weights. When using it, you need to replace it with the path to your own pre-trained weights, or you can modify the path directly in the configuration file. You can also set it to `True` to use the pre-trained weights trained on ImageNet-1k.
+
+We also provide many pre-trained models trained on the ImageNet-1k dataset. For the model list and download addresses, please refer to the [model library overview](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/models/models_intro_en.md).
+
+#### 3.1.3 Resume Training
+
+If the training process is terminated for some reason, you can also load the checkpoints to continue training.
+
+```shell
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+    -o Global.device=gpu
+```
+
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer, and other information will be loaded using this parameter.
+
+**Note**:
+
+- The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints shown below during the training process. If you want to continue training from epoch `5`, just set `Global.checkpoints` to `./output/MobileNetV3_large_x1_0/epoch_5`; PaddleClas will automatically fill in the `.pdopt` and `.pdparams` suffixes. Files in the output directory are structured as follows:
+
+  ```
+  output
+  ├── MobileNetV3_large_x1_0
+  │   ├── best_model.pdopt
+  │   ├── best_model.pdparams
+  │   ├── best_model.pdstates
+  │   ├── epoch_1.pdopt
+  │   ├── epoch_1.pdparams
+  │   ├── epoch_1.pdstates
+  .
+  .
+  .
+  ```
+
+#### 3.1.4 Model Evaluation
+
+The model evaluation process can be started as follows.
+
+```shell
+python3 tools/eval.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
+```
+
+The above command will use `./configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also configure the evaluation by changing the parameters in the configuration file, or update the configuration with the `-o` parameter, as shown above.
+
+Some of the configurable evaluation parameters are described as follows:
+
+- `Arch.name`: model name
+- `Global.pretrained_model`: the path of the model file to be evaluated
+
+**Note:** When loading the model to be evaluated, you only need to specify the path of the model file without the suffix; PaddleClas will automatically add the `.pdparams` suffix, as in [3.1.3 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.3).
+
+### 3.2 Training and Evaluation on Linux + Multi-GPU
+
+If you want to run PaddleClas on Linux with GPUs, it is highly recommended to use `paddle.distributed.launch` to start the model training script (`tools/train.py`) and evaluation script (`tools/eval.py`), which makes it easier to start in a multi-GPU environment.
+
+#### 3.2.1 Model Training
+
+The training process can be started in the following way.
`paddle.distributed.launch` specifies the GPU cards to run on via the `--gpus` option:
+
+```shell
+# PaddleClas initiates multi-card multi-process training via launch
+
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
+```
+
+The format of the output log information is the same as above; see [3.1.1 Model training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_en.md#11-model-training) for details.
+
+#### 3.2.2 Model Finetuning
+
+After configuring the yaml file, you can finetune by loading the pre-trained weights. The command is as below.
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Arch.pretrained=True
+```
+
+Among them, `Arch.pretrained` can be set to `True` or `False`. It can also be used to set the address from which to load the pre-trained weights; when using it this way, replace it with the path to your own pre-trained weights, or modify the path directly in the configuration file.
+
+There are many examples of model finetuning in the [new user version](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_classification_new_user.md) and [professional version](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_classification_professional.md) of PaddleClas Trial in 30 mins. You can refer to those tutorials to finetune the model on a specific dataset.
+
+#### 3.2.3 Resume Training
+
+If the training process is terminated for some reason, you can also load the checkpoints to continue training.
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+        -o Global.device=gpu
+```
+
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer, and other information will be loaded using this parameter, as described in [3.1.3 Resume training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_en.md#13-resume-training).
+
+#### 3.2.4 Model Evaluation
+
+The model evaluation process can be started as follows.
+
+```shell
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    tools/eval.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
+```
+
+For parameter descriptions, see [3.1.4 Model evaluation](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_en.md#14-model-evaluation) for details.
+
+### 3.3 Use the Pre-trained Model to Predict
+
+After the training is completed, you can make predictions with the trained model.
A complete example is provided in `tools/infer.py`; run the following command for model prediction:
+
+```
+python3 tools/infer.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
+```
+
+Parameters:
+
+- `Infer.infer_imgs`: the path of the image file or folder to be predicted.
+- `Global.pretrained_model`: the weights file path, such as `./output/MobileNetV3_large_x1_0/best_model`
+
+### 3.4 Use the Inference Model to Predict
+
+By exporting the inference model, PaddlePaddle supports inference using prediction engines, which is introduced next. First, export the inference model using `tools/export_model.py`.
+
+```shell
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
+```
+
+Among them, the `Global.pretrained_model` parameter is used to specify the model file path, which does not need to include the file suffix (see [3.1.3 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/classification.md#3.1.3)).
+
+The above command will generate the model structure file (`inference.pdmodel`) and the model weights file (`inference.pdiparams`), and then the inference engine can be used for inference:
+
+Go to the deploy directory:
+
+```
+cd deploy
+```
+
+Use the inference engine for prediction. Because the default `class_id_map_file` is the mapping file for the ImageNet-1k dataset, it should be set to `None` here.
+
+```shell
+python3 python/predict_cls.py \
+    -c configs/inference_cls.yaml \
+    -o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.inference_model_dir=../inference/ \
+    -o PostProcess.Topk.class_id_map_file=None
+```
+
+Among them:
+
+- `Global.infer_imgs`: the path of the image file to be predicted.
+- `Global.inference_model_dir`: the inference model directory, such as `../inference/`.
+- `Global.use_tensorrt`: whether to use TensorRT, `False` by default.
+- `Global.use_gpu`: whether to use the GPU, `True` by default.
+- `Global.enable_mkldnn`: whether to use `MKL-DNN`, `False` by default. It is valid only when `Global.use_gpu` is `False`.
+- `Global.use_fp16`: whether to enable FP16, `False` by default.
+
+Note: If you want to use `Transformer` series models, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model and set `resize_short=384`, `resize=384`.
+
+If you want to evaluate the speed of the model, it is recommended to enable TensorRT acceleration on GPU and MKL-DNN on CPU.
 diff --git a/docs/en/models_training/image_recognition_en.md b/docs/en/models_training/image_recognition_en.md new file mode 100644 index 00000000..4bd48fc5 --- /dev/null +++ b/docs/en/models_training/image_recognition_en.md @@ -0,0 +1,319 @@
+# Image Recognition
+
+Image recognition, in PaddleClas, means that the system is able to recognize the label of a given query image. Broadly speaking, image classification falls under image recognition. But unlike general image recognition, image classification can only discriminate the learned categories and requires retraining to add new ones.
In contrast, the image recognition in PaddleClas only needs to update the corresponding search library to identify the category of unfamiliar images without retraining the model, which not only significantly promotes the usability of the recognition system but also reduces the demand for model updates, facilitating users' deployment of the application.
+
+For an image to be queried, the image recognition process in PaddleClas is divided into three main parts:
+
+1. Mainbody Detection: for a given query image, the mainbody detector first identifies the objects, thus removing useless background information to improve the recognition accuracy.
+2. Feature Extraction: for each candidate region from mainbody detection, features are extracted by the feature model.
+3. Vector Search: the extracted features are compared for similarity with the vectors in the feature gallery to obtain their label information.
+
+The feature gallery is built in advance using labeled image datasets. The complete image recognition system is shown in the figure below.
+
+[![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/structure.jpg)
+
+To experience the whole image recognition system, or to learn how to build a feature gallery, please refer to the [Quick Start of Image Recognition](./quick_start/quick_start_recognition.md), which explains the overall application process. The following parts expound on the training of the above three steps.
+
+Please first refer to the [Installation Guide](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md) to configure the runtime environment.
+
+## Contents
+
+- [1. Mainbody Detection](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#1)
+- [2. Feature Model Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2)
+  - [2.1 Data Preparation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.1)
+  - [2.2 Single GPU-based Training and Evaluation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.2)
+    - [2.2.1 Model Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.2.1)
+    - [2.2.2 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.2.2)
+    - [2.2.3 Model Evaluation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.2.3)
+  - [2.3 Export Inference Model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.3)
+- [3. Vector Search](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#3)
+- [4. Basic Knowledge](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#4)
+
+## 1. Mainbody Detection
+
+The mainbody detection training is based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop). The only difference is that all detection boxes in the mainbody detection task belong to the foreground, so it is necessary to modify the `category_id` of the detection boxes in the annotation file to 1 and to change the `categories` mapping table of the whole annotation file to the following format, i.e., the category mapping table contains only `foreground`:
+
+```
+[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
+```
+
+For more information about the training method of mainbody detection, please refer to: [PaddleDetection Training Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#4-训练).
+
+For more information on the introduction and download of the mainbody detection models provided in PaddleClas, please refer to: [PaddleDetection Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md).
+
+## 2. Feature Model Training
+
+### 2.1 Data Preparation
+
+- Go to the PaddleClas directory.
+
+```
+## linux or mac, $path_to_PaddleClas indicates the root directory of PaddleClas, which the user needs to modify according to their real directory
+cd $path_to_PaddleClas
+```
+
+- Go to the `dataset` directory. The dataset used in this demo is [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf), a fine-grained dataset with 200 different bird species. First, we need to download the dataset; for the download, please refer to the [Official Website](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html).
+
+```shell
+# linux or mac
+cd dataset
+
+# Copy the downloaded data into this directory.
+cp {Data storage path}/CUB_200_2011.tgz .
+
+# Unzip
+tar -xzvf CUB_200_2011.tgz
+
+# Go to CUB_200_2011
+cd CUB_200_2011
+```
+
+When using the dataset for image retrieval, we usually use the first 100 classes as the training set and the last 100 classes as the testing set, so we need to process the data accordingly to adapt it to the model training of image retrieval.
+
+```shell
+# Create train and test directories
+mkdir train && mkdir test
+
+# Divide data into a training set with the first 100 classes and a testing set with the last 100 classes.
+ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
+ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh
+
+# Generate train_list and test_list
+tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
+tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
+```
+
+So far, we have the training set (in the `train` directory), the testing set (in the `test` directory), and the `train_list.txt` and `test_list.txt` of `CUB_200_2011`.
+
+After data preparation, the `train` directory of `CUB_200_2011` should be:
+
+```
+├── 1
+│   ├── Black_Footed_Albatross_0001_796111.jpg
+│   ├── Black_Footed_Albatross_0002_55.jpg
+ ...
+├── 10
+│   ├── Red_Winged_Blackbird_0001_3695.jpg
+│   ├── Red_Winged_Blackbird_0005_5636.jpg
+...
+```
+
+`train_list.txt` should be:
+
+```
+train/99/Ovenbird_0137_92639.jpg 99 1
+train/99/Ovenbird_0136_92859.jpg 99 2
+train/99/Ovenbird_0135_93168.jpg 99 3
+train/99/Ovenbird_0131_92559.jpg 99 4
+train/99/Ovenbird_0130_92452.jpg 99 5
+...
+```
+
+The columns are separated by spaces, and the three columns are, respectively, the image path, the label, and a unique id.
+
+The format of the testing set is the same as that of the training set.
+
+**Note**:
+
+- When the gallery dataset and query dataset are the same, in order to remove the first retrieved result (the retrieved images themselves do not need to be evaluated), each sample needs a unique id for the subsequent evaluation of metrics such as mAP, recall@1, etc. Please refer to [Introduction to image retrieval datasets](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#图像检索数据集介绍) for the analysis of gallery and query datasets, and [Image retrieval evaluation metrics](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#图像检索评价指标) for the evaluation of mAP, recall@1, etc.
+
+Return to the `PaddleClas` root directory.
+
+```shell
+# linux or mac
+cd ../../
+```
+
+### 2.2 Single GPU-based Training and Evaluation
+
+For training and evaluation on a single GPU, the `tools/train.py` and `tools/eval.py` scripts are recommended.
+
+#### 2.2.1 Model Training
+
+Once you have prepared the configuration file, you can start training the image retrieval task in the following way. The method used by PaddleClas to train image retrieval models is metric learning; refer to [metric learning](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_retrieval_en.md#Metric-Learning) for more explanation.
+
+```shell
+# Single GPU
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Arch.Backbone.pretrained=True \
+    -o Global.device=gpu
+# Multi GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Arch.Backbone.pretrained=True \
+    -o Global.device=gpu
+```
+
+`-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone uses a pre-trained model. In addition, `Arch.Backbone.pretrained` can also specify the address of a specific model weights file, in which case it needs to be replaced with the path to your own pre-trained model weights file. `-o Global.device=gpu` indicates that the GPU is used for training; if you want to use the CPU, set `Global.device` to `cpu`.
+
+For more detailed training configuration, you can also directly modify the corresponding configuration file of the model. Refer to the [configuration document](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/config_description_en.md) for specific configuration parameters.
+
+Run the above command and check the output log; an example is as follows:
+
+```
+...
+[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
+...
+[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
+...
+```
+
+The Backbone here is MobileNetV1. If you want to use another backbone, you can set the parameter `Arch.Backbone.name`, for example by adding `-o Arch.Backbone.name={other Backbone}` to the command. In addition, as the input dimension of the `Neck` section differs between models, replacing the Backbone may also require adapting the `Neck` input size, as in the sketch below.
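+
+For instance, a hypothetical command switching the Backbone to `ResNet50` might look as follows. This is an illustrative sketch only: the `Neck` input size in the yaml file must also be edited to match the output dimensionality of the new backbone.
+
+```shell
+# Hypothetical sketch: swap the backbone via command-line overrides.
+# Remember to also adapt the Neck input size in the yaml to the new
+# backbone's output dimensionality before training.
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Arch.Backbone.name=ResNet50 \
+    -o Arch.Backbone.pretrained=True \
+    -o Global.device=gpu
+```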
+
+In the Training Loss section, [CELoss](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/loss/celoss.py) and [TripletLossV2](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/loss/triplet.py) are used here, with the following configuration:
+
+```
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+    - TripletLossV2:
+        weight: 1.0
+        margin: 0.5
+```
+
+The final total Loss is a weighted sum of all Losses, where `weight` defines the weight of a particular Loss in the final total. If you want to use other Losses, you can change the `Loss` field in the configuration file; for the currently supported Losses, please refer to [Loss](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/loss).
+
+#### 2.2.2 Resume Training
+
+If the training task is terminated for some reason, it can be resumed by loading the checkpoint weights file and continuing training:
+
+```shell
+# Single card
+python3 tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Global.checkpoints="./output/RecModel/epoch_5" \
+    -o Global.device=gpu
+# Multi card
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch tools/train.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Global.checkpoints="./output/RecModel/epoch_5" \
+    -o Global.device=gpu
+```
+
+There is no need to modify the configuration file; just set the `Global.checkpoints` parameter when continuing training, indicating the path to the checkpoint weights file. Using this parameter will load both the saved checkpoint weights and information about the learning rate, optimizer, etc.
+
+**Note**:
+
+- The `-o Global.checkpoints` parameter need not contain the suffix of the checkpoint weights file. The above training command will generate the checkpoint weights files shown below during training; if you want to continue training from epoch `5`, just set the `Global.checkpoints` parameter to `"./output/RecModel/epoch_5"`, and PaddleClas will automatically fill in the suffix.
+
+  ```
+  output/
+  └── RecModel
+      ├── best_model.pdopt
+      ├── best_model.pdparams
+      ├── best_model.pdstates
+      ├── epoch_1.pdopt
+      ├── epoch_1.pdparams
+      ├── epoch_1.pdstates
+      .
+      .
+      .
+  ```
+
+#### 2.2.3 Model Evaluation
+
+Model evaluation can be carried out with the following commands.
+
+```shell
+# Single card
+python3 tools/eval.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Global.pretrained_model=./output/RecModel/best_model
+# Multi card
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch tools/eval.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Global.pretrained_model=./output/RecModel/best_model
+```
+
+The above command will use `./configs/quick_start/MobileNetV1_retrieval.yaml` as the configuration file to evaluate the model `./output/RecModel/best_model` obtained from the above training. You can also set up the evaluation by changing the parameters in the configuration file, or update the configuration with the `-o` parameter, as shown above.
+
+Some of the configurable evaluation parameters are introduced as follows.
+
+- `Arch.name`: the name of the model
+- `Global.pretrained_model`: the path to the pre-trained model file of the model to be evaluated. Unlike `Arch.Backbone.pretrained`, this pre-trained model is the weights of the whole model instead of the Backbone only.
+since the weights of the whole model need to be loaded during evaluation.
+- `Metric.Eval`: the metrics to be evaluated; recall@1, recall@5 and mAP are evaluated by default. If you do not want to evaluate a certain metric, you can remove the corresponding field from the configuration file; if you want to add an evaluation metric, you can refer to the [Metric](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/ppcls/metric/metrics.py) section and add the relevant metric to the `Metric.Eval` field of the configuration file.
+
+**Note:**
+
+- When loading the model to be evaluated, the path to the model file needs to be specified, but it is not necessary to include the file suffix; PaddleClas will automatically complete the `.pdparams` suffix, as in [2.2.2 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/getting_started_retrieval_en.md#Resume-Training).
+- Metric learning models are generally not evaluated with TopkAcc.
+
+### 2.3 Export Inference Model
+
+PaddlePaddle supports exporting the trained model as an inference model, so that predictions can be made with the inference engine.
+
+```shell
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
+    -o Global.pretrained_model=output/RecModel/best_model \
+    -o Global.save_inference_dir=./inference
+```
+
+`Global.pretrained_model` is used to specify the model file path, which still does not need to contain the model file suffix (see [2.2.2 Resume Training](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/models_training/recognition.md#2.2.2)). When executed, the command generates the `./inference` directory, which contains the `inference.pdiparams`, `inference.pdiparams.info`, and `inference.pdmodel` files. `Global.save_inference_dir` specifies the directory to which the inference model is exported. The inference model saved here is truncated at the embedding feature level, i.e. the final output of the model is the n-dimensional embedding feature.
+
+The above command generates the model structure file (`inference.pdmodel`) and the model weights file (`inference.pdiparams`), which can then be used for inference with the inference engine. The process of inference with the inference model can be found in [Predictive inference based on the Python prediction engine](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/tutorials/@shengyu).
+
+## 3. Vector Search
+
+Vector search in PaddleClas currently supports the following environments:
+
+```
+└── CPU
+    ├── Linux
+    ├── MacOS
+    └── Windows
+```
+
+[Faiss](https://github.com/facebookresearch/faiss), an efficient library for feature search and clustering, is adopted as the search library. It integrates a variety of similarity search algorithms to meet different scenarios. In PaddleClas, three search algorithms are supported; a minimal sketch illustrating them follows the list.
+
+- **HNSW32**: a graph-based indexing method with high retrieval accuracy and fast speed. However, the feature library only supports adding image features, not deleting them. (Default method)
+- **IVF**: an inverted-file index with fast speed but slightly lower precision. The feature library supports both adding and deleting image features.
+- **FLAT**: a brute-force search algorithm with the highest precision but slower retrieval speed on large data volumes. The feature library supports both adding and deleting image features.
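+
+To make the three options concrete, below is a minimal Faiss sketch (for illustration only; the 128-dimensional random vectors and the `IVF64` parameter are assumptions, not PaddleClas defaults) that builds and queries each index type:
+
+```python
+import faiss
+import numpy as np
+
+d = 128                                              # assumed embedding dimension
+gallery = np.random.rand(1000, d).astype("float32")  # fake gallery features
+queries = np.random.rand(5, d).astype("float32")     # fake query features
+
+index_hnsw = faiss.index_factory(d, "HNSW32")        # graph index: accurate and fast, no deletion
+index_ivf = faiss.index_factory(d, "IVF64,Flat")     # inverted file: fast, slightly lower precision
+index_flat = faiss.index_factory(d, "Flat")          # brute force: exact, slow on large data
+
+index_ivf.train(gallery)                             # IVF requires training its coarse quantizer
+for index in (index_hnsw, index_ivf, index_flat):
+    index.add(gallery)                               # add gallery features to the library
+    distances, ids = index.search(queries, 5)        # retrieve the top-5 neighbours per query
+```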
+
+See the detailed introduction in the [official documentation](https://github.com/facebookresearch/faiss/wiki).
+
+`Faiss` can be installed as follows:
+
+```
+pip install faiss-cpu==1.7.1post2
+```
+
+If the library cannot be imported properly after installation, please `uninstall` it and then `install` it again, especially when you are using `Windows`.
+
+## 4. Basic Knowledge
+
+Image retrieval refers to finding, among the database images, those that contain the same instance as a given query image of a specific instance (e.g. a specific object, scene, item, etc.). Unlike image classification, image retrieval solves an open-set problem, where the training set may not contain the class of the image being recognized. The overall process of image retrieval is as follows: first, each image is represented by a suitable feature vector; second, a nearest-neighbour search is performed on these feature vectors using the Euclidean or cosine distance to find similar images in the gallery; finally, some post-processing techniques may be used to fine-tune the retrieval results and determine information such as the category of the image being recognized. Therefore, the performance of an image retrieval algorithm is determined mainly by the quality of the feature vectors corresponding to the images.
+
+- Metric Learning
+
+Metric learning studies how to learn a distance function for a particular task so that the distance function can help nearest-neighbour based algorithms (kNN, k-means, etc.) achieve better performance. Deep metric learning is a branch of metric learning that aims to learn a mapping from the original features to a low-dimensional dense vector space (the embedding space) such that, under commonly used distance functions (Euclidean distance, cosine distance, etc.), similar objects are close together in the embedding space while objects of different classes are far apart. Deep metric learning has achieved very successful applications in computer vision, such as face recognition, commodity recognition, image retrieval, and pedestrian re-identification. See [HERE](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/algorithm_introduction/metric_learning.md) for detailed information.
+
+- Introduction to Image Retrieval Datasets
+  - Training Dataset: used to train the model so that it can learn the image features of the collection.
+  - Gallery Dataset: used to provide the gallery data for the image retrieval task. The gallery dataset can be the same as the training set or the test set, or different.
+  - Test Set (Query Dataset): used to evaluate the model. Usually, features are extracted from each test image and then matched against the gallery features to obtain recognition results; the metrics of the whole test set are then calculated based on these results.
+
+- Image Retrieval Evaluation Metrics (a minimal computation sketch follows this list)
+
+  - recall: the number of predicted positive cases with positive labels / the number of cases with positive labels
+  - recall@1: the number of predicted positive cases with positive labels among the top-1 results / the number of cases with positive labels
+  - recall@5: the number of predicted positive cases with positive labels among the top-5 results / the number of cases with positive labels
+
+  - mean Average Precision (mAP)
+    - AP: the average precision over different recall levels
+    - mAP: the average of the APs over all query images in the test set
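+
+As a concrete illustration of these definitions, the following minimal sketch (with a hypothetical ranked result for a single query; not PaddleClas code) computes recall@k in the common per-query "hit" form and the AP of that query; mAP is then the mean of the APs over all queries:
+
+```python
+import numpy as np
+
+def recall_at_k(relevant, k):
+    """1.0 if any of the top-k results shares the query's label, else 0.0."""
+    return float(np.asarray(relevant)[:k].any())
+
+def average_precision(relevant):
+    """Precision averaged over the rank positions of the relevant results."""
+    ranks = np.flatnonzero(relevant)           # 0-based ranks of correct results
+    if ranks.size == 0:
+        return 0.0
+    precisions = [(i + 1) / (rank + 1) for i, rank in enumerate(ranks)]
+    return float(np.mean(precisions))
+
+# Hypothetical top-5 retrieval result for one query: 1 = same label, 0 = different.
+relevant = np.array([0, 1, 1, 0, 1])
+print(recall_at_k(relevant, 1))      # 0.0: no correct result in the top-1
+print(recall_at_k(relevant, 5))      # 1.0: at least one correct result in the top-5
+print(average_precision(relevant))   # (1/2 + 2/3 + 3/5) / 3 ≈ 0.589
+```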