diff --git a/docs/en/application/feature_learning_en.md b/docs/en/application/feature_learning_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..9e931c2b69a86b0494d51f8fa6f6f34253fe0776
--- /dev/null
+++ b/docs/en/application/feature_learning_en.md
@@ -0,0 +1,19 @@
+# Feature Learning
+
+This part mainly explains the training mode for feature learning, which corresponds to the `RecModel` training mode in the code. Feature learning mainly serves applications such as vehicle recognition (vehicle fine-grained classification and vehicle ReID), logo recognition, cartoon character recognition and product recognition, all of which need robust features to identify objects. Unlike training a classification network on ImageNet, this feature learning part has the following characteristics:
+
+- Support for truncating the `backbone`, so that features from any intermediate layer can be extracted
+
+- Support for adding configurable layers, namely the `Neck`, after the `backbone` output
+
+- Support for `Arcface Loss` and other `metric learning` loss functions to improve feature learning ability
+
+## Pipeline
+
+![](../../images/recognition/rec_pipeline.png)
+
+The overall structure of feature learning is shown in the figure above. It mainly includes `Data Augmentation`, `Backbone`, `Neck` and `Metric Learning`. The `Neck` consists of freely added layers, such as an `Embedding layer`, and can be omitted if not needed. During training, the `Metric Learning` loss is used to optimize the model. At inference time, the output of the `Neck` is generally used as the feature.
+
+## Config Description
+
+A description of the feature learning config file can be found in the [yaml description](../tutorials/config_en.md).
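The feature learning pipeline above optimizes the model with a metric learning loss such as `Arcface Loss` and uses the `Neck` output as the feature. As a rough illustration only, and not the PaddleClas implementation in `ppcls/arch/gears/arcmargin.py`, the NumPy sketch below computes ArcFace-style additive angular margin logits and their softmax cross-entropy. The scale `s=30.0`, the margin `m=0.5`, the random toy data and the helper names (`l2_normalize`, `arcface_logits`, `softmax_cross_entropy`) are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis` (eps avoids division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def arcface_logits(features, weights, labels, s=30.0, m=0.5):
    """Sketch of ArcFace-style logits (s and m are assumed example values).

    features: (N, D) embeddings, e.g. the Neck output
    weights:  (D, C) one learnable class-center vector per column
    labels:   (N,) integer class ids
    Target class gets s * cos(theta + m); all other classes get s * cos(theta).
    """
    cos = l2_normalize(features, axis=1) @ l2_normalize(weights, axis=0)  # (N, C)
    cos = np.clip(cos, -1.0, 1.0)  # guard arccos against rounding just outside [-1, 1]
    rows = np.arange(len(labels))
    logits = cos.copy()
    theta_target = np.arccos(cos[rows, labels])
    # Add the angular margin to the target class only.
    logits[rows, labels] = np.cos(theta_target + m)
    return s * logits


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Toy data (hypothetical): 4 embeddings of dimension 512, 10 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
centers = rng.normal(size=(512, 10))
labels = np.array([0, 3, 3, 7])

loss_margin = softmax_cross_entropy(arcface_logits(feats, centers, labels), labels)
loss_plain = softmax_cross_entropy(arcface_logits(feats, centers, labels, m=0.0), labels)
print(loss_margin, loss_plain)
```

Because the margin is added to the target angle only, the target logit drops from `s * cos(theta)` to `s * cos(theta + m)`, so for the same inputs the loss with `m > 0` is higher; minimizing it pulls each embedding closer in angle to its class center. Production implementations usually also handle the case `theta + m > pi`, which this sketch omits.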
diff --git a/docs/en/application/product_recognition_en.md b/docs/en/application/product_recognition_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..fe5e6558c6a29d7c68b3df3e2e96f895a25589d8
--- /dev/null
+++ b/docs/en/application/product_recognition_en.md
@@ -0,0 +1,39 @@
+# Product Recognition
+
+Product recognition is now widely used. Many people have adopted shopping by taking a photo, and unmanned checkout platforms, which also rely on product recognition technology, have entered major supermarkets. The technology follows a "product detection + product identification" process: the product detection module detects potential product regions, and the product identification module identifies the main object found by the detection module. The identification module uses retrieval to rank the products in the database by their similarity to the query image. This document mainly introduces the feature extraction part for product images.
+
+## 1 Pipeline
+
+See the pipeline of [feature learning](./feature_learning_en.md) for details.
+
+The config file: [ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml)
+
+The details are as follows.
+
+### 1.1 Data Augmentation
+
+- `RandomCrop`: 224x224
+- `RandomFlip`
+- `Normalize`: normalize images to 0~1
+
+### 1.2 Backbone
+
+`ResNet50_vd`, pretrained on ImageNet, is used as the backbone.
+
+### 1.3 Neck
+
+A 512-dimensional embedding FC layer without batch normalization or activation is used.
+
+### 1.4 Metric Learning Losses
+
+At present, `CELoss` is used. To obtain more robust features, other losses will be used for training in the future.
+
+## 2 Experiment
+
+This scheme is tested on the Aliproduct [1] dataset, an open-source dataset from a Tianchi competition and currently the largest open-source product dataset. It has more than 50,000 identification categories and about 2.5 million training images.
+
+On this dataset, a single model reaches a Top1 accuracy of 85.67%.
+
+## 3 References
+
+[1] Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV, 2020.
diff --git a/docs/en/application/vehicle_recognition_en.md b/docs/en/application/vehicle_recognition_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..879eb3c6f683e8203044876fa9b70dc7333c4a83
--- /dev/null
+++ b/docs/en/application/vehicle_recognition_en.md
@@ -0,0 +1,103 @@
+# Vehicle Recognition
+
+This part covers two tasks: vehicle fine-grained classification and vehicle ReID.
+
+The goal of fine-grained classification is to recognize images belonging to multiple subordinate categories of a super-category, e.g., different species of animals or plants, different models of cars, or different kinds of retail products. Fine-grained vehicle classification therefore classifies different subcategories of vehicles.
+
+Given a query image, vehicle ReID aims to retrieve images of the same vehicle across non-overlapping camera views. It has many practical applications, such as analyzing and managing traffic flows in Intelligent Transportation Systems. Extracting robust features is particularly important in this process.
+
+In this document, the same training scheme is applied to both applications.
+
+## 1 Pipeline
+
+See the pipeline of [feature learning](./feature_learning_en.md) for details.
+
+The config file for vehicle ReID: [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml).
+
+The config file for vehicle fine-grained classification: [ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml).
+
+The details are as follows.
+
+### 1.1 Data Augmentation
+
+Unlike classification, this part mainly uses the following methods:
+
+- `Resize` to 224. For ReID in particular, the vehicle image has already been cropped to the detector's bounding box, so using `CenterCrop` would lose more vehicle information.
+- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates real-world conditions such as lighting changes and camera position changes.
+- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion.
+
+### 1.2 Backbone
+
+`ResNet50` is used as the backbone, with the following modification:
+
+- The stride of the last stage is set to 1, so the final output feature map is 14x14 for a 224x224 input. At the cost of a small increase in computation, this greatly improves the feature representation ability.
+
+Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
+
+### 1.3 Neck
+
+To reduce the cost of computing feature distances during inference, an embedding convolution layer is added, and the feature dimension is 512.
+
+### 1.4 Metric Learning Losses
+
+- In vehicle ReID, [SupConLoss](../../../ppcls/loss/supconloss.py) and [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used, with a weight ratio of 1:1.
+- In vehicle fine-grained classification, [TripletLoss](../../../ppcls/loss/triplet.py) and [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used, with a weight ratio of 1:1.
+
+## 2 Experiment
+
+### 2.1 Vehicle ReID
+
+
+
+This method is evaluated on the VERI-Wild dataset, which was captured by a large CCTV surveillance system in unconstrained scenarios over one month (30 × 24 hours). The system consists of 174 cameras distributed over an area of more than 200 square kilometers. The original image set contains 12 million vehicle images; after data cleaning and labeling, 416,314 images of 40,671 vehicle IDs were collected. [See the dataset repository for details](https://github.com/PKU-IMRE/VERI-Wild).
+
+| **Methods** | **Small test set** | | |
+| :--------------------------: | :-------: | :-------: | :-------: |
+| | mAP | Top1 | Top5 |
+| Strong baseline (ResNet50) [1] | 76.61 | 90.83 | 97.29 |
+| HPGN (ResNet50+PGN) [2] | 80.42 | 91.37 | - |
+| GLAMOR (ResNet50+PGN) [3] | 77.15 | 92.13 | 97.43 |
+| PVEN (ResNet50) [4] | 79.8 | 94.01 | 98.06 |
+| SAVER (VAE+ResNet50) [5] | 80.9 | 93.78 | 97.93 |
+| PaddleClas baseline1 | 65.6 | 92.37 | 97.23 |
+| PaddleClas baseline2 | 80.09 | **93.81** | **98.26** |
+
+Baseline1 has been released, and baseline2 will be released soon.
+
+### 2.2 Vehicle Fine-grained Classification
+
+In this application, we use [CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html) as the training dataset.
+
+![](../../images/recognition/vehicle/CompCars.png)
+
+The images in the dataset mainly come from the web and from surveillance cameras. The web data covers 163 car manufacturers and 1,716 car models, with **136,726** full-vehicle images and **27,618** partial-vehicle images. The web data is annotated with bounding boxes, viewpoints and five attributes (maximum speed, displacement, number of doors, number of seats and car type). The surveillance data includes **50,000** front-view images.
+
+Note that labels for this dataset need to be generated according to the task at hand. For example, in this demo, vehicles of the same model produced in different years are regarded as the same category, giving 431 categories in total.
+
+| **Methods** | **Top1 Acc** |
+| :-----------------------------: | :-------: |
+| ResNet101-swp [6] | 97.6% |
+| Fine-Tuning DARTS [7] | 95.9% |
+| ResNet50 + COOC [8] | 95.6% |
+| A3M [9] | 95.4% |
+| PaddleClas baseline (ResNet50) | **97.1%** |
+
+## 3 References
+
+[1] Bag of Tricks and a Strong Baseline for Deep Person Re-Identification. CVPR Workshops, 2019.
+
+[2] Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification. arXiv preprint arXiv:2005.14684.
+
+[3] GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention. arXiv preprint arXiv:2002.02256.
+
+[4] Parsing-based View-aware Embedding Network for Vehicle Re-Identification. CVPR, 2020.
+
+[5] The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification. ECCV, 2020.
+
+[6] Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition. IEEE Transactions on Intelligent Transportation Systems, 2017.
+
+[7] Fine-Tuning DARTS for Image Classification. 2020.
+
+[8] Fine-Grained Vehicle Classification with Unsupervised Parts Co-occurrence Learning. 2018.
+
+[9] Attribute-Aware Attention Model for Fine-grained Representation Learning. 2019.
diff --git a/docs/zh_CN/application/vehicle_recognition.md b/docs/zh_CN/application/vehicle_recognition.md
index e07bc3f8749e565ace2225b3d88d4edaa2e30ec0..14b28ca6a864f06a24bf27053cfe93a1ae8f3f27 100644
--- a/docs/zh_CN/application/vehicle_recognition.md
+++ b/docs/zh_CN/application/vehicle_recognition.md
@@ -46,8 +46,6 @@ ReID,也就是 Re-identification,其定义是利用算法,在图像库中
 ### 2.1 车辆ReID
-
-
 此方法在VERI-Wild数据集上进行了实验。此数据集是在一个大型闭路电视监控系统,在无约束的场景下,一个月内(30*24小时)中捕获的。该系统由174个摄像头组成,其摄像机分布在200多平方公里的大型区域。原始车辆图像集包含1200万个车辆图像,经过数据清理和标注,采集了416314张40671个不同的车辆图像。[具体详见论文](https://github.com/PKU-IMRE/VERI-Wild)
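Product recognition ranks database items by their similarity to a query image, and the vehicle ReID table above reports mAP, Top1 and Top5 for the same kind of retrieval. The NumPy sketch below illustrates both steps, assuming cosine similarity on L2-normalized embeddings (such as the 512-dimensional `Neck` output) and the common mAP definition (the mean over queries of the average precision at each correct match). It is not the PaddleClas evaluation code: it omits ReID protocol details such as filtering gallery images taken by the same camera, and the function names and toy data are hypothetical.

```python
import numpy as np


def cosine_similarity(query, gallery):
    """(Q, D) x (G, D) -> (Q, G) matrix of cosine similarities."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


def retrieval_metrics(sim, query_ids, gallery_ids, topk=(1, 5)):
    """Top-k hit rates and mAP for a (Q, G) similarity matrix.

    A query is a Top-k hit if any of its k most similar gallery items
    shares its id. AP averages the precision at each correct match's rank.
    """
    order = np.argsort(-sim, axis=1)                     # gallery indices, most similar first
    matches = gallery_ids[order] == query_ids[:, None]   # (Q, G) correctness in rank order
    results = {f"top{k}": float(matches[:, :k].any(axis=1).mean()) for k in topk}
    aps = []
    for row in matches:
        hit_ranks = np.flatnonzero(row)                  # 0-based ranks of correct items
        if hit_ranks.size == 0:
            continue                                     # query has no ground truth in gallery
        precision_at_hits = np.arange(1, hit_ranks.size + 1) / (hit_ranks + 1)
        aps.append(precision_at_hits.mean())
    results["mAP"] = float(np.mean(aps)) if aps else 0.0
    return results


# Toy example (hypothetical 2-D embeddings): 2 queries, 4 gallery items.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
query_ids = np.array([0, 1])
gallery = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, -0.2], [0.7, 0.7]])
gallery_ids = np.array([0, 1, 1, 0])

sim = cosine_similarity(query, gallery)
res = retrieval_metrics(sim, query_ids, gallery_ids)
print(res)
```

Inside `retrieval_metrics`, `order` holds each query's gallery indices ranked from most to least similar, which is the retrieval result itself; the metrics only score that ranking. Normalizing before the dot product makes the ranking depend on direction alone, which matches how angular-margin losses such as ArcFace shape the embedding space.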