From 604a3dbed063812a13b985c30fe5ebaf0309ac61 Mon Sep 17 00:00:00 2001
From: dongshuilong
Date: Fri, 18 Jun 2021 15:01:45 +0800
Subject: [PATCH] add en docs for feature learning

---
 .../cartoon_character_recognition_en.md       | 41 +++++++++++++++
 docs/en/application/feature_learning_en.md    |  4 +-
 docs/en/application/logo_recognition_en.md    | 52 +++++++++++++++++++
 docs/en/application/vehicle_recognition_en.md |  6 +--
 docs/zh_CN/application/feature_learning.md    |  4 +-
 5 files changed, 99 insertions(+), 8 deletions(-)
 create mode 100644 docs/en/application/cartoon_character_recognition_en.md
 create mode 100644 docs/en/application/logo_recognition_en.md

diff --git a/docs/en/application/cartoon_character_recognition_en.md b/docs/en/application/cartoon_character_recognition_en.md
new file mode 100644
index 00000000..582f67ed
--- /dev/null
+++ b/docs/en/application/cartoon_character_recognition_en.md
@@ -0,0 +1,41 @@
+# Cartoon Character Recognition
+
+Since the 1970s, face recognition has been one of the most important topics in computer vision and biometrics. In recent years, traditional face recognition methods have been replaced by deep learning methods based on convolutional neural networks (CNNs). Face recognition technology is now widely used in security, commerce, finance, intelligent self-service terminals, entertainment, and other fields. Driven by strong industry demand, animation media are receiving more and more attention, and face recognition of animation characters has become a new research field.
+
+## 1 Pipeline
+
+See the pipeline of [feature learning](./feature_learning_en.md) for details. Note that the `Neck` module is not used in this application.
+
+The config file: [ResNet50_icartoon.yaml](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml)
+
+The details are as follows.
+
+### 1.1 Data Augmentation
+
+- `RandomCrop`: 224x224
+- `RandomFlip`
+- `Normalize`: normalize images to 0~1
+
+### 1.2 Backbone
+
+`ResNet50` is used as the backbone, and a large model is used for distillation.
+
+### 1.3 Metric Learning Losses
+
+`CELoss` is used for training.
+
+## 2 Experiment
+
+This method is validated on the iCartoonFace [1] dataset, which consists of 389,678 images of 5,013 cartoon characters annotated with ID, bounding box, pose, and other auxiliary attributes. It is the largest cartoon media dataset in the field of image recognition.
+
+Compared with other datasets, iCartoonFace has clear advantages in both the number of images and the number of identities. The training set includes 5,013 classes and 389,678 images; the query set has 2,500 images and the gallery set has 20,000 images.
+
+![icartoon](../../images/icartoon1.png)
+
+Note that, unlike in human face recognition, the accessories, props, hairstyles, and other features around a cartoon character's head can significantly improve recognition accuracy. Therefore, starting from the bounding boxes annotated in the original dataset, we double the width and height of each bbox to obtain a more complete image of the character.
+
+On this dataset, the Recall@1 of this method reaches 83.24%.
+
+## 3 References
+
+[1] Cartoon Face Recognition: A Benchmark Dataset. 2020. [download](https://github.com/luxiangju-PersonAI/iCartoonFace)
diff --git a/docs/en/application/feature_learning_en.md b/docs/en/application/feature_learning_en.md
index 9e931c2b..3c065290 100644
--- a/docs/en/application/feature_learning_en.md
+++ b/docs/en/application/feature_learning_en.md
@@ -8,12 +8,12 @@ This part mainly explains the training mode of feature learning, which is `RecMo
 
 - Support `Arcface Loss` and other `metric learning`loss functions to improve feature learning ability
 
-# Pipeline
+# 1 Pipeline
 
 ![](../../images/recognition/rec_pipeline.png)
 
 The overall structure of feature learning is shown in the figure above, which mainly includes `Data Augmentation`, `Backbone`, `Neck`, `Metric Learning` and so on. The `Neck` part is a freely added layers, such as `Embedding layer`. Of course, this module can be omitted if not needed. During training, the loss of `Metric Learning` is used to optimize the model. Generally speaking, the output of the `Neck` is used as the feature output when in inference stage.
 
-## Config Description
+## 2 Config Description
 
 The feature learning config file description can be found in [yaml description](../tutorials/config_en.md).
diff --git a/docs/en/application/logo_recognition_en.md b/docs/en/application/logo_recognition_en.md
new file mode 100644
index 00000000..cf379cf3
--- /dev/null
+++ b/docs/en/application/logo_recognition_en.md
@@ -0,0 +1,52 @@
+# Logo Recognition
+
+Logo recognition is widely used in real life, for example to determine whether an Adidas or Nike logo appears in a photo, or whether a Starbucks or Coca-Cola logo appears on a cup. When the number of logo categories is large, a two-stage approach of detection followed by recognition is usually adopted: the detection module locates potential logo regions, which are then fed to the recognition module to identify their categories. The recognition module mostly uses a retrieval-based method, ranking gallery images by their similarity to the query to obtain the predicted category. This document mainly introduces the feature learning part.
+
+## 1 Pipeline
+
+See the pipeline of [feature learning](./feature_learning_en.md) for details.
+
+The config file of logo recognition: [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).
+
+The details are as follows.
+
+### 1.1 Data Augmentation
+
+Unlike in classification, the following methods are mainly used:
+
+- `Resize` to 224. The input image has already been cropped to the bbox produced by a logo detector.
+- [AugMix](https://arxiv.org/abs/1912.02781v1): simulates real-world variations such as lighting changes and camera position changes.
+- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf): simulates occlusion.
+
+### 1.2 Backbone
+
+`ResNet50` is used as the backbone, with the following modifications:
+
+- The stride of the last stage is set to 1, which keeps the final output feature map at 14x14. At the cost of a small amount of extra computation, this greatly improves the representational ability of the features.
+- ImageNet-pretrained weights are used.
+
+Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
+
+### 1.3 Neck
+
+To reduce the cost of computing feature distances at inference time, an embedding convolution layer is added, and the feature dimension is set to 512.
+
+### 1.4 Metric Learning Losses
+
+[PairwiseCosface](../../../ppcls/loss/pairwisecosface.py) and [CircleMargin](../../../ppcls/arch/gears/circlemargin.py) [1] are used, with a 1:1 weight ratio between the two losses.
+
+## 2 Experiment
+
+
+
+The LogoDet-3K [2] dataset is used for the experiments. It is fully annotated, with 3,000 logo categories, about 200,000 high-quality manually labeled logo objects, and 158,652 images.
+
+Since the dataset was originally designed for detection, while the recognition stage only deals with cropped logo regions, the annotated bboxes are used to crop the logo regions that form the training set, which eliminates the influence of the background. After cropping, the dataset is split into 155,427 training images covering 3,000 logo categories (also used as the gallery during testing) and 3,225 test images used as queries. The cropped dataset is available for [download here](https://arxiv.org/abs/2008.05359).
+
+On this dataset, a single model reaches a Recall@1 accuracy of 89.8%.
+
+## 3 References
+
+[1] Circle loss: A unified perspective of pair similarity optimization. *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition*. 2020.
+
+[2] LogoDet-3K: A Large-Scale Image Dataset for Logo Detection. arXiv preprint arXiv:2008.05359, 2020.
diff --git a/docs/en/application/vehicle_recognition_en.md b/docs/en/application/vehicle_recognition_en.md
index 879eb3c6..f59e5e54 100644
--- a/docs/en/application/vehicle_recognition_en.md
+++ b/docs/en/application/vehicle_recognition_en.md
@@ -28,8 +28,6 @@ Different from classification, this part mainly uses the following methods:
 
 ### 1.2 Backbone
 
-使用`ResNet50`作为backbone,同时做了如下修改:
-
 Using `ResNet50` as backbone, and make the following modifications:
 
 - Last stage stride = 1, keep the size of the final output feature map to 14x14. At the cost of increasing a small amount of calculation, the ability of feature expression is greatly improved.
@@ -38,14 +36,14 @@ code:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models
 
 ### 1.3 Neck
 
-In order to reduce the complexity of calculating feature distance in inferencne, an embedding convolution layer is added, and the feature dimension is 512.
+In order to reduce the complexity of calculating feature distance in inference, an embedding convolution layer is added, and the feature dimension is set to 512.
 
 ### 1.4 Metric Learning Losses
 
 - In vehicle ReID,[SupConLoss](../../../ppcls/loss/supconloss.py) , [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used. The weight ratio of two losses is 1:1.
 - In vehicle fine-grained classification, [TtripLet Loss](../../../ppcls/loss/triplet.py), [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used. The weight ratio of two losses is 1:1.
 
-## Experiment
+## 2 Experiment
 
 ### 2.1 Vehicle ReID
 
diff --git a/docs/zh_CN/application/feature_learning.md b/docs/zh_CN/application/feature_learning.md
index ef3fa9c4..30d79a2d 100644
--- a/docs/zh_CN/application/feature_learning.md
+++ b/docs/zh_CN/application/feature_learning.md
@@ -6,7 +6,7 @@
 - 支持在`backbone`的feature输出层后,添加可配置的网络层,即`Neck`部分
 - 支持`ArcFace Loss`等`metric learning` 相关loss函数,提升特征学习能力
 
-## 整体流程
+## 1 整体流程
 
 ![](../../images/recognition/rec_pipeline.png)
 
@@ -14,6 +14,6 @@
 
 针对不同的应用,可以根据需要,对每一部分自由选择。每一部分的具体配置,如数据增强、Backbone、Neck、Metric Learning相关Loss等设置,详见具体应用:[车辆识别](./vehicle_recognition.md)、[Logo识别](./logo_recognition.md)、[动漫人物识别](./cartoon_character_recognition.md)、[商品识别](./product_recognition.md)
 
-## 配置文件说明
+## 2 配置文件说明
 
 配置文件说明详见[yaml配置文件说明文档](../tutorials/config.md)。其中模型结构配置,详见文档中**识别模型结构配置**部分。
-- 
GitLab
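Both new docs in this patch report Recall@1: 83.24% on iCartoonFace and 89.8% on the cropped LogoDet-3K split. The logo doc also describes recognition as ranking gallery images by their similarity to the query. Below is a minimal, framework-free Python sketch of that rank-1 evaluation. It rests on two assumptions. First, features have already been extracted by the backbone and neck. Second, ranking uses cosine similarity; the patch does not state the metric. `l2_normalize` and `recall_at_1` are hypothetical helper names, not PaddleClas APIs.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def recall_at_1(
    query_feats: Sequence[Sequence[float]],
    query_labels: Sequence[int],
    gallery_feats: Sequence[Sequence[float]],
    gallery_labels: Sequence[int],
) -> float:
    """Fraction of queries whose most similar gallery item shares the query's label."""
    if not query_labels or not gallery_labels:
        raise ValueError("query and gallery sets must be non-empty")
    gallery = [l2_normalize(g) for g in gallery_feats]
    hits = 0
    for q_feat, q_label in zip(query_feats, query_labels):
        q = l2_normalize(q_feat)
        # Assumption: cosine similarity is the ranking metric (not specified in the patch).
        sims = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        # Rank 1 = the most similar gallery item (ties go to the lowest index).
        best = max(range(len(sims)), key=sims.__getitem__)
        if gallery_labels[best] == q_label:
            hits += 1
    return hits / len(query_labels)


if __name__ == "__main__":
    # Toy 2-D "features": the third query's nearest gallery item has the wrong label.
    gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    gallery_labels = [0, 1, 2]
    query_feats = [[0.9, 0.1], [0.1, 0.9], [1.0, -0.2]]
    query_labels = [0, 1, 2]
    print(recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels))
```

With the toy features above, two of the three queries retrieve a gallery item with the correct label. A real evaluation over 20,000 gallery images would batch this as one matrix product of normalized query and gallery features. The loop form here only makes the rank-1 lookup explicit.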