提交 8dea22da 编写于 作者: W weishengyu

Merge branch 'develop' of https://github.com/weisy11/PaddleClas into develop

......@@ -53,19 +53,20 @@ Res2Net200_vd预训练模型Top-1精度高达85.1%。
- [骨干网络和预训练模型库](./docs/zh_CN/ImageNet_models_cn.md)
- [主体检测](./docs/zh_CN/application/mainbody_detection.md)
- 图像分类
- [Cifar100分类任务](./docs/zh_CN/tutorials/quick_start_professional.md)
- 特征学习
- [ImageNet分类任务](./docs/zh_CN/tutorials/quick_start_professional.md)
- [特征学习](./docs/zh_CN/application/feature_learning.md)
- [特征学习](./docs/zh_CN/application/feature_learning.md)
- [商品识别](./docs/zh_CN/application/product_recognition.md)
- [车辆识别](./docs/zh_CN/application/vehicle_reid.md)
- [车辆识别](./docs/zh_CN/application/vehicle_recognition.md)
- [logo识别](./docs/zh_CN/application/logo_recognition.md)
- [动漫人物识别](./docs/zh_CN/application/cartoon_character_recognition.md)
- [向量检索](./deploy/vector_search/README.md)
- 模型训练/评估
- [图像分类任务](./docs/zh_CN/tutorials/getting_started.md)
- [特征学习任务](./docs/zh_CN/application/feature_learning.md)
- 预测引擎推理
- [基于Python预测引擎预测推理](./docs/zh_CN/tutorials/getting_started.md#4-使用inference模型进行模型推理)
- [基于C++预测引擎预测推理(当前只支持图像分类任务,图像识别更新中)](./deploy/cpp_infer/readme.md)
- [特征学习任务](./docs/zh_CN/tutorials/getting_started_retrieval.md)
- 模型预测
- [基于Python预测引擎预测推理](./docs/zh_CN/inference.md)
- [基于C++预测引擎预测推理](./deploy/cpp/readme.md)(当前只支持图像分类任务,图像识别更新中)
- 模型部署(当前只支持图像分类任务,图像识别更新中)
- [服务化部署](./deploy/hubserving/readme.md)
- [端侧部署](./deploy/lite/readme.md)
......
......@@ -36,7 +36,7 @@ class RecPredictor(Predictor):
"transform_ops"])
self.postprocess = build_postprocess(config["RecPostProcess"])
def predict(self, images):
def predict(self, images, feature_normalize=True):
input_names = self.paddle_predictor.get_input_names()
input_tensor = self.paddle_predictor.get_input_handle(input_names[0])
......@@ -54,6 +54,12 @@ class RecPredictor(Predictor):
input_tensor.copy_from_cpu(image)
self.paddle_predictor.run()
batch_output = output_tensor.copy_to_cpu()
if feature_normalize:
feas_norm = np.sqrt(
np.sum(np.square(batch_output), axis=1, keepdims=True))
batch_output = np.divide(batch_output, feas_norm)
return batch_output
......
......@@ -106,6 +106,14 @@ Finetuning is carried out on ImageNet1k dataset to restore distribution between
* For image classsification tasks, The model accuracy can be further improved when the test scale is 1.15 times that of training[5]. For the 82.99% ResNet50_vd pretrained model, it comes to 83.7% using 320x320 for the evaluation. We use Fix strategy to finetune the model with the training scale set as 320x320. During the process, the pre-preocessing pipeline is same for both training and test. All the weights except the fully connected layer are freezed. Finally the top-1 accuracy comes to **84.0%**.
### Some phenomena during the experiment
In the prediction process, the average value and variance of the batch norm are obtained by loading the pretrained model (set its mode as test mode). In the training process, batch norm is obtained by counting the information of the current batch (set its mode as train mode) and calculating the moving average with the historical saved information. In the distillation task, we found that through the train mode, In the distillation task, we found that the real-time change of the bn parameter of the teacher model to guide the student model is better than the student model obtained through the test mode distillation. The following is a set of experimental results. Therefore, in this distillation scheme, we use train mode to get the soft label of the teacher model.
|Teacher Model | Teacher Top1 | Student Model | Student Top1|
|- |:-: |:-: | :-: |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 75.84% |
## Application of the distillation model
......@@ -113,7 +121,7 @@ Finetuning is carried out on ImageNet1k dataset to restore distribution between
* Adjust the learning rate of the middle layer. The middle layer feature map of the model obtained by distillation is more refined. Therefore, when the distillation model is used as the pretrained model in other tasks, if the same learning rate as before is adopted, it is easy to destroy the features. If the learning rate of the overall model training is reduced, it will bring about the problem of slow convergence. Therefore, we use the strategy of adjusting the learning rate of the middle layer. specifically:
    * For ResNet50_vd, we set up a learning rate list. The three conv2d convolution parameters before the resiual block have a uniform learning rate multiple, and the four resiual block conv2d have theirs own learning rate parameters, respectively. 5 values need to be set in the list. By the experiment, we find that when used for transfer learning finetune classification model, the learning rate list with `[0.1,0.1,0.2,0.2,0.3]` performs better in most tasks; while in the object detection tasks, `[0.05, 0.05, 0.05, 0.1, 0.15]` can bring greater accuracy gains.
    * For MoblileNetV3_large_1x0, because it contains 15 blocks, we set each 3 blocks to share a learning rate, so 5 learning rate values are required. We find that in classification and detection tasks, the learning rate list with `[0.25, 0.25, 0.5, 0.5, 0.75]` performs better in most tasks.
    * For MoblileNetV3_large_x1_0, because it contains 15 blocks, we set each 3 blocks to share a learning rate, so 5 learning rate values are required. We find that in classification and detection tasks, the learning rate list with `[0.25, 0.25, 0.5, 0.5, 0.75]` performs better in most tasks.
* Appropriate l2 decay. Different l2 decay values are set for different models during training. In order to prevent overfitting, l2 decay is ofen set as large for large models. L2 decay is set as `1e-4` for ResNet50, and `1e-5 ~ 4e-5` for MobileNet series models. L2 decay needs also to be adjusted when applied in other tasks. Taking Faster_RCNN_MobiletNetV3_FPN as an example, we found that only modifying l2 decay can bring up to 0.5% accuracy (mAP) improvement on the COCO2017 dataset.
......@@ -167,54 +175,52 @@ This section will introduce the SSLD distillation experiments in detail based on
#### Distill ResNet50_vd using ResNeXt101_32x16d_wsl
#### Distill MobileNetV3_small_x1_0 using MobileNetV3_large_x1_0
Configuration of distilling `ResNet50_vd` using `ResNeXt101_32x16d_wsl` is as follows.
An example of SSLD distillation is provided here. The configuration file of `MobileNetV3_large_x1_0` distilling `MobileNetV3_small_x1_0` is provided in `ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml`, and the user can directly replace the path of the configuration file in `tools/train.sh` to use it.
```yaml
ARCHITECTURE:
name: 'ResNeXt101_32x16d_wsl_distill_ResNet50_vd'
pretrained_model: "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# pretrained_model:
# - "./pretrained/ResNeXt101_32x16d_wsl_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
```
#### Distill MobileNetV3_large_x1_0 using ResNet50_vd_ssld
The detailed configuration is as follows.
Configuration of distilling `MobileNetV3_large_x1_0` using `MobileNetV3_small_x1_0` is as follows.
```yaml
ARCHITECTURE:
name: 'ResNet50_vd_distill_MobileNetV3_large_x1_0'
pretrained_model: "./pretrained/ResNet50_vd_ssld_pretrained/"
# pretrained_model:
# - "./pretrained/ResNet50_vd_ssld_pretrained/"
# - "./pretrained/ResNet50_vd_pretrained/"
use_distillation: True
Arch:
name: "DistillationModel"
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models
freeze_params_list:
- True
- False
models:
- Teacher:
name: MobileNetV3_large_x1_0
pretrained: True
use_ssld: True
- Student:
name: MobileNetV3_small_x1_0
pretrained: False
infer_model_name: "Student"
```
In configuration file, the `freeze_params_list` needs to specify whether the model needs to freeze the parameters, the `models` needs to specify the teacher model and the student model, and the teacher model needs to load the pretrained model. The user can directly change the model here.
### Begin to train the network
If everything is ready, users can begin to train the network using the following command.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=R50_vd_distill_MV3_large_x1_0 \
--log_dir=mv3_large_x1_0_distill_mv3_small_x1_0 \
tools/train.py \
-c ./configs/Distillation/R50_vd_distill_MV3_large_x1_0.yaml
-c ./ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml
```
### Note
* Before using SSLD, users need to train a teacher model on the target dataset firstly. The teacher model is used to guide the training of the student model.
* When using SSLD, users need to set `use_distillation` in the configuration file to` True`. In addition, because the student model learns soft-label with knowledge information, you need to turn off the `label_smoothing` option.
* If the student model is not loaded with a pretrained model, the other hyperparameters of the training can refer to the hyperparameters trained by the student model on ImageNet-1k. If the student model is loaded with the pre-trained model, the learning rate can be adjusted to `1/100~1/10` of the standard learning rate.
* In the process of SSLD distillation, the student model only learns the soft label, which makes the training process more difficult. It is recommended that the value of `l2_decay` can be decreased appropriately to obtain higher accuracy of the validation set.
......
......@@ -69,10 +69,6 @@ Unlike conventional artificially designed image augmentation methods, AutoAugmen
In PaddleClas, `AutoAugment` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ImageNetPolicy
from ppcls.data.imaug import transform
size = 224
......@@ -107,10 +103,6 @@ In `RandAugment`, the author proposes a random augmentation method. Instead of u
In PaddleClas, `RandAugment` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import RandAugment
from ppcls.data.imaug import transform
size = 224
......@@ -153,10 +145,6 @@ Cutout is a kind of dropout, but occludes input image rather than feature map. I
In PaddleClas, `Cutout` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import Cutout
from ppcls.data.imaug import transform
size = 224
......@@ -188,11 +176,6 @@ RandomErasing is similar to the Cutout. It is also to solve the problem of poor
In PaddleClas, `RandomErasing` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import RandomErasing
from ppcls.data.imaug import transform
size = 224
......@@ -229,11 +212,6 @@ Images are divided into some patches for `HideAndSeek` and masks are generated w
In PaddleClas, `HideAndSeek` is used as follows.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import HideAndSeek
from ppcls.data.imaug import transform
size = 224
......@@ -283,11 +261,6 @@ It shows that the second method is better.
The usage of `GridMask` in PaddleClas is shown below.
```python
from data.imaug import DecodeImage
from data.imaug import ResizeImage
from data.imaug import ToCHWImage
from data.imaug import GridMask
from data.imaug import transform
size = 224
......@@ -329,11 +302,6 @@ Mixup is the first solution for image aliasing, it is easy to realize and perfor
The usage of `Mixup` in PaddleClas is shown below.
```python
from ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import MixupOperator
size = 224
......@@ -373,11 +341,6 @@ Cutmix randomly cuts out an `ROI` from one image, and then covered onto the corr
```python
rom ppcls.data.imaug import DecodeImage
from ppcls.data.imaug import ResizeImage
from ppcls.data.imaug import ToCHWImage
from ppcls.data.imaug import transform
from ppcls.data.imaug import CutmixOperator
size = 224
......@@ -444,10 +407,9 @@ Configuration of `RandAugment` is shown as follows. `Num_layers`(default as 2) a
```yaml
transforms:
transform_ops:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
......@@ -457,11 +419,10 @@ Configuration of `RandAugment` is shown as follows. `Num_layers`(default as 2) a
num_layers: 2
magnitude: 5
- NormalizeImage:
scale: 1./255.
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
### Cutout
......@@ -469,24 +430,22 @@ Configuration of `RandAugment` is shown as follows. `Num_layers`(default as 2) a
Configuration of `Cutout` is shown as follows. `n_holes`(default as 1) and `n_holes`(default as 112) are two hyperparameters.
```yaml
transforms:
transform_ops:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- Cutout:
n_holes: 1
length: 112
- ToCHWImage:
```
### Mixup
......@@ -495,42 +454,39 @@ Configuration of `Cutout` is shown as follows. `n_holes`(default as 1) and `n_ho
Configuration of `Mixup` is shown as follows. `alpha`(default as 0.2) is hyperparameter which users need to care about. What's more, `use_mix` need to be set as `True` in the root of the configuration.
```yaml
transforms:
transform_ops:
- DecodeImage:
to_rgb: True
to_np: False
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
batch_transform_ops:
- MixupOperator:
alpha: 0.2
```
## 启动命令
## Start training
Users can use the following command to start the training process, which can also be referred to `tools/run.sh`.
Users can use the following command to start the training process, which can also be referred to `tools/train.sh`.
```bash
export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH
python -m paddle.distributed.launch \
python3 -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=ResNet50_Cutout \
tools/train.py \
-c ./configs/DataAugment/ResNet50_Cutout.yaml
-c ./ppcls/configs/ImageNet/DataAugment/ResNet50_Cutout.yaml
```
## Note
* When using augmentation methods based on image aliasing, users need to set `use_mix` in the configuration file as `True`. In addition, because the label needs to be aliased when the image is aliased, the accuracy of the training data cannot be calculated. The training accuracy rate was not printed during the training process.
* In addition, because the label needs to be aliased when the image is aliased, the accuracy of the training data cannot be calculated. The training accuracy rate was not printed during the training process.
* The training data is more difficult with data augmentation, so the training loss may be larger, the training set accuracy is relatively low, but it has better generalization ability, so the validation set accuracy is relatively higher.
......
# Cartoon Character Recognition
Since the 1970s, face recognition has become one of the most important topics in the field of computer vision and biometrics. In recent years, traditional face recognition methods have been replaced by the deep learning method based on convolutional neural network (CNN). At present, face recognition technology is widely used in security, commerce, finance, intelligent self-service terminal, entertainment and other fields. With the strong demand of industry application, animation media has been paid more and more attention, and face recognition of animation characters has become a new research field.
## 1 Pipeline
See the pipline of [feature learning](./feature_learning_en.md) for details. It is worth noting that the `Neck` module is not used in this process.
The config file: [ResNet50_icartoon.yaml](../../../ppcls/configs/Cartoonface/ResNet50_icartoon.yaml)
The details are as follows.
### 1.1 Data Augmentation
- `RandomCrop`: 224x224
- `RandomFlip`
- `Normlize`: normlize images to 0~1
### 1.2 Backbone
`ResNet50` is used as the backbone. And Large model was used for distillation.
### 1.3 Metric Learning Losses
`CELoss` is used for training.
## 2 Experiment
This method is validated on icartoonface [1] dataset. The dataset consists of 389678 images of 5013 cartoon characters with ID, bounding box, pose and other auxiliary attributes. The dataset is the largest cartoon media dataset in the field of image recognition.
Compared with other datasets, icartoonface has obvious advantages in both image quantity and entity number. Among them, training set inclues 5013 classes, 389678 images. The query dataset has 2500 images and gallery dataset has 20000 images.
![icartoon](../../images/icartoon1.png)
It is worth noting that, compared with the face recognition task, the accessories, props, hairstyle and other factors of cartoon characters' head portraits can significantly improve the recognition accuracy. Therefore, based on the annotation box of the original dataset, we double the length and width of bbox to get a more comprehensive cartoon character image.
On this dataset, the recall1 of this method reaches 83.24%.
## 3 References
[1] Cartoon Face Recognition: A Benchmark Dataset. 2020. [download](https://github.com/luxiangju-PersonAI/iCartoonFace)
# Feature Learning
This part mainly explains the training mode of feature learning, which is `RecModel` training mode in code. The main purpose of feature learning is to support the application, such as vehicle recognition (vehicle fine-grained classification, vehicle Reid), logo recognition, cartoon character recognition , product recognition, which needs to learn robust features to identify objects. Different from training classification network on Imagenet, this feature learning part mainly has the following features:
- Support to truncate the `backbone`, which means feature of any intermediate layer can be extracted
- Support to add configurable layers after `backbone` output, namely `Neck`
- Support `Arcface Loss` and other `metric learning`loss functions to improve feature learning ability
# 1 Pipeline
![](../../images/recognition/rec_pipeline.png)
The overall structure of feature learning is shown in the figure above, which mainly includes `Data Augmentation`, `Backbone`, `Neck`, `Metric Learning` and so on. The `Neck` part is a freely added layers, such as `Embedding layer`. Of course, this module can be omitted if not needed. During training, the loss of `Metric Learning` is used to optimize the model. Generally speaking, the output of the `Neck` is used as the feature output when in inference stage.
## 2 Config Description
The feature learning config file description can be found in [yaml description](../tutorials/config_en.md).
# Logo Recognition
Logo recognition is a field that is widely used in real life, such as whether the Adidas or Nike logo appears in a photo, or whether the Starbucks or Coca-Cola logo appears on a cup. Usually, when the number of logo categories is large, the two-stage method of detection and recognition is often used. The detection module is responsible for detecting the potential logo area, and then feed the logo area to the recognition module to identify the category. The recognition module mostly adopts retrieval-based method, and sorts the similarity of the query and the gallery to obtain the predicted category. This document mainly introduces the feature learning part.
## 1 Pipeline
See the pipline of [feature learning](./feature_learning_en.md) for details.
The config file of logo recognition: [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).
The details are as follows.
### 1.1 Data Augmentation
Different from classification, this part mainly uses the following methods:
- `Resize` to 224. The input image is already croped using bbox by a logo detector.
- [AugMix](https://arxiv.org/abs/1912.02781v1):Simulate lighting changes, camera position changes and other real scenes.
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf):Simulate occlusion.
### 1.2 Backbone
Using `ResNet50` as backbone, and make the following modifications:
- Last stage stride = 1, keep the size of the final output feature map to 14x14. At the cost of increasing a small amount of calculation, the ability of feature representation is greatly improved.
- Use pretrained weights of ImageNet
code:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
### 1.3 Neck
In order to reduce the complexity of calculating feature distance in inference, an embedding convolution layer is added, and the feature dimension is set to 512.
### 1.4 Metric Learning Losses
[PairwiseCosface](../../../ppcls/loss/pairwisecosface.py) , [CircleMargin](../../../ppcls/arch/gears/circlemargin.py) [1] are used. The weight ratio of two losses is 1:1.
## 2 Experiment
<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />
LogoDet-3K[2] dataset is used for experiments. The dataset is fully labeled, with 3000 logo categories, about 200,000 high-quality manually labeled logo objects and 158,652 images.
Since the dataset is original desigined for detection task, only the cropped logo area is used in the logo recognition stage. Therefore, the labeled bbox annotations are used to crop the logo area to form the training set, eliminating the influence of the background in the recognition stage. After cropping preprocessing, the dataset was splited to 155,427 images as training sets, covering 3000 logo categories (also used as the gallery during testing), and 3225 as test sets, which were used as query sets. The cropped dataset is available [download here](https://arxiv.org/abs/2008.05359)
On this data, the single model Recall@1 Acc: 89.8%.
## 3 References
[1] Circle loss: A unified perspective of pair similarity optimization. *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition*. 2020.
[2] LogoDet-3K: A Large-Scale Image Dataset for Logo Detection[J]. arXiv preprint arXiv:2008.05359, 2020.
# Product Recognition
Product recogniton is now widely used . The way of shopping by taking a photo has been adopted by many people. And the unmanned settlement platform has entered the major supermarkets, which is also supported by product recognition technology. The technology is about the process of "product detection + product identification". The product detection module is responsible for detecting potential product areas, and the product identification model is responsible for identifying the main body detected by the product detection module. The recognition module uses the retrieval method to get the similarity rank of product in database and the query image . This document mainly introduces the feature extraction part of product pictures.
## 1 Pipeline
See the pipline of [feature learning](./feature_learning_en.md) for details.
The config file: [ResNet50_vd_Aliproduct.yaml](../../../ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml)
The details are as follows.
### 1.1 Data Augmentation
- `RandomCrop`: 224x224
- `RandomFlip`
- `Normlize`: normlize images to 0~1
### 1.2 Backbone
Using `ResNet50_vd` as the backbone, whicle is pretrained on ImageNet.
### 1.3 Neck
A 512 dimensional embedding FC layer without batchnorm and activation is used.
### 1.4 Metric Learning Losses
At present, `CELoss` is used. In order to obtain more robust features, other loss will be used for training in the future. Please look forward to it.
## 2 Experiment
This scheme is tested on Aliproduct [1] dataset. This dataset is an open source dataset of Tianchi competition, which is the largest open source product data set at present. It has more than 50000 identification categories and about 2.5 million training pictures.
On this data, the single model Top1 Acc: 85.67%.
## 3 References
[1] Weakly Supervised Learning with Side Information for Noisy Labeled Images. ECCV, 2020.
# Vehicle Recognition
This part mainly includes two parts: vehicle fine-grained classification and vehicle Reid.
The goal of fine-grained classification is to recognize images belonging to multiple subordinate categories of a super-category, e.g., different species of animals/plants, different models of cars, different kinds of retail products. Obviously, fine-grained vehicle classification is to classify different sub categories of vehicles.
Vehicle ReID aims to re-target vehicle images across non-overlapping camera views given a query image. It has many practical applications, such as for analyzing and managing the traffic flows in Intelligent Transport System. In this process, how to extract robust features is particularly important.
In this document, the same training scheme is used to try the two application respectively.
## 1 Pipeline
See the pipline of [feature learning](./feature_learning_en.md) for details.
The config file of Vehicle ReID: [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml).
The config file of Vehicle fine-grained classification:[ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml).
The details are as follows.
### 1.1 Data Augmentation
Different from classification, this part mainly uses the following methods:
- `Resize` to 224. Especially for ReID, the vehicle image is already croped using bbox by detector. So if `CenterCrop` is used, more vehicle information will be lost.
- [AugMix](https://arxiv.org/abs/1912.02781v1):Simulation of lighting changes, camera position changes and other real scenes.
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf):Simulate occlusion.
### 1.2 Backbone
Using `ResNet50` as backbone, and make the following modifications:
- Last stage stride = 1, keep the size of the final output feature map to 14x14. At the cost of increasing a small amount of calculation, the ability of feature expression is greatly improved.
code:[ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
### 1.3 Neck
In order to reduce the complexity of calculating feature distance in inference, an embedding convolution layer is added, and the feature dimension is set to 512.
### 1.4 Metric Learning Losses
- In vehicle ReID,[SupConLoss](../../../ppcls/loss/supconloss.py) , [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used. The weight ratio of two losses is 1:1.
- In vehicle fine-grained classification, [TtripLet Loss](../../../ppcls/loss/triplet.py), [ArcLoss](../../../ppcls/arch/gears/arcmargin.py) are used. The weight ratio of two losses is 1:1.
## 2 Experiment
### 2.1 Vehicle ReID
<img src="../../images/recognition/vehicle/cars.JPG" style="zoom:50%;" />
This method is used in VERI-Wild dataset. This dataset was captured in a large CCTV monitoring system in an unrestricted scenario for a month (30 * 24 hours). The system consists of 174 cameras, which are distributed in large area of more than 200 square kilometers. The original vehicle image set contains 12 million vehicle images. After data cleaning and labeling, 416314 images and 40671 vehicle ids are collected. [See the paper for details]( https://github.com/PKU-IMRE/VERI-Wild).
| **Methods** | **Small** | | |
| :--------------------------: | :-------: | :-------: | :-------: |
| | mAP | Top1 | Top5 |
| Strong baesline(Resnet50)[1] | 76.61 | 90.83 | 97.29 |
| HPGN(Resnet50+PGN)[2] | 80.42 | 91.37 | - |
| GLAMOR(Resnet50+PGN)[3] | 77.15 | 92.13 | 97.43 |
| PVEN(Resnet50)[4] | 79.8 | 94.01 | 98.06 |
| SAVER(VAE+Resnet50)[5] | 80.9 | 93.78 | 97.93 |
| PaddleClas baseline1 | 65.6 | 92.37 | 97.23 |
| PaddleClas baseline2 | 80.09 | **93.81** | **98.26** |
Baseline1 is the released, and baseline2 will be released soon.
### 2.2 Vehicle Fine-grained Classification
In this applications, we use [CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html) as train dataset.
![](../../images/recognition/vehicle/CompCars.png)
The images in the dataset mainly come from the network and monitoring data. The network data includes 163 automobile manufacturers and 1716 automobile models, which includes **136726** full vehicle images and **27618** partial vehicle images. The network car data includes the information of bounding box, perspective and five attributes (maximum speed, displacement, number of doors, number of seats and car type) for vehicles. The monitoring data includes **50000** front view images.
It is worth noting that this dataset needs to generate labels according to its own needs. For example, in this demo, vehicles of the same model produced in different years are regarded as the same category. Therefore, the total number of categories is 431.
| **Methods** | Top1 Acc |
| :-----------------------------: | :-------: |
| ResNet101-swp[6] | 97.6% |
| Fine-Tuning DARTS[7] | 95.9% |
| Resnet50 + COOC[8] | 95.6% |
| A3M[9] | 95.4% |
| PaddleClas baseline (ResNet50) | **97.1**% |
## 3 References
[1] Bag of Tricks and a Strong Baseline for Deep Person Re-Identification.CVPR workshop 2019.
[2] Exploring Spatial Significance via Hybrid Pyramidal Graph Network for Vehicle Re-identification. In arXiv preprint arXiv:2005.14684
[3] GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention. In arXiv preprint arXiv:2002.02256
[4] Parsing-based view-aware embedding network for vehicle re-identification. CVPR 2020.
[5] The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification. In ECCV 2020.
[6] Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition. IEEE Transactions on Intelligent Transportation Systems, 2017.
[7] Fine-Tuning DARTS for Image Classification. 2020.
[8] Fine-Grained Vehicle Classification with Unsupervised Parts Co-occurrence Learning. 2018
[9] Attribute-Aware Attention Model for Fine-grained Representation Learning. 2019.
......@@ -31,10 +31,10 @@ cd /home/Projects
# You need to create a docker container for the first run, and do not need to run the current command when you run it again
# Create a docker container named ppcls and map the current directory to the /paddle directory of the container
# It is recommended to set a shared memory greater than or equal to 8G through the --shm-size parameter
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it docker pull paddlepaddle/paddle:2.1.0 /bin/bash
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
# Use the following command to create a container if you want to use GPU in the container
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it docker pull paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get more docker images.
......
......@@ -113,7 +113,7 @@ SSLD的流程图如下图所示。
* 对于图像分类任务,在测试的时候,测试尺度为训练尺度的1.15倍左右时,往往在不需要重新训练模型的情况下,模型的精度指标就可以进一步提升[5],对于82.99%的ResNet50_vd在320x320的尺度下测试,精度可达83.7%,我们进一步使用Fix策略,即在320x320的尺度下进行训练,使用与预测时相同的数据预处理方法,同时固定除FC层以外的所有参数,最终在320x320的预测尺度下,精度可以达到**84.0%**
### 3.4 实验过程中的一些问题
### 3.5 实验过程中的一些问题
* 在预测过程中,batch norm的平均值与方差是通过加载预训练模型得到(设其模式为test mode)。在训练过程中,batch norm是通过统计当前batch的信息(设其模式为train mode),与历史保存信息进行滑动平均计算得到,在蒸馏任务中,我们发现通过train mode,即教师模型的bn实时变化的模式,去指导学生模型,比通过test mode蒸馏,得到的学生模型性能更好一些,下面是一组实验结果。因此我们在该蒸馏方案中,均使用train mode去得到教师模型的soft label。
......
......@@ -6,7 +6,7 @@
- 支持在`backbone`的feature输出层后,添加可配置的网络层,即`Neck`部分
- 支持`ArcFace Loss``metric learning` 相关loss函数,提升特征学习能力
## 整体流程
## 1 整体流程
![](../../images/recognition/rec_pipeline.png)
......@@ -14,6 +14,6 @@
针对不同的应用,可以根据需要,对每一部分自由选择。每一部分的具体配置,如数据增强、Backbone、Neck、Metric Learning相关Loss等设置,详见具体应用:[车辆识别](./vehicle_recognition.md)[Logo识别](./logo_recognition.md)[动漫人物识别](./cartoon_character_recognition.md)[商品识别](./product_recognition.md)
## 配置文件说明
## 2 配置文件说明
配置文件说明详见[yaml配置文件说明文档](../tutorials/config.md)。其中模型结构配置,详见文档中**识别模型结构配置**部分。
......@@ -46,8 +46,6 @@ ReID,也就是 Re-identification,其定义是利用算法,在图像库中
### 2.1 车辆ReID
<img src="../../images/recognition/vehicle/cars.JPG" style="zoom:50%;" />
此方法在VERI-Wild数据集上进行了实验。此数据集是在一个大型闭路电视监控系统,在无约束的场景下,一个月内(30*24小时)中捕获的。该系统由174个摄像头组成,其摄像机分布在200多平方公里的大型区域。原始车辆图像集包含1200万个车辆图像,经过数据清理和标注,采集了416314张40671个不同的车辆图像。[具体详见论文](https://github.com/PKU-IMRE/VERI-Wild)
......
......@@ -82,11 +82,11 @@ python3 tools/train.py \
-o Global.device=gpu
```
其中配置文件不需要做任何修改,只需要在继续训练时设置`checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
其中配置文件不需要做任何修改,只需要在继续训练时设置`Global.checkpoints`参数即可,表示加载的断点权重文件路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。
**注意**
* `-o Global.checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`Global.checkpoints`参数只需设置为`"../output/MobileNetV3_large_x1_0/epoch_5"`,PaddleClas会自动补充后缀名。
* `-o Global.checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`Global.checkpoints`参数只需设置为`"../output/MobileNetV3_large_x1_0/epoch_5"`,PaddleClas会自动补充后缀名。output目录下的文件结构如下所示:
```shell
output
......@@ -117,7 +117,7 @@ python3 tools/eval.py \
可配置的部分评估参数说明如下:
* `Arch.name`:模型名称
* `Global.pretrained_model`:待评估的模型文件路径
* `Global.pretrained_model`:待评估的模型预训练模型文件路径
**注意:** 在加载待评估模型时,需要指定模型文件的路径,但无需包含文件后缀名,PaddleClas会自动补齐`.pdparams`的后缀,如[1.3 模型恢复训练](#1.3)
......@@ -175,11 +175,10 @@ python3 -m paddle.distributed.launch \
tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Optimizer.lr.last_epoch=5 \
-o Global.device=gpu
```
其中配置文件不需要做任何修改,只需要在训练时设置`Global.checkpoints`参数`Optimizer.lr.last_epoch`参数即可,该参数表示加载的断点权重文件路径,使用该参数会同时加载保存的模型参数权重和学习率、优化器等信息,详见[1.3 模型恢复训练](#1.3)
其中配置文件不需要做任何修改,只需要在训练时设置`Global.checkpoints`参数即可,该参数表示加载的断点权重文件路径,使用该参数会同时加载保存的模型参数权重和学习率、优化器等信息,详见[1.3 模型恢复训练](#1.3)
### 2.4 模型评估
......@@ -230,11 +229,6 @@ python3 tools/export_model.py \
其中,`Global.pretrained_model`用于指定模型文件路径,该路径仍无需包含模型文件后缀名(如[1.3 模型恢复训练](#1.3))。
**注意**
1. `--output_path`表示输出的inference模型文件夹路径,若`--output_path=./inference`,则会在`inference`文件夹下生成`inference.pdiparams``inference.pdmodel``inference.pdiparams.info`文件。
2. 可以通过设置参数`--img_size`指定模型输入图像的`shape`,默认为`224`,表示图像尺寸为`224*224`,请根据实际情况修改;如果使用`Transformer`系列模型,如`DeiT_***_384`, `ViT_***_384`等,请注意模型的输入数据尺寸,需要设置参数`img_size=384`
上述命令将生成模型结构文件(`inference.pdmodel`)和模型权重文件(`inference.pdiparams`),然后可以使用预测引擎进行推理:
进入deploy目录下:
......
......@@ -33,10 +33,10 @@ cd /home/Projects
# 创建一个名字为ppcls的docker容器,并将当前目录映射到容器的/paddle目录下
如果您希望在CPU环境下使用docker,使用docker而不是nvidia-docker创建docker,设置docker容器共享内存shm-size为8G,建议设置8G以上
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it docker pull paddlepaddle/paddle:2.1.0 /bin/bash
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
如果希望使用GPU版本的容器,请运行以下命令创建容器。
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it docker pull paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
```
......
# 30分钟玩转PaddleClas(进阶版)
此处提供了专业用户在linux操作系统上使用PaddleClas的快速上手教程,主要内容包括基于CIFAR-100数据集和NUS-WIDE-SCENE数据集,快速体验不同模型的单标签训练及多标签训练、加载不同预训练模型、SSLD知识蒸馏方案和数据增广的效果。请事先参考[安装指南](install.md)配置运行环境和克隆PaddleClas代码。
此处提供了专业用户在linux操作系统上使用PaddleClas的快速上手教程,主要内容基于CIFAR-100数据集,快速体验不同模型的训练、加载不同预训练模型、SSLD知识蒸馏方案和数据增广的效果。请事先参考[安装指南](install.md)配置运行环境和克隆PaddleClas代码。
## 一、数据和模型准备
......@@ -125,7 +125,7 @@ python3 -m paddle.distributed.launch \
## 四、知识蒸馏
PaddleClas包含了自研的SSLD知识蒸馏方案,具体的内容可以参考[知识蒸馏章节](../advanced_tutorials/distillation/distillation.md)本小节将尝试使用知识蒸馏技术对MobileNetV3_large_x1_0模型进行训练,使用`2.1.2小节`训练得到的ResNet50_vd模型作为蒸馏所用的教师模型,首先将`2.1.2小节`训练得到的ResNet50_vd模型保存到指定目录,脚本如下。
PaddleClas包含了自研的SSLD知识蒸馏方案,具体的内容可以参考[知识蒸馏章节](../advanced_tutorials/distillation/distillation.md), 本小节将尝试使用知识蒸馏技术对MobileNetV3_large_x1_0模型进行训练,使用`2.1.2小节`训练得到的ResNet50_vd模型作为蒸馏所用的教师模型,首先将`2.1.2小节`训练得到的ResNet50_vd模型保存到指定目录,脚本如下。
```shell
mkdir pretrained
......
......@@ -43,7 +43,18 @@
| 商品识别模型 | 商品场景 | [数据下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar) | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
**注意**:windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下;linux或者macOS用户可以右键点击,然后复制下载链接,即可通过`wget`命令下载。
**注意**
1. windows 环境下如果没有安装wget,可以按照下面的步骤安装wget与tar命令,也可以在,下载模型时将链接复制到浏览器中下载,并解压放置在相应目录下;linux或者macOS用户可以右键点击,然后复制下载链接,即可通过`wget`命令下载。
2. 如果macOS环境下没有安装`wget`命令,可以运行下面的命令进行安装。
```shell
# 安装 homebrew
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
# 安装wget
brew install wget
```
3. 如果希望在windows环境下安装wget,可以参考:[链接](https://www.cnblogs.com/jeshy/p/10518062.html);如果希望在windows环境中安装tar命令,可以参考:[链接](https://www.cnblogs.com/chooperman/p/14190107.html)
* 可以按照下面的命令下载并解压数据与模型
......@@ -69,6 +80,12 @@ cd ..
以商品识别为例,下载通用检测、识别模型以及商品识别demo数据,命令如下。
```shell
mkdir dataset
cd dataset
# 下载demo数据并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
mkdir models
cd models
# 下载通用检测inference模型并解压
......@@ -76,11 +93,6 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/infere
# 下载识别inference模型并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
mkdir dataset
cd dataset
# 下载demo数据并解压
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar && tar -xf product_demo_data_v1.0.tar
cd ..
```
......
......@@ -84,7 +84,7 @@ DataLoader:
shuffle: True
loader:
num_workers: 6
use_shared_memory: False
use_shared_memory: True
Eval:
Query:
......
......@@ -92,7 +92,7 @@ DataLoader:
loader:
num_workers: 6
use_shared_memory: False
use_shared_memory: True
Eval:
Query:
dataset:
......
......@@ -66,7 +66,7 @@ DataLoader:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
......
......@@ -69,7 +69,7 @@ Optimizer:
DataLoader:
Train:
dataset:
name: ImageNetDataset
name: VeriWild
image_root: ./dataset/Stanford_Online_Products/
cls_label_path: ./dataset/Stanford_Online_Products/train_list.txt
transform_ops:
......@@ -104,7 +104,7 @@ DataLoader:
Eval:
Query:
dataset:
name: ImageNetDataset
name: VeriWild
image_root: ./dataset/Stanford_Online_Products/
cls_label_path: ./dataset/Stanford_Online_Products/test_list.txt
transform_ops:
......@@ -129,7 +129,7 @@ DataLoader:
Gallery:
dataset:
name: ImageNetDataset
name: VeriWild
image_root: ./dataset/Stanford_Online_Products/
cls_label_path: ./dataset/Stanford_Online_Products/test_list.txt
transform_ops:
......
......@@ -100,7 +100,7 @@ DataLoader:
shuffle: True
loader:
num_workers: 6
use_shared_memory: False
use_shared_memory: True
Eval:
Query:
dataset:
......@@ -125,7 +125,7 @@ DataLoader:
shuffle: False
loader:
num_workers: 6
use_shared_memory: False
use_shared_memory: True
Gallery:
dataset:
......@@ -150,7 +150,7 @@ DataLoader:
shuffle: False
loader:
num_workers: 6
use_shared_memory: False
use_shared_memory: True
Metric:
Eval:
......
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
device: "gpu"
class_num: 102
save_interval: 1
eval_mode: "retrieval"
eval_during_train: True
eval_interval: 1
epochs: 20
print_batch_step: 10
use_visualdl: False
image_shape: [3, 224, 224]
#inference related
save_inference_dir: "./inference"
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: "false"
Backbone:
name: "ResNet50_vd"
pretrained: False
BackboneStopLayer:
name: "flatten_0"
output_dim: 2048
Head:
name: "FC"
class_num: 102
embedding_size: 2048
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: "./dataset/flowers102/"
cls_label_path: "./dataset/flowers102/train_list.txt"
transform_ops:
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: False
Eval:
Query:
dataset:
name: ImageNetDataset
image_root: "./dataset/flowers102/"
cls_label_path: "./dataset/flowers102/val_list.txt"
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Gallery:
dataset:
name: ImageNetDataset
image_root: "./dataset/flowers102/"
cls_label_path: "./dataset/flowers102/train_list.txt"
transform_ops:
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- Recallk:
topk: [1, 10]
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册