# Deep Metric Learning
Metric learning is a family of methods that learn discriminative features for each sample, so that intra-class samples lie close together and inter-class samples lie far apart in the learned space. With the development of deep learning, metric learning methods have been combined with deep neural networks to improve performance on traditional tasks such as face recognition/verification, person re-identification, and image retrieval. This page describes how to implement deep metric learning with PaddlePaddle Fluid, covering [data preparation](#data-preparation), [training](#training-metric-learning-models), [finetuning](#finetuning), [evaluation](#evaluation), [inference](#inference) and [performances](#performances).

---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training metric learning models](#training-metric-learning-models)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Performances](#performances)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v0.14.0 or later. If the PaddlePaddle version on your device is older, please follow the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.

## Data preparation

The Stanford Online Products (SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it for the metric learning experiments. The training set consists of 59,551 images from 11,318 classes, and the remaining 11,316 classes (60,502 images) are held out for testing. The SOP data can be prepared as follows:
```
cd data/
sh download_sop.sh
```
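The split sizes quoted in this section can be sanity-checked with a few lines of Python. This is only an arithmetic check on the numbers stated in this README; the dictionary and function names are made up for illustration and do not come from the repository.

```python
# Split sizes as quoted in this README for Stanford Online Products (SOP).
SOP_TOTAL = {"images": 120053, "classes": 22634}
SOP_TRAIN = {"images": 59551, "classes": 11318}
SOP_TEST = {"images": 60502, "classes": 11316}


def splits_are_consistent(total, train, test):
    """Return True if the train and test counts add up to the totals."""
    return all(train[key] + test[key] == total[key] for key in total)


print(splits_are_consistent(SOP_TOTAL, SOP_TRAIN, SOP_TEST))
```

Because the train and test classes are disjoint, both the image counts and the class counts should sum to the dataset totals.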

## Training metric learning models

To train a metric learning model, one needs to choose a neural network as the backbone and a metric loss function to optimize. We first train the metric learning model using softmax or arcmargin loss, and then fine-tune it using another metric learning loss, such as triplet, quadruplet or eml loss. An example of training with arcmargin loss is shown below:

```
python train_elem.py  \
        --model=ResNet50 \
        --train_batch_size=256 \
        --test_batch_size=50 \
        --lr=0.01 \
        --total_iter_num=30000 \
        --use_gpu=True \
        --pretrained_model=${path_to_pretrain_imagenet_model} \
        --model_save_dir=${output_model_path} \
        --loss_name=arcmargin \
        --arc_scale=80.0 \
        --arc_margin=0.15 \
        --arc_easy_margin=False
```
**Parameter introduction:**
* **model**: name of the model to use. Default: "ResNet50".
* **train_batch_size**: the size of each training mini-batch. Default: 256.
* **test_batch_size**: the size of each testing mini-batch. Default: 50.
* **lr**: initial learning rate. Default: 0.01.
* **total_iter_num**: total number of training iterations. Default: 30000.
* **use_gpu**: whether to use GPU or not. Default: True.
* **pretrained_model**: path to the pretrained model to load. Default: None.
* **model_save_dir**: the directory in which to save the trained model. Default: "output".
* **loss_name**: loss function used to train the model. Default: "softmax".
* **arc_scale**: scale factor applied to the logits in arcmargin loss. Default: 80.0.
* **arc_margin**: angular margin of arcmargin loss. Default: 0.15.
* **arc_easy_margin**: whether to use the easy-margin variant of arcmargin loss. Default: False.
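
The three `arc_*` flags appear to correspond to the scale, the additive angular margin, and the easy-margin switch of the ArcFace formulation listed under Reference. The NumPy sketch below shows how such values typically enter the logits. It is an illustration under that assumption, not the code in `train_elem.py`; the function name `arc_margin_logits` and its arguments are hypothetical.

```python
import math

import numpy as np


def arc_margin_logits(features, weights, labels, scale=80.0, margin=0.15,
                      easy_margin=False):
    """Return scaled logits with an additive angular margin on the true class.

    features: (N, D) embeddings; weights: (C, D) class centres; labels: (N,) ints.
    """
    # Cosine similarity between L2-normalised embeddings and class centres.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cosine = np.clip(f @ w.T, -1.0, 1.0)
    sine = np.sqrt(1.0 - cosine ** 2)

    # cos(theta + m), expanded with the angle-addition identity.
    phi = cosine * math.cos(margin) - sine * math.sin(margin)

    if easy_margin:
        # Apply the margin only where the angle is below 90 degrees.
        phi = np.where(cosine > 0.0, phi, cosine)
    else:
        # Fall back to a linear penalty once theta + m would exceed pi.
        threshold = math.cos(math.pi - margin)
        penalty = math.sin(math.pi - margin) * margin
        phi = np.where(cosine > threshold, phi, cosine - penalty)

    # Use the margin logit for the true class and the plain cosine elsewhere.
    one_hot = np.eye(weights.shape[0])[labels]
    return scale * (one_hot * phi + (1.0 - one_hot) * cosine)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))      # 4 samples, 8-dimensional embeddings
centres = rng.normal(size=(3, 8))    # 3 classes
labels = np.array([0, 2, 1, 0])
logits = arc_margin_logits(feats, centres, labels)
print(logits.shape)
```

The resulting logits would then be fed to an ordinary softmax cross-entropy. A larger margin lowers the true-class logit, which pushes the network to pull samples closer to their class centre.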

## Finetuning

Finetuning adapts model weights to a specific task, starting from pretrained weights. After training a model with softmax or arcmargin loss, one can finetune it using triplet, quadruplet or eml loss. An example of finetuning with eml loss is shown below:

```
python train_pair.py  \
        --model=ResNet50 \
        --train_batch_size=160 \
        --test_batch_size=50 \
        --lr=0.0001 \
        --total_iter_num=100000 \
        --use_gpu=True \
        --pretrained_model=${path_to_pretrain_arcmargin_model} \
        --model_save_dir=${output_model_path} \
        --loss_name=eml \
        --samples_each_class=2
```
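
To make the fine-tuning objective more concrete, here is a minimal NumPy sketch of a triplet loss with batch-hard mining. It is a generic formulation and not necessarily the variant implemented in `train_pair.py`; the function name, the margin of 0.1, and the use of Euclidean distance are all assumptions. It expects every class in the batch to appear at least twice, which is presumably what a setting such as `--samples_each_class=2` provides.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings, labels, margin=0.1):
    """Mean triplet loss using the hardest positive and negative per anchor."""
    # Pairwise Euclidean distances, shape (N, N).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)

    # Hardest positive: farthest sample of the same class (excluding itself).
    hardest_pos = np.where(same & not_self, dist, -np.inf).max(axis=1)
    # Hardest negative: closest sample of a different class.
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)

    # Hinge on (positive distance - negative distance + margin).
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())


points = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
separated = batch_hard_triplet_loss(points, np.array([0, 0, 1, 1]))
mixed = batch_hard_triplet_loss(points, np.array([0, 1, 0, 1]))
print(separated, mixed)
```

In the first call the two classes form well-separated clusters, so the loss vanishes. In the second call the labels cut across the clusters, so every anchor is closer to a negative than to its positive and the loss is large.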

## Evaluation
Evaluation measures the performance of a trained model. One can download a [pretrained model](#performances) and set `path_to_pretrain_model` to its path. Recall@Rank-1 can then be obtained by running the following command:
```
python eval.py \
       --model=ResNet50 \
       --batch_size=50 \
       --pretrained_model=${path_to_pretrain_model}
```
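
For readers unfamiliar with the metric, the sketch below shows one common way to compute Recall@Rank-1 from extracted features: each test image is used as a query, and it counts as a hit if its nearest neighbour among the other test images has the same label. This is an illustrative NumPy version, not the code in `eval.py`; the function name and the choice of cosine similarity are assumptions.

```python
import numpy as np


def recall_at_1(features, labels):
    """Fraction of samples whose nearest neighbour (excluding itself) shares its label."""
    # Cosine similarity between L2-normalised features.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a query must not retrieve itself
    nearest = sim.argmax(axis=1)
    return float((labels[nearest] == labels).mean())


feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
good = recall_at_1(feats, np.array([0, 0, 1, 1]))
bad = recall_at_1(feats, np.array([0, 1, 0, 1]))
print(good, bad)
```

With labels that follow the two feature clusters every query retrieves a correct neighbour; with labels that cut across the clusters none does.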

## Inference
Inference is used to get prediction scores or image features from a trained model.
```
python infer.py \
       --model=ResNet50 \
       --batch_size=1 \
       --pretrained_model=${path_to_pretrain_model}
```
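
Once image features are available, a typical use is retrieval: ranking gallery images by similarity to a query. The sketch below assumes the features have already been collected into NumPy arrays; how `infer.py` actually emits them is not specified here, and the function name `top_k_matches` is hypothetical.

```python
import numpy as np


def top_k_matches(query, gallery, k=3):
    """Return indices and cosine scores of the k gallery features closest to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
indices, scores = top_k_matches(query, gallery, k=2)
print(indices, scores)
```

Normalising both sides makes the dot product a cosine similarity, so the ranking does not depend on feature magnitude.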

## Performances

For comparison, metric learning models with different neural networks and loss functions were trained using empirically chosen parameters. Recall@Rank-1 is used as the evaluation metric, and the results are listed in the table below. Pretrained models can be downloaded by clicking the corresponding model names.

|pretrained model | softmax | arcmargin
|- | - | -:
|without fine-tuning | 77.42% | 78.11%
|fine-tuned with triplet | 78.37% | 79.21%
|fine-tuned with quadruplet | 78.10% | 79.59%
|fine-tuned with eml | 79.32% | 80.11%

## Reference

- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698)
- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478)
- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094)