diff --git a/fluid/PaddleCV/metric_learning/README.md b/fluid/PaddleCV/metric_learning/README.md
index c961bf4842727dab8e808ef016d10efa219bf2f6..71ecb5cf1cc10506abf2ce8c225a57b630b56d1a 100644
--- a/fluid/PaddleCV/metric_learning/README.md
+++ b/fluid/PaddleCV/metric_learning/README.md
@@ -1,15 +1,15 @@
 # Deep Metric Learning
-Metric learning is a kind of methods to learn discriminative features for each sample, with the purpose that intra-class samples have smaller distances while inter-class samples have larger distances in the learned space. With the develop of deep learning technique, metric learning methods are combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, human re-identification, image retrieval and so on. In this page, we introduce the way to implement deep metric learning using PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).
+Metric learning is a kind of methods to learn discriminative features for each sample, with the purpose that intra-class samples have smaller distances while inter-class samples have larger distances in the learned space. With the develop of deep learning technique, metric learning methods are combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, human re-identification, image retrieval and so on. In this page, we introduce the way to implement deep metric learning using PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-metric-learning-models), [finetuning](#finetuning), [evaluation](#evaluation), [inference](#inference) and [Performances](#performances).
 
 ---
 ## Table of Contents
 - [Installation](#installation)
 - [Data preparation](#data-preparation)
-- [Training metric learning models](#training-a-model)
+- [Training metric learning models](#training-metric-learning-models)
 - [Finetuning](#finetuning)
 - [Evaluation](#evaluation)
 - [Inference](#inference)
-- [Performances](#supported-models)
+- [Performances](#performances)
 
 ## Installation
 
@@ -17,7 +17,7 @@ Running sample code in this directory requires PaddelPaddle Fluid v0.14.0 and la
 
 ## Data preparation
 
-Stanford Online Product(SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,5511 out of 11,318 classes are used, and 11,316 classes(60,502 images) are held out for testing. First of all, preparation of SOP data can be done as:
+Stanford Online Product(SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,551 out of 11,318 classes are used, and 11,316 classes(60,502 images) are held out for testing. First of all, preparation of SOP data can be done as:
 ```
 cd data/
 sh download_sop.sh
@@ -25,7 +25,7 @@ sh download_sop.sh
 
 ## Training metric learning models
 
-To train a metric learning model, one need to set the neural network as backbone and the metric loss function to optimize. We train meiric learning model using softmax or [arcmargin](https://arxiv.org/abs/1801.07698) loss firstly, and then fine-turned the model using other metric learning loss, such as triplet, [quadruplet](https://arxiv.org/abs/1710.00478) and [eml](https://arxiv.org/abs/1212.6094) loss. One example of training using arcmargin loss is shown below:
+To train a metric learning model, one need to set the neural network as backbone and the metric loss function to optimize. We train meiric learning model using softmax or arcmargin loss firstly, and then fine-turned the model using other metric learning loss, such as triplet, quadruplet and eml loss. One example of training using arcmargin loss is shown below:
 
 
 ```
@@ -52,7 +52,7 @@ python train_elem.py \
 * **use_gpu**: whether to use GPU or not. Default: True.
 * **pretrained_model**: model path for pretraining. Default: None.
 * **model_save_dir**: the directory to save trained model. Default: "output".
-* **loss_name**: loss fortraining model. Default: "softmax".
+* **loss_name**: loss for training model. Default: "softmax".
 * **arc_scale**: parameter of arcmargin loss. Default: 80.0.
 * **arc_margin**: parameter of arcmargin loss. Default: 0.15.
 * **arc_easy_margin**: parameter of arcmargin loss. Default: False.
@@ -103,3 +103,9 @@ For comparation, many metric learning models with different neural networks and
 |fine-tuned with triplet | 78.37% | 79.21%
 |fine-tuned with quadruplet | 78.10% | 79.59%
 |fine-tuned with eml | 79.32% | 80.11%
+
+## Reference
+
+- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698)
+- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478)
+- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094)
diff --git a/fluid/PaddleCV/metric_learning/README_cn.md b/fluid/PaddleCV/metric_learning/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..c155200c64e8e21549a6d642f89d95fdcb0acd11
--- /dev/null
+++ b/fluid/PaddleCV/metric_learning/README_cn.md
@@ -0,0 +1,111 @@
+# 深度度量学习
+度量学习是一种为样本对学习具有区分性特征的方法,目的是在特征空间中,让同一个类别的样本具有较小的特征距离,不同类的样本具有较大的特征距离。随着深度学习技术的发展,基于深度神经网络的度量学习方法已经在许多视觉任务上提升了很大的性能,例如:人脸识别、人脸校验、行人重识别和图像检索等等。在本章节,介绍在PaddlePaddle Fluid里实现的几种度量学习方法和使用方法,具体包括[数据准备](#数据准备),[模型训练](#模型训练),[模型微调](#模型微调),[模型评估](#模型评估),[模型预测](#模型预测)。
+
+---
+## 简介
+- [安装](#安装)
+- [数据准备](#数据准备)
+- [模型训练](#模型训练)
+- [模型微调](#模型微调)
+- [模型评估](#模型评估)
+- [模型预测](#模型预测)
+- [模型性能](#模型性能)
+
+## 安装
+
+运行本章节代码需要在PaddlePaddle Fluid v0.14.0 或更高的版本环境。如果你的设备上的PaddlePaddle版本低于v0.14.0,请按照此[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)进行安装和跟新。
+
+## 数据准备
+
+Stanford Online Product(SOP) 数据集下载自eBay,包含120053张商品图片,有22634个类别。我们使用该数据集进行实验。训练时,使用59551张图片,11318个类别的数据;测试时,使用60502张图片,11316个类别。首先,SOP数据集可以使用以下脚本下载:
+```
+cd data/
+sh download_sop.sh
+```
+
+## 模型训练
+
+为了训练度量学习模型,我们需要一个神经网络模型作为骨架模型(如ResNet50)和度量学习代价函数来进行优化。我们首先使用 softmax 或者 arcmargin 来进行训练,然后使用其它的代价函数来进行微调,例如:triplet,quadruplet和eml。下面是一个使用arcmargin训练的例子:
+
+
+```
+python train_elem.py \
+    --model=ResNet50 \
+    --train_batch_size=256 \
+    --test_batch_size=50 \
+    --lr=0.01 \
+    --total_iter_num=30000 \
+    --use_gpu=True \
+    --pretrained_model=${path_to_pretrain_imagenet_model} \
+    --model_save_dir=${output_model_path} \
+    --loss_name=arcmargin \
+    --arc_scale=80.0 \
+    --arc_margin=0.15 \
+    --arc_easy_margin=False
+```
+**参数介绍:**
+* **model**: 使用的模型名字. 默认: "ResNet50".
+* **train_batch_size**: 训练的 mini-batch大小. 默认: 256.
+* **test_batch_size**: 测试的 mini-batch大小. 默认: 50.
+* **lr**: 初始学习率. 默认: 0.01.
+* **total_iter_num**: 总的训练迭代轮数. 默认: 30000.
+* **use_gpu**: 是否使用GPU. 默认: True.
+* **pretrained_model**: 预训练模型的路径. 默认: None.
+* **model_save_dir**: 保存模型的路径. 默认: "output".
+* **loss_name**: 优化的代价函数. 默认: "softmax".
+* **arc_scale**: arcmargin的参数. 默认: 80.0.
+* **arc_margin**: arcmargin的参数. 默认: 0.15.
+* **arc_easy_margin**: arcmargin的参数. 默认: False.
+
+## 模型微调
+
+网络微调是在指定的任务上加载已有的模型来微调网络。在用softmax和arcmargin训完网络后,可以继续使用triplet,quadruplet或eml来微调网络。下面是一个使用eml来微调网络的例子:
+
+```
+python train_pair.py \
+    --model=ResNet50 \
+    --train_batch_size=160 \
+    --test_batch_size=50 \
+    --lr=0.0001 \
+    --total_iter_num=100000 \
+    --use_gpu=True \
+    --pretrained_model=${path_to_pretrain_arcmargin_model} \
+    --model_save_dir=${output_model_path} \
+    --loss_name=eml \
+    --samples_each_class=2
+```
+
+## 模型评估
+模型评估主要是评估模型的检索性能。这里需要设置```path_to_pretrain_model```。可以使用下面命令来计算Recall@Rank-1。
+```
+python eval.py \
+    --model=ResNet50 \
+    --batch_size=50 \
+    --pretrained_model=${path_to_pretrain_model} \
+```
+
+## 模型预测
+模型预测主要是基于训练好的网络来获取图像数据的特征,下面是模型预测的例子:
+```
+python infer.py \
+    --model=ResNet50 \
+    --batch_size=1 \
+    --pretrained_model=${path_to_pretrain_model}
+```
+
+## 模型性能
+
+下面列举了几种度量学习的代价函数在SOP数据集上的检索效果,这里使用Recall@Rank-1来进行评估。
+
+|预训练模型 | softmax | arcmargin
+|- | - | -:
+|未微调 | 77.42% | 78.11%
+|使用triplet微调 | 78.37% | 79.21%
+|使用quadruplet微调 | 78.10% | 79.59%
+|使用eml微调 | 79.32% | 80.11%
+
+## 引用
+
+- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [链接](https://arxiv.org/abs/1801.07698)
+- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [链接](https://arxiv.org/abs/1710.00478)
+- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [链接](https://arxiv.org/abs/1212.6094)
diff --git a/fluid/PaddleCV/metric_learning/reader.py b/fluid/PaddleCV/metric_learning/reader.py
index 9c5aaf396d7b535bc4729a31834e2ef0f8151a28..ac8f257ecbadbb08454ffc616b1c587455cc92b6 100644
--- a/fluid/PaddleCV/metric_learning/reader.py
+++ b/fluid/PaddleCV/metric_learning/reader.py
@@ -63,6 +63,7 @@ def common_iterator(data, settings):
     assert (batch_size % samples_each_class == 0)
     class_num = batch_size // samples_each_class
     def train_iterator():
+        count = 0
         labs = list(data.keys())
         lab_num = len(labs)
         ind = list(range(0, lab_num))
@@ -79,6 +80,9 @@ def common_iterator(data, settings):
                 for anchor_ind_i in anchor_ind:
                     anchor_path = DATA_DIR + data_list[anchor_ind_i]
                     yield anchor_path, lab
+            count += 1
+            if count >= settings.total_iter_num + 1:
+                return
 
     return train_iterator
 
@@ -86,6 +90,8 @@ def triplet_iterator(data, settings):
     batch_size = settings.train_batch_size
     assert (batch_size % 3 == 0)
     def train_iterator():
+        total_count = settings.train_batch_size * (settings.total_iter_num + 1)
+        count = 0
         labs = list(data.keys())
         lab_num = len(labs)
         ind = list(range(0, lab_num))
@@ -108,16 +114,24 @@ def triplet_iterator(data, settings):
             yield pos_path, lab_pos
             neg_path = DATA_DIR + neg_data_list[neg_ind]
             yield neg_path, lab_neg
+            count += 3
+            if count >= total_count:
+                return
 
     return train_iterator
 
 def arcmargin_iterator(data, settings):
     def train_iterator():
+        total_count = settings.train_batch_size * (settings.total_iter_num + 1)
+        count = 0
         while True:
             for items in data:
                 path, label = items
                 path = DATA_DIR + path
                 yield path, label
+                count += 1
+                if count >= total_count:
+                    return
     return train_iterator
 
 def image_iterator(data, mode):
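
The stopping rule that this change adds to `arcmargin_iterator` can be tried in isolation. The sketch below is a simplified stand-in, not the code from `reader.py`: the `Settings` tuple and the `bounded_iterator` name are hypothetical, and the `DATA_DIR` prefix is left out. It only illustrates how an endless sample generator ends once `train_batch_size * (total_iter_num + 1)` samples have been produced.

```python
from collections import namedtuple

# Hypothetical stand-in for the settings object used in reader.py;
# only the two fields that the stopping rule reads are modelled.
Settings = namedtuple("Settings", ["train_batch_size", "total_iter_num"])


def bounded_iterator(data, settings):
    """Cycle over (path, label) pairs and stop after a fixed sample budget."""
    def train_iterator():
        # Same budget as in the diff: batch size times (iterations + 1).
        total_count = settings.train_batch_size * (settings.total_iter_num + 1)
        count = 0
        while True:
            for path, label in data:
                yield path, label
                count += 1
                if count >= total_count:
                    # Returning from a generator ends the iteration cleanly.
                    return
    return train_iterator


settings = Settings(train_batch_size=4, total_iter_num=2)
data = [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 2)]
samples = list(bounded_iterator(data, settings)())
print(len(samples))  # 12, i.e. 4 * (2 + 1)
```

With a batch size of 4 and 2 iterations, the generator cycles through the three items four times and then stops. Note that in this sketch an empty `data` list would never yield and so would loop forever.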