Commit d69a6e8f authored by Bin Lu, committed by GitHub

Merge pull request #1534 from Intsigstephon/add_en_doc

add some en doc
# Metric Learning
## Catalogue
- [1.Introduction](#1)
- [2.Applications](#2)
- [3.Algorithms](#3)
- [3.1 Classification based](#3.1)
- [3.2 Pairwise based](#3.2)
<a name="1"></a>
## 1.Introduction
Measuring the distance between data points is a common practice in machine learning. For structured (vectorized) data, Euclidean Distance, Inner Product, or Cosine Similarity can be computed directly. The same operation, however, can hardly be replicated on unstructured data, such as calculating the compatibility between a video and a piece of music. Despite the difficulty in performing such vector operations directly due to the varied data formats, prior knowledge tells us that ED(laugh_video, laugh_music) < ED(laugh_video, blue_music). How, then, can this "distance" be effectively characterized? This is exactly the focus of Metric Learning.
Metric Learning, in full Distance Metric Learning, automatically constructs a task-specific metric function from training data by means of machine learning. As shown in the figure below, the goal of Metric Learning is to learn a transformation function (either linear or nonlinear) L that maps data points from the original vector space to a new one in which similar points are closer together and dissimilar points are further apart, making the metric more task-appropriate. Deep Metric Learning fits this transformation function with a deep neural network. ![example](../../images/ml_illustration.jpg)
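To make the idea concrete, the following minimal numpy sketch (purely illustrative, not part of PaddleClas) compares distances in the original space with distances under a hand-picked linear map L, i.e. the Mahalanobis-style metric d_L(x, y) = ||Lx - Ly||:
```python
import numpy as np

# Two toy points in the original feature space.
x = np.array([1.0, 0.0])
y = np.array([0.8, 0.6])

# A hand-picked linear transformation L (purely illustrative).
L = np.array([[2.0, 0.0],
              [0.0, 0.5]])

def euclidean(a, b):
    return np.linalg.norm(a - b)

# Distance in the original space vs. under the learned metric:
# d_L(x, y) = ||L @ x - L @ y||.
print(euclidean(x, y))          # 0.632... in the original space
print(euclidean(L @ x, L @ y))  # 0.5 after stretching/shrinking the axes
```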
<a name="2"></a>
## 2.Applications
Metric Learning technologies are widely applied in real life, such as Face Recognition, Person ReID, Image Retrieval, Fine-grained Classification, etc. With the growing prevalence of deep learning in industrial practice, Deep Metric Learning (DML) has become the mainstream research direction.
Normally, DML consists of three parts: a feature extraction network that maps images into embeddings, a sampling strategy that combines the samples in a mini-batch into multiple sub-sets, and a loss function that computes the loss on each sub-set. Please refer to the figure below: ![image](../../images/ml_pipeline.jpg)
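As an illustration of the sampling part, a widely used strategy is P x K sampling: pick P classes, then K samples per class, so that every mini-batch contains positive pairs for the loss to compare. The sketch below is a hypothetical, minimal version, not the PaddleClas sampler:
```python
import random
from collections import defaultdict

def pk_sample(labels, p=4, k=4):
    """Pick p classes, then k samples per class, so every mini-batch
    contains positive pairs (hypothetical sampler for illustration)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k]
    batch = []
    for c in random.sample(eligible, p):
        batch.extend(random.sample(by_class[c], k))
    return batch

labels = [i // 10 for i in range(80)]  # 8 toy classes, 10 samples each
print(pk_sample(labels))               # 16 indices: 4 classes x 4 samples
```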
<a name="3"></a>
## 3.Algorithms
Two learning paradigms are adopted in Metric Learning:
<a name="3.1"></a>
### 3.1 Classification based:
This refers to methods based on classification labels. They learn the effective feature representation by classifying each sample into the correct category and require the participation of the explicit labels of each sample in the Loss calculation during the learning process. Common algorithms include [L2-Softmax](https://arxiv.org/abs/1703.09507), [Large-margin Softmax](https://arxiv.org/abs/1612.02295), [Angular Softmax](https://arxiv.org/pdf/1704.08063.pdf), [NormFace](https://arxiv.org/abs/1704.06369), [AM-Softmax](https://arxiv.org/abs/1801.05599), [CosFace](https://arxiv.org/abs/1801.09414), [ArcFace](https://arxiv.org/abs/1801.07698), etc. These methods are also called proxy-based, because what they optimize is essentially the similarity between a sample and a set of proxies.
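As a sketch of how the margin-based variants above modify the softmax logits, here is a simplified, single-sample ArcFace-style computation in numpy (illustrative only; the actual PaddleClas implementation lives in `ppcls/arch/gears/arcmargin.py`):
```python
import numpy as np

def arc_margin_logits(embedding, proxies, label, s=30.0, m=0.5):
    """Simplified single-sample ArcFace-style logits: L2-normalize the
    embedding and the class proxies, then add the angular margin m to the
    ground-truth class before rescaling by s."""
    e = embedding / np.linalg.norm(embedding)
    w = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    cos = w @ e                           # cosine similarity to every proxy
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    theta[label] += m                     # margin penalty on the target class
    return s * np.cos(theta)              # feed these logits to softmax CE

logits = arc_margin_logits(np.random.randn(128), np.random.randn(10, 128), label=3)
```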
<a name="3.2"></a>
### 3.2 Pairwise based:
This refers to the learning paradigm based on sample pairs. It takes sample pairs as input and obtains an effective feature representation by directly learning the similarity between these pairs. Common algorithms include [Contrastive loss](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), [Triplet loss](https://arxiv.org/abs/1503.03832), [Lifted-Structure loss](https://arxiv.org/abs/1511.06452), [N-pair loss](https://papers.nips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf), [Multi-Similarity loss](https://arxiv.org/pdf/1904.06627.pdf), etc.
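The triplet loss is the simplest representative of this family: it requires an anchor to be closer to a positive sample than to a negative one by at least a margin. A minimal numpy sketch (illustrative only):
```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pushes the anchor-negative distance to exceed the
    anchor-positive distance by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

anchor, positive, negative = np.random.randn(3, 128)
print(triplet_loss(anchor, positive, negative))
```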
[CircleLoss](https://arxiv.org/abs/2002.10857), published in 2020, unifies the two learning paradigms from a fresh perspective, prompting further reflection on Metric Learning among researchers and practitioners.
# Image Classification Datasets
This document elaborates on the dataset format adopted by PaddleClas for image classification tasks, as well as other common datasets in this field.
------
## Catalogue
- [1.Dataset Format](#1)
- [2.Common Datasets for Image Classification](#2)
- [2.1 ImageNet1k](#2.1)
- [2.2 Flowers102](#2.2)
- [2.3 CIFAR10 / CIFAR100](#2.3)
- [2.4 MNIST](#2.4)
- [2.5 NUS-WIDE](#2.5)
<a name="1"></a>
## 1.Dataset Format
PaddleClas uses plain `txt` files to specify the training and test sets. Taking the `ImageNet1k` dataset as an example, `train_list.txt` and `val_list.txt` have the following formats:
```
# Separate the image path and annotation with "space" for each line
# train_list.txt has the following format
train/n01440764/n01440764_10026.JPEG 0
...
# val_list.txt has the following format
val/ILSVRC2012_val_00000001.JPEG 65
...
```
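These list files are plain text, so they are easy to generate and to check programmatically. A minimal parsing sketch (illustrative, not the PaddleClas loader):
```python
# Parse a label list: each line is "<image path> <integer label>".
with open("train_list.txt") as f:
    samples = [line.strip().split(" ") for line in f if line.strip()]
for path, label in samples[:3]:
    print(path, int(label))
```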
<a name="2"></a>
## 2.Common Datasets for Image Classification
Here we present a compilation of commonly used image classification datasets, which is continuously updated and welcomes your contributions.
<a name="2.1"></a>
### 2.1 ImageNet1k
[ImageNet](https://image-net.org/) is a large-scale visual database for visual object recognition research, with over 14 million manually labeled images. ImageNet-1k is a subset of ImageNet that contains 1000 categories, with 1,281,167 images in the training set and 50,000 in the validation set. Since 2010, ImageNet has hosted an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its designated dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision: numerous initial models for downstream computer vision tasks are pre-trained on it.
| Dataset | Size of Training Set | Size of Test Set | Number of Categories | Note |
| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
| [ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/) | 1.2M | 50k | 1000 | |
After downloading the data from official sources, organize it in the following format to train with the ImageNet1k dataset in PaddleClas.
```
PaddleClas/dataset/ILSVRC2012/
|_ train/
| |_ n01440764
| | |_ n01440764_10026.JPEG
| | |_ ...
| |_ ...
| |
| |_ n15075141
| |_ ...
| |_ n15075141_9993.JPEG
|_ val/
| |_ ILSVRC2012_val_00000001.JPEG
| |_ ...
| |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
<a name="2.2"></a>
### 2.2 Flowers102
| Dataset | Size of Training Set | Size of Test Set | Number of Categories | Note |
| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
| [flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | 1k | 6k | 102 | |
Unzip the downloaded data to see the following directory.
```
jpg/
setid.mat
imagelabels.mat
```
Place the files above under `PaddleClas/dataset/flowers102/`.
Run `generate_flowers102_list.py` to generate `train_list.txt` and `val_list.txt`:
```
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
```
Structure the data as follows:
```
PaddleClas/dataset/flowers102/
|_ jpg/
| |_ image_03601.jpg
| |_ ...
| |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```
<a name="2.3"></a>
### 2.3 CIFAR10 / CIFAR100
The CIFAR-10 dataset comprises 60,000 32x32 color images in 10 classes, with 6,000 images per class: 5,000 in the training set and 1,000 in the validation set. The 10 classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset extends CIFAR-10 and consists of 60,000 32x32 color images in 100 classes, with 600 images per class: 500 in the training set and 100 in the validation set.
Website: http://www.cs.toronto.edu/~kriz/cifar.html
<a name="2.4"></a>
### 2.4 MNIST
MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory example for deep learning in many sources. It contains 70,000 28 x 28 images: 60,000 in the training set and 10,000 in the validation set.
Website: http://yann.lecun.com/exdb/mnist/
<a name="2.5"></a>
### 2.5 NUS-WIDE
NUS-WIDE is a multi-label dataset. It contains 269,648 images and 81 categories, with each image labeled as one or more of the 81 categories.
Website: https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html
# Image Recognition Datasets
This document elaborates on the dataset format adopted by PaddleClas for image recognition tasks, as well as other common datasets in this field.
------
## Catalogue
- [1.Dataset Format](#1)
- [2.Common Datasets for Image Recognition](#2)
- [2.1 General Datasets](#2.1)
- [2.2 Vertical Class Datasets](#2.2)
- [2.2.1 Animation Character Recognition](#2.2.1)
- [2.2.2 Product Recognition](#2.2.2)
- [2.2.3 Logo Recognition](#2.2.3)
- [2.2.4 Vehicle Recognition](#2.2.4)
<a name="1"></a>
## 1.Dataset Format
The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
- Train dataset: Used to train the model to learn the image features involved.
- Gallery dataset: Used to provide the gallery data for the vector search task. It can be the same as the train or query dataset, or different; when it is the same as the train dataset, the query dataset and the train dataset need to share the same category system.
- Query dataset: Used to test the performance of the model. It usually extracts features from each query image of the dataset, followed by distance matching with those in the gallery dataset to get the recognition results, based on which the metrics of the whole query dataset are calculated.
All three datasets are specified by `txt` files. Taking the `CUB_200_2011` dataset as an example, the `train_list.txt` of the train dataset has the following format:
```
# Use "space" as the separator
...
train/99/Ovenbird_0136_92859.jpg 99 2
...
train/99/Ovenbird_0128_93366.jpg 99 6
...
```
The `test_list.txt` of the query dataset (which serves as both the gallery and the query dataset in `CUB_200_2011`) has the following format:
```
# Use "space" as the separator
...
test/200/Common_Yellowthroat_0126_190407.jpg 200 1
...
test/200/Common_Yellowthroat_0114_190501.jpg 200 6
...
```
Each row is separated by spaces; the three columns are the path, the label, and the unique id of the training data.
**Note**
1. When the gallery dataset and the query dataset are the same, each sample needs a unique id (any scheme that gives every image a different id works, e.g. the row number) so that the first retrieved result (the image itself, which requires no evaluation) can be removed when computing mAP, recall@1, and other metrics. The corresponding dataset class in the yaml configuration file is `VeriWild`.
2. When the gallery dataset and the query dataset are different, no unique id is needed. Both `query_list.txt` and `gallery_list.txt` then contain two columns, the path and the label of the data. The corresponding dataset class in the yaml configuration file is `ImageNetDataset`.
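A minimal sketch (illustrative, not the PaddleClas loader) that handles both layouts, with and without the unique id:
```python
def parse_rec_list(list_path):
    """Parse "<path> <label>" or "<path> <label> <unique id>" lines
    (hypothetical helper for illustration)."""
    samples = []
    with open(list_path) as f:
        for line in f:
            fields = line.strip().split(" ")
            if len(fields) == 3:    # gallery == query: unique id present
                path, label, uid = fields
                samples.append((path, int(label), int(uid)))
            elif len(fields) == 2:  # gallery != query: no unique id
                path, label = fields
                samples.append((path, int(label), None))
    return samples

print(parse_rec_list("train_list.txt")[:3])
```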
<a name="2"></a>
## 2.Common Datasets for Image Recognition
Here we present a compilation of commonly used image recognition datasets, which is continuously updated and welcomes your contributions.
<a name="2.1"></a>
### 2.1 General Datasets
- SOP: The SOP dataset is a common product dataset used in general recognition and Metric Learning research. It contains 120,053 images of 22,634 products downloaded from eBay.com: 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set.
Website: https://cvgl.stanford.edu/projects/lifted_struct/
- Cars196: The Cars dataset contains 16,185 images of 196 categories of cars, split into 8,144 training images and 8,041 query images, with each category divided roughly 50-50. Categories are typically defined by the make, model, and year of the car, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe.
Website: https://ai.stanford.edu/~jkrause/cars/car_dataset.html
- CUB_200_2011: The CUB_200_2011 dataset is a fine-grained dataset proposed by the California Institute of Technology (Caltech) in 2010 and is currently the benchmark dataset for fine-grained classification and recognition research. It contains 11,788 bird images in 200 subclasses: 5,994 images in the train dataset and 5,794 in the query dataset. Each image provides label information, the bounding box of the bird, key part locations, and bird attributes.
- In-shop Clothes: In-shop Clothes is one of the 4 subsets of the DeepFashion dataset. It is a seller-show image dataset in which the multi-angle images of each product id are collected in the same folder. The dataset contains 7,982 items with 52,712 images, each annotated with 463 attributes, Bbox, landmarks, and store descriptions.
Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
<a name="2.2"></a>
### 2.2 Vertical Class Datasets
<a name="2.2.1"></a>
#### 2.2.1 Animation Character Recognition
- iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters. It contains more than 5,013 cartoon characters and 389,678 high-quality images. Compared with other datasets, it boasts large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets for studying cartoon character recognition.
- Website: http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d
- Manga109: Manga109 is a dataset released in May 2020 for the study of cartoon character detection and recognition, which contains 21,142 images and is not licensed for commercial use. Manga109-s, a subset of this dataset, is available for industrial use, mainly for tasks such as text detection, sketch-based search, and character image generation.
Website: http://www.manga109.org/en/
- IIT-CFW: The IIT-CFW dataset contains a total of 8,928 labeled cartoon portraits of celebrities, covering 100 characters with a varying number of portraits each. It also provides 1,000 real face photos (10 real portraits for each of the 100 public figures). The dataset can be used to study both animation character recognition and cross-modal retrieval tasks.
Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
<a name="2.2.2"></a>
#### 2.2.2 Product Recognition
- AliProduct: The AliProduct dataset is the largest open-source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking first in the industry on both counts. The dataset covers a large number of household goods, foods, etc. Because it is not manually annotated, the data is noisy and unevenly distributed, with many similar product images.
Website: https://retailvisionworkshop.github.io/recognition_challenge_2020/
- Product-10k: All images in the Product-10k dataset come from Jingdong Mall, covering 10,000 frequently purchased SKUs organized into a hierarchy, with nearly 190,000 images in total. As in real application scenarios, the number of images per SKU is unevenly distributed. All images are manually checked and labeled by a team of production experts.
Website: https://www.kaggle.com/c/products-10k/data?select=train.csv
- DeepFashion-Inshop: The same as the In-shop Clothes dataset in the general datasets above.
<a name="2.2.3"></a>
#### 2.2.3 Logo Recognition
- Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
Website: https://github.com/msn199959/Logo-2k-plus-Dataset
- Tsinghua-Tencent 100K: This is a large traffic-sign benchmark built from 100,000 Tencent Street View panoramas. It provides 100,000 images covering a wide range of illumination and weather conditions, including 30,000 traffic-sign instances. Each traffic sign in the benchmark is labeled with its category, bounding box, and pixel mask. A total of 222 categories (1 background class + 221 traffic signs) are included.
Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
<a name="2.2.4"></a>
#### 2.2.4 Vehicle Recognition
- CompCars: The images, 136,726 of whole cars and 27,618 of car parts, come mainly from web data and surveillance data. The web data covers 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). The surveillance data comprises 50,000 front-view images.
Website: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/
- BoxCars: The dataset contains a total of 21,250 vehicles, 63,750 images, 27 vehicle manufacturers, and 148 subcategories. All of them are derived from surveillance data.
Website: https://github.com/JakubSochor/BoxCars
- PKU-VD Dataset: The dataset contains two large vehicle datasets (VD1 and VD2) captured from real-world unrestricted scenes in two cities. VD1 is obtained from high-resolution traffic cameras, while the images in VD2 come from surveillance videos. The authors performed vehicle detection on the raw data so that each image contains only one vehicle. For privacy reasons, all license plates are obscured with black overlays. All images are captured from the front view, and each image carries diverse attribute annotations, including an identification number, the precise vehicle model, and the color. VD1 originally contained 1,097,649 images, 1,232 vehicle models, and 11 vehicle colors; after removing images with multiple vehicles or taken from the rear, 846,358 images of 141,756 vehicles remain. VD2 contains 807,260 images of 79,763 vehicles, covering 1,112 vehicle models and 11 vehicle colors.
Website: https://pkuml.org/resources/pku-vds.html
# Feature Extraction
## Catalogue
- [1.Introduction](#1)
- [2.Network Structure](#2)
- [3.General Recognition Models](#3)
- [4.Customized Feature Extraction](#4)
- [4.1 Data Preparation](#4.1)
- [4.2 Model Training](#4.2)
- [4.3 Model Evaluation](#4.3)
- [4.4 Model Inference](#4.4)
<a name="1"></a>
## 1.Introduction
Feature extraction plays a key role in image recognition: it transforms the input image into a fixed-dimensional feature vector for the subsequent [vector search](./vector_search_en.md). Good features should preserve similarity: in the feature space, image pairs with high similarity should have high feature similarity (be closer together), and image pairs with low similarity should have low feature similarity (be further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) studies how to obtain features with strong representational power through deep learning.
<a name="2"></a>
## 2.Network Structure
In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:
![img](../../images/feature_extraction_framework_en.png)
Functions of the above modules:
- **Backbone**: Specifies the backbone network to be used. Note that the ImageNet pre-trained models provided by PaddleClas output 1000 values in the last layer, which needs to be customized according to the required feature dimension.
- **Neck**: Used for feature augmentation and feature dimension transformation. Here it can be a simple Linear Layer for feature dimension transformation, or a more complex FPN structure for feature augmentation.
- **Head**: Used to transform features into logits. In addition to the common Fc Layer, cosmargin, arcmargin, circlemargin and other modules are all available choices.
- **Loss**: Specifies the Loss function to be used. It is designed as a combined form to facilitate the combination of Classification Loss and Pair_wise Loss.
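To show how the four modules compose, here is a toy numpy sketch of the forward path (the weights and dimensions are made up for illustration; in PaddleClas each module is a configurable layer selected in the yaml file):
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights, chosen only to illustrate the data flow.
W_neck = rng.standard_normal((512, 1024))  # Neck: 1024-d backbone feature -> 512-d embedding
W_head = rng.standard_normal((1000, 512))  # Head: embedding -> per-class logits

def forward(backbone_feature):
    embedding = W_neck @ backbone_feature  # Neck: feature dimension transform
    logits = W_head @ embedding            # Head: features -> logits (e.g. ArcMargin)
    return embedding, logits               # Loss combines classification / pairwise terms

feature = rng.standard_normal(1024)        # stand-in for the Backbone output
embedding, logits = forward(feature)
```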
<a name="3"></a>
## 3.General Recognition Models
In PP-Shitu, we use [PP_LCNet_x2_5](../models/PP-LCNet.md) as the backbone, a Linear Layer as the Neck, [ArcMargin](../../../ppcls/arch/gears/arcmargin.py) as the Head, and CELoss as the Loss. See the details in the [general recognition configuration files](../.././ppcls/configs/GeneralRecognition/). The training data covers the following seven public datasets:
| Datasets | Data Size | Class Number | Scenarios | URL |
| ------------ | --------- | ------------ | ------------------ | ------------------------------------------------------------ |
| Aliproduct | 2498771 | 50030 | Commodities | [URL](https://retailvisionworkshop.github.io/recognition_challenge_2020/) |
| GLDv2 | 1580470 | 81313 | Landmarks | [URL](https://github.com/cvdfoundation/google-landmark) |
| VeRI-Wild | 277797 | 30671 | Vehicle | [URL](https://github.com/PKU-IMRE/VERI-Wild) |
| LogoDet-3K | 155427 | 3000 | Logo | [URL](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| iCartoonFace | 389678 | 5013 | Cartoon Characters | [URL](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) |
| SOP | 59551 | 11318 | Commodities | [URL](https://cvgl.stanford.edu/projects/lifted_struct/) |
| Inshop | 25882 | 3997 | Commodities | [URL](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) |
| **Total** | **5M** | **185K** | ---- | ---- |
The results are shown in the table below:
| Model | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) |
| ------------- | ---------- | --------- | ---------- | ------------ | ----- | ------ | ----------- |
| PP-LCNet-2.5x | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 |
- Evaluation metric: `Recall@1`
- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`.
- Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10
- Address of the pre-training model: [General recognition pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams)
<a name="4"></a>
## 4.Customized Feature Extraction
Customized feature extraction refers to retraining the feature extraction model based on one's own task. It consists of four main steps: 1) data preparation, 2) model training, 3) model evaluation, and 4) model inference.
<a name="4.1"></a>
### 4.1 Data Preparation
To start with, customize your dataset based on the task (see [Format description](../data_preparation/recognition_dataset_en.md#1) for the dataset format). Before starting the training, modify the data-related content in the configuration file, mainly the dataset paths and the number of classes. The corresponding locations in the configuration file are shown below:
```
Head:
  name: ArcMargin
  embedding_size: 512
  class_num: 185341 # number of classes
```
```
Train:
  dataset:
    name: ImageNetDataset
    image_root: ./dataset/ # the directory where the train dataset is located
    cls_label_path: ./dataset/train_reg_all_data.txt # the label file path of the train dataset
```
```
Query:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # the directory where the query dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # the label file path of the query dataset
```
```
Gallery:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # the directory where the gallery dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # the label file path of the gallery dataset
```
<a name="4.2"></a>
### 4.2 Model Training
- Single machine single card training
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
```
- Single machine multi card training
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
```
**Note:** The configuration file adopts `online evaluation` by default. If you want to speed up training by removing `online evaluation`, just add `-o eval_during_train=False` after the above command. After training, the final model files `latest` and `best_model`, as well as the training log file `train.log`, will be generated under the `output` directory. Among them, `best_model` stores the best model under the current evaluation metrics, while `latest` stores the most recently generated model, making it convenient to resume training from where it was interrupted.
- Resumption of Training:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.checkpoint="output/RecModel/latest"
```
<a name="4.3"></a>
### 4.3 Model Evaluation
- Single Card Evaluation
```
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
- Multi Card Evaluation
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
**Recommendation:** It is suggested to employ multi-card evaluation, which can quickly obtain the feature set of the overall dataset using multi-card parallel computing, accelerating the evaluation process.
<a name="4.4"></a>
### 4.4 Model Inference
Two steps are included in the inference: 1) exporting the inference model; 2) obtaining the feature vector.
#### 4.4.1 Export Inference Model
```
python tools/export_model.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
The generated inference model is located in the `inference` directory and comprises three files, namely `inference.pdmodel`, `inference.pdiparams`, and `inference.pdiparams.info`. Among them, `inference.pdmodel` stores the structure of the inference model, while `inference.pdiparams` and `inference.pdiparams.info` store the model parameters.
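As a sketch of what these files are used for, the exported model can be loaded directly with the Paddle Inference API (assuming Paddle 2.x; the dummy 224 x 224 input below is an assumption based on the recognition config):
```python
import numpy as np
from paddle.inference import Config, create_predictor

# Load the exported model; the paths follow the export step above.
config = Config("inference/inference.pdmodel", "inference/inference.pdiparams")
predictor = create_predictor(config)

# Feed a dummy image (batch of 1) and fetch the feature vector.
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
feature = output_handle.copy_to_cpu()
print(feature.shape)
```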
#### 4.4.2 Obtain Feature Vector
```
cd deploy
python python/predict_rec.py \
-c configs/inference_rec.yaml \
-o Global.rec_inference_model_dir="../inference"
```
The output format of the obtained features is shown in the figure below: ![img](../../images/feature_extraction_output.png)
In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](./vector_search_en.md).
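As a pointer to what the retrieval step does with these vectors, the sketch below matches one query feature against a gallery matrix by cosine similarity (illustrative only; PaddleClas uses a dedicated vector search module, and the features are assumed L2-normalized):
```python
import numpy as np

def top_k(query_feat, gallery_feats, k=5):
    """Rank gallery entries by cosine similarity (features assumed
    L2-normalized, so the dot product equals the cosine)."""
    sims = gallery_feats @ query_feat
    order = np.argsort(-sims)[:k]
    return order, sims[order]

gallery = np.random.randn(1000, 512)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query = gallery[42] + 0.05 * np.random.randn(512)  # a near-duplicate of entry 42
query /= np.linalg.norm(query)
print(top_k(query, gallery))                       # entry 42 should rank first
```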
# Model Service Deployment
## Catalogue
- [1. Introduction](#1)
- [2. Installation of Serving](#2)
- [3. Service Deployment for Image Classification](#3)
- [3.1 Model Transformation](#3.1)
- [3.2 Service Deployment and Request](#3.2)
- [4. Service Deployment for Image Recognition](#4)
- [4.1 Model Transformation](#4.1)
- [4.2 Service Deployment and Request](#4.2)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) is designed to provide easy deployment of online prediction services for deep learning developers. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and multiple programming languages for client development.
This section, taking the HTTP deployment of a prediction service as an example, describes how to deploy model services in PaddleClas with PaddleServing. Currently, only deployment on Linux is supported; Windows is not.
<a name="2"></a>
## 2. Installation of Serving
It is officially recommended to use docker to install and deploy the Serving environment. First, pull the docker image and create a container from it.
```
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
```
Once inside the container, install the Serving-related python packages.
```
pip3 install paddle-serving-client==0.7.0
pip3 install paddle-serving-server==0.7.0 # CPU
pip3 install paddle-serving-app==0.7.0
pip3 install paddle-serving-server-gpu==0.7.0.post102 #GPU with CUDA10.2 + TensorRT6
# For other GPU environments, confirm the environment before choosing which one to execute
pip3 install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
```
- Speed up the installation process by replacing the source with `-i https://pypi.tuna.tsinghua.edu.cn/simple`.
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
- To deploy CPU services, please install the CPU version of serving-server with the following command.
```
pip install paddle-serving-server
```
<a name="3"></a>
## 3. Service Deployment for Image Classification
<a name="3.1"></a>
### 3.1 Model Transformation
When adopting PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part takes the classic ResNet50_vd model as an example to introduce the deployment of image classification service.
- Enter the working directory:
```
cd deploy/paddleserving
```
- Download the inference model of ResNet50_vd:
```
# Download and decompress the ResNet50_vd model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
```
- Convert the downloaded inference model into a format that is readily deployable by Server with the help of paddle_serving_client.
```
# Convert the ResNet50_vd model
python3 -m paddle_serving_client.convert --dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
```
After the transformation, `ResNet50_vd_serving` and `ResNet50_vd_client` will be added to the current folder in the following format:
```
|- ResNet50_vd_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- ResNet50_vd_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Having obtained the model files, modify the alias name in `serving_server_conf.prototxt` under the directory `ResNet50_vd_serving` by changing `alias_name` in `fetch_var` to `prediction`.
**Notes**: Serving supports renaming inputs and outputs so that deployment code stays compatible with different models. In this case, modifying the `alias_name` in the configuration file is the only step needed to deploy and run inference for all kinds of models. The modified `serving_server_conf.prototxt` is shown below:
```
feed_var {
name: "inputs"
alias_name: "inputs"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "prediction"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
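The edit can also be scripted. A hypothetical helper (it assumes the pre-conversion alias equals the variable name, which is how the convert tool writes it by default; editing the file by hand works just as well):
```python
from pathlib import Path

# Rename the fetch alias so client code can always fetch "prediction".
conf = Path("ResNet50_vd_serving/serving_server_conf.prototxt")
text = conf.read_text()
conf.write_text(text.replace('alias_name: "save_infer_model/scale_0.tmp_1"',
                             'alias_name: "prediction"'))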
<a name="3.2"></a>
### 3.2 Service Deployment and Request
The paddleserving directory contains the code for starting the pipeline service and sending prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
classification_web_service.py # Script for starting the pipeline server
```
- Start the service:
```
# Start the service and the run log is saved in log.txt
python3 classification_web_service.py &>log.txt &
```
Once the service is successfully started, a log similar to the following will be printed in log.txt: ![img](../../../deploy/paddleserving/imgs/start_server.png)
- Send request:
```
# Send service request
python3 pipeline_http_client.py
```
Once the request succeeds, the prediction results will be printed in the cmd window, as in the following example: ![img](../../../deploy/paddleserving/imgs/results.png)
<a name="4"></a>
## 4. Service Deployment for Image Recognition
When using PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part, exemplified by the ultra-lightweight model for image recognition in PP-ShiTu, details the deployment of image recognition service.
<a name="4.1"></a>
### 4.1 Model Transformation
- Download inference models for general detection and general recognition
```
cd deploy
# Download and decompress general recognition models
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
cd models
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# Download and decompress general detection models
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
```
- Convert the inference model for recognition into a Serving model:
```
# Convert the recognition model
python3 -m paddle_serving_client.convert --dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
```
After the transformation, `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` will be added to the current folder. Modify the alias name in `serving_server_conf.prototxt` under the directory `general_PPLCNet_x2_5_lite_v1.0_serving/` by changing `alias_name` in `fetch_var` to `features`. The modified `serving_server_conf.prototxt` is similar to the following:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
- Convert the inference model for detection into a Serving model:
```
# Convert the general detection model
python3 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
```
After the transformation, `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` will be added to the current folder.
**Note:** The alias name in the serving_server_conf.prototxt under the directory`picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` requires no modification.
- Download and decompress the constructed search library index
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
```
<a name="4.2"></a>
### 4.2 Service Deployment and Request
**Note:** Since the recognition service involves multiple models, a Pipeline is adopted for better performance. The Pipeline deployment method does not support Windows for now.
- Enter the working directory
```
cd ./deploy/paddleserving/recognition
```
This directory contains the code for starting the pipeline service and sending prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
recognition_web_service.py # Script for starting the pipeline server
```
- Start the service:
```
# Start the service and the run log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
Once the service is successfully started, a log similar to the following will be printed in log.txt: ![img](../../../deploy/paddleserving/imgs/start_server_shitu.png)
- Send request:
```
python3 pipeline_http_client.py
```
Once the request succeeds, the prediction results will be printed in the cmd window, as in the following example: ![img](../../../deploy/paddleserving/imgs/results_shitu.png)
<a name="5"></a>
## 5. FAQ
**Q1**: After sending a request, no result is returned, or a decoding error is reported.
**A1**: Please turn off the proxy before starting the service and before sending requests. Try the following commands:
```
unset https_proxy
unset http_proxy
```
For more types of service deployment, such as `RPC prediction services`, you can refer to the official [Serving examples](https://github.com/PaddlePaddle/Serving/tree/v0.7.0/examples).