Measuring the distance between data points is a common practice in machine learning. For data that are already represented as vectors, Euclidean distance, inner product, or cosine similarity can be computed directly. The same operations are hard to apply to unstructured data, for example when estimating how well a video matches a piece of music. Although the differing data formats make it difficult to apply these vector operations directly, prior knowledge tells us that ED(laugh_video, laugh_music) < ED(laugh_video, blue_music). How can this "distance" be characterized effectively? This is exactly the focus of Metric Learning.
Metric Learning, also known as Distance Metric Learning, automatically constructs a task-specific metric function from training data using machine learning. As shown in the figure below, the goal of Metric Learning is to learn a transformation function L (either linear or nonlinear) that maps data points from the original vector space to a new one in which similar points are closer together and dissimilar points are further apart, making the metric better suited to the task. Deep Metric Learning fits this transformation function with a deep neural network. [![example](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/ml_illustration.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/ml_illustration.jpg)
Metric Learning, also known as Distance Metric Learning, automatically constructs a task-specific metric function from training data using machine learning. As shown in the figure below, the goal of Metric Learning is to learn a transformation function L (either linear or nonlinear) that maps data points from the original vector space to a new one in which similar points are closer together and dissimilar points are further apart, making the metric better suited to the task. Deep Metric Learning fits this transformation function with a deep neural network. ![example](../../images/ml_illustration.jpg)
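To make the role of the transformation function concrete, here is a minimal sketch in plain Python. It assumes a linear map and a hand-picked matrix `L` standing in for a learned one; the toy points and the helper names `apply_linear` and `transformed_distance` are illustrative and are not part of PaddleClas.

```python
import math

def apply_linear(L, x):
    """Multiply the matrix L (a list of rows) by the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def transformed_distance(L, x, y):
    """Euclidean distance between x and y after mapping both through L."""
    return math.dist(apply_linear(L, x), apply_linear(L, y))

# Toy data: the first coordinate carries the class signal,
# the second coordinate is nuisance variation.
anchor = [0.0, 0.0]
similar = [0.1, 3.0]     # same class, large nuisance offset
dissimilar = [1.0, 0.0]  # different class, small raw offset

identity = [[1.0, 0.0], [0.0, 1.0]]  # original space
L = [[1.0, 0.0], [0.0, 0.1]]         # hypothetical "learned" map that damps the nuisance axis

raw_sim = transformed_distance(identity, anchor, similar)
raw_dis = transformed_distance(identity, anchor, dissimilar)
new_sim = transformed_distance(L, anchor, similar)
new_dis = transformed_distance(L, anchor, dissimilar)
```

In the original space the similar point is further from the anchor than the dissimilar one; after the mapping the order is reversed, which is the behavior a learned metric is meant to produce.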
## Applications
<a name="2"></a>
## 2.Applications
Metric Learning is widely applied in practice, for example in face recognition, person re-identification (ReID), image retrieval, and fine-grained classification. With the growing prevalence of deep learning in industrial practice, Deep Metric Learning (DML) has become the current research focus.
Normally, DML consists of three parts: a feature extraction network that maps inputs to embeddings, a sampling strategy that combines the samples in a mini-batch into multiple subsets, and a loss function that computes the loss on each subset. Please refer to the figure below: [![image](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/ml_pipeline.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/ml_pipeline.jpg)
Normally, DML consists of three parts: a feature extraction network that maps inputs to embeddings, a sampling strategy that combines the samples in a mini-batch into multiple subsets, and a loss function that computes the loss on each subset. Please refer to the figure below: ![image](../../images/ml_pipeline.jpg)
## Algorithms
<a name="3"></a>
## 3.Algorithms
Two learning paradigms are adopted in Metric Learning:
### 1. Classification based:
<a name="3.1"></a>
### 3.1 Classification based:
This refers to methods based on classification labels. They learn an effective feature representation by classifying each sample into the correct category, so the explicit label of each sample is required in the loss calculation during training. Common algorithms include [L2-Softmax](https://arxiv.org/abs/1703.09507), [Large-margin Softmax](https://arxiv.org/abs/1612.02295), [Angular Softmax](https://arxiv.org/pdf/1704.08063.pdf), [NormFace](https://arxiv.org/abs/1704.06369), [AM-Softmax](https://arxiv.org/abs/1801.05599), [CosFace](https://arxiv.org/abs/1801.09414), [ArcFace](https://arxiv.org/abs/1801.07698), etc. These methods are also called proxy-based, because what they optimize is essentially the similarity between a sample and a set of proxies.
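As a rough illustration of the proxy-based idea, the sketch below computes an additive cosine-margin softmax loss in the style of AM-Softmax/CosFace for a single sample: the embedding is compared with one proxy vector per class, and a margin is subtracted from the cosine of the true class before the softmax. The function names and the `scale` and `margin` values are assumptions chosen for illustration; this is not the PaddleClas implementation.

```python
import math

def l2_normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def cosine_margin_loss(embedding, proxies, label, scale=30.0, margin=0.35):
    """Cross-entropy over scaled cosines, with a margin on the true class.

    proxies holds one vector per class (the rows of the classifier weights).
    """
    e = l2_normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, l2_normalize(p))) for p in proxies]
    logits = [
        scale * (c - margin) if k == label else scale * c
        for k, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

proxies = [[1.0, 0.0], [0.0, 1.0]]  # illustrative class proxies
embedding = [1.0, 0.1]              # lies close to the proxy of class 0

loss_correct = cosine_margin_loss(embedding, proxies, label=0)
loss_wrong = cosine_margin_loss(embedding, proxies, label=1)
loss_no_margin = cosine_margin_loss(embedding, proxies, label=0, margin=0.0)
```

The loss is small when the embedding is aligned with the proxy of its own class, and the margin makes the same sample slightly harder than it would be under a plain normalized softmax.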
### 2. Pairwise based:
<a name="3.2"></a>
### 3.2 Pairwise based:
This refers to the learning paradigm based on paired samples. It takes sample pairs as input and obtains an effective feature representation by directly learning the similarity between these pairs. Common algorithms include [Contrastive loss](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), [Triplet loss](https://arxiv.org/abs/1503.03832), [Lifted-Structure loss](https://arxiv.org/abs/1511.06452), [N-pair loss](https://), [Multi-Similarity loss](https://arxiv.org/pdf/1904.06627.pdf), etc.
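The pairwise paradigm can be sketched with a triplet loss on a single (anchor, positive, negative) triplet. The version below uses squared Euclidean distances, as in the linked Triplet loss paper; the margin value and the toy embeddings are illustrative assumptions.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)

anchor = [0.0, 0.0]
near = [0.1, 0.0]
far = [1.0, 0.0]

# Positive already much closer than the negative: no loss.
easy = triplet_loss(anchor, positive=near, negative=far)
# Positive further away than the negative: the loss is positive.
hard = triplet_loss(anchor, positive=far, negative=near)
```

Only triplets that violate the margin contribute to the loss, which is why the sampling strategy that forms pairs or triplets within a mini-batch matters for this family of methods.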
@@ -6,7 +6,7 @@ This document elaborates on the dataset format adopted by PaddleClas for image c
## Contents
- [Dataset Format](#1)
- [1.Dataset Format](#1)
- [Common Datasets for Image Classification](#2)
  - [2.1 ImageNet1k](#2.1)
  - [2.2 Flowers102](#2.2)
...
...
@@ -16,7 +16,7 @@ This document elaborates on the dataset format adopted by PaddleClas for image c
<a name="1"></a>
## 1 Dataset Format
## 1. Dataset Format
PaddleClas uses `txt` files to specify the training and test sets. Taking the `ImageNet1k` dataset as an example, `train_list.txt` and `val_list.txt` have the following format:
Feature extraction plays a key role in image recognition: it transforms the input image into a fixed-dimensional feature vector for subsequent [vector search](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/image_recognition_pipeline/vector_search.md). Good features preserve similarity, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have lower feature similarity (further apart). [Deep Metric Learning](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/algorithm_introduction/metric_learning.md) studies how to obtain features with high representational power through deep learning.
- [1.Introduction](#1)
- [2.Network Structure](#2)
- [3.General Recognition Models](#3)
- [4.Customized Feature Extraction](#4)
  - [4.1 Data Preparation](#4.1)
  - [4.2 Model Training](#4.2)
  - [4.3 Model Evaluation](#4.3)
  - [4.4 Model Inference](#4.4)
## 2. Network Structure
<a name="1"></a>
## 1.Introduction
Feature extraction plays a key role in image recognition: it transforms the input image into a fixed-dimensional feature vector for subsequent [vector search](./vector_search_en.md). Good features preserve similarity, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have lower feature similarity (further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) studies how to obtain features with high representational power through deep learning.
<a name="2"></a>
## 2.Network Structure
In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:
@@ -17,9 +30,10 @@ Functions of the above modules :
- **Head**: Used to transform features into logits. In addition to the common FC layer, modules such as cosmargin, arcmargin, and circlemargin are available.
- **Loss**: Specifies the loss function to be used. It is designed in a combined form so that classification loss and pairwise loss can be combined easily.
## 3. General Recognition Models
<a name="3"></a>
## 3.General Recognition Models
PP-ShiTu uses [PP_LCNet_x2_5](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/PP-LCNet.md) as the backbone network, a linear layer for Neck, [ArcMargin](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/arch/gears/arcmargin.py) for Head, and CELoss for Loss. For details, see the [General Recognition_configuration files](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/configs/GeneralRecognition/). The training data covers the following seven public datasets:
PP-ShiTu uses [PP_LCNet_x2_5](../models/PP-LCNet.md) as the backbone network, a linear layer for Neck, [ArcMargin](../../../ppcls/arch/gears/arcmargin.py) for Head, and CELoss for Loss. For details, see the [General Recognition_configuration files](../.././ppcls/configs/GeneralRecognition/). The training data covers the following seven public datasets:
| Datasets | Data Size | Class Number | Scenarios | URL |
@@ -43,13 +57,15 @@ The results are shown in the table below:
- Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10
- Address of the pre-trained model: [General recognition pre-trained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams)
# 4. Customized Feature Extraction
<a name="4"></a>
# 4.Customized Feature Extraction
Customized feature extraction refers to retraining the feature extraction model based on one's own task. It consists of four main steps: 1) data preparation, 2) model training, 3) model evaluation, and 4) model inference.
<a name="4.1"></a>
## 4.1 Data Preparation
To start with, prepare a custom dataset for your task (see [Format description](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/recognition_dataset.md#数据集格式说明) for the dataset format). Before starting model training, modify the data-related content in the configuration files, including the dataset path and the number of classes. The corresponding locations in the configuration files are shown below:
To start with, prepare a custom dataset for your task (see [Format description](../data_preparation/recognition_dataset_en.md#1) for the dataset format). Before starting model training, modify the data-related content in the configuration files, including the dataset path and the number of classes. The corresponding locations in the configuration files are shown below:
```
Head:
...
...
@@ -82,6 +98,7 @@ Train:
cls_label_path: ./dataset/Aliproduct/val_list.txt # The address of the label file for the gallery dataset
**Recommendation:** It is suggested to employ multi-card evaluation, which can quickly obtain the feature set of the overall dataset using multi-card parallel computing, accelerating the evaluation process.
<a name="4.4"></a>
## 4.4 Model Inference
Two steps are included in the inference: 1) exporting the inference model; 2) obtaining the feature vector.
The output format of the obtained features is shown in the figure below: [![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/feature_extraction_output.png)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/feature_extraction_output.png)
The output format of the obtained features is shown in the figure below: ![img](../../images/feature_extraction_output.png)
In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/image_recognition_pipeline/vector_search.md).
In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](./vector_search_en.md).
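To show how extracted features feed into retrieval, here is a minimal brute-force sketch that ranks gallery features by cosine similarity to a query feature. It is a stand-in for illustration only: the labels, vectors, and function names are hypothetical, and the vector search document describes the approach actually used in the pipeline.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, gallery, top_k=1):
    """Return the top_k (label, score) pairs, most similar first."""
    scored = [(cosine_similarity(query, feat), label) for label, feat in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(label, score) for score, label in scored[:top_k]]

# Hypothetical gallery of (label, feature) pairs; real features would come
# from the exported feature extraction model.
gallery = [
    ("cola", [1.0, 0.0, 0.0]),
    ("juice", [0.0, 1.0, 0.0]),
    ("water", [0.0, 0.0, 1.0]),
]

result = retrieve([0.9, 0.1, 0.0], gallery, top_k=2)
```

A linear scan like this only suits small galleries; larger ones typically rely on an approximate nearest-neighbor index.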
- [2. Installation of Serving](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#2)
- [3. Service Deployment for Image Classification](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#3)
  - [3.1 Model Transformation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#3.1)
  - [3.2 Service Deployment and Request](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#3.2)
- [4. Service Deployment for Image Recognition](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#4)
  - [4.1 Model Transformation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#4.1)
  - [4.2 Service Deployment and Request](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/inference_deployment/paddle_serving_deploy.md#4.2)
- [3. Service Deployment for Image Classification](#3)
  - [3.1 Model Transformation](#3.1)
  - [3.2 Service Deployment and Request](#3.2)
- [4. Service Deployment for Image Recognition](#4)
  - [4.1 Model Transformation](#4.1)
  - [4.2 Service Deployment and Request](#4.2)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) is designed to make it easy for deep learning developers to deploy online prediction services. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and multiple programming languages for client development.
This section uses HTTP deployment of a prediction service as an example to describe how to deploy model services in PaddleClas with PaddleServing. Currently, only deployment on the Linux platform is supported; the Windows platform is not supported.
<a name="2"></a>
## 2. Installation of Serving
It is officially recommended to use Docker for the installation and environment deployment of Serving. First, pull the Docker image and create a Serving-based container.
- Speed up the installation process by replacing the source with `-i https://pypi.tuna.tsinghua.edu.cn/simple`.
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
- To deploy CPU services, please install the CPU version of serving-server with the following command.
```
pip install paddle-serving-server
```
<a name="3"></a>
## 3. Service Deployment for Image Classification
<a name="3.1"></a>
### 3.1 Model Transformation
When adopting PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part takes the classic ResNet50_vd model as an example to introduce the deployment of image classification service.
...
...
@@ -118,8 +118,7 @@ fetch_var {
}
```
<a name="3.2"></a>
### 3.2 Service Deployment and Request
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
...
...
@@ -139,7 +138,7 @@ classification_web_service.py # Script for starting the pipeline server
python3 classification_web_service.py &>log.txt &
```
Once the service has started successfully, a log similar to the following is written to log.txt: [![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/deploy/paddleserving/imgs/start_server.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/paddleserving/imgs/start_server.png)
Once the service has started successfully, a log similar to the following is written to log.txt: ![img](../imgs/start_server.png)
- Send request:
...
...
@@ -148,14 +147,16 @@ Once the service is successfully started, a log will be printed in log.txt simil
python3 pipeline_http_client.py
```
Once the request has run successfully, the prediction results are printed in the terminal window, as in the following example: [![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/deploy/paddleserving/imgs/results.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/paddleserving/imgs/results.png)
Once the request has run successfully, the prediction results are printed in the terminal window, as in the following example: ![img](../imgs/results.png)
<a name="4"></a>
## 4. Service Deployment for Image Recognition
When using PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part, exemplified by the ultra-lightweight model for image recognition in PP-ShiTu, details the deployment of image recognition service.
<a name="4.1"></a>
## 4.1 Model Transformation
- Download inference models for general detection and general recognition
...
...
@@ -225,8 +226,7 @@ cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
```
<a name="4.2"></a>
## 4.2 Service Deployment and Request
**Note:** Since the recognition service involves multiple models, the Pipeline deployment mode is adopted for better performance. This deployment method does not support the Windows platform for now.
...
...
@@ -254,7 +254,7 @@ recognition_web_service.py # Script for starting the pipeline server
python3 recognition_web_service.py &>log.txt &
```
Once the service has started successfully, a log similar to the following is written to log.txt: [![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/deploy/paddleserving/imgs/start_server_shitu.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/paddleserving/imgs/start_server_shitu.png)
Once the service has started successfully, a log similar to the following is written to log.txt: ![img](../imgs/start_server_shitu.png)
- Send request:
...
...
@@ -262,10 +262,10 @@ Once the service is successfully started, a log will be printed in log.txt simil
python3 pipeline_http_client.py
```
Once the request has run successfully, the prediction results are printed in the terminal window, as in the following example: [![img](https://github.com/PaddlePaddle/PaddleClas/raw/develop/deploy/paddleserving/imgs/results_shitu.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/deploy/paddleserving/imgs/results_shitu.png)
Once the request has run successfully, the prediction results are printed in the terminal window, as in the following example: ![img](../imgs/results_shitu.png)
<a name="5"></a>
## 5.FAQ
**Q1**: After sending a request, no result is returned, or a decoding error is reported.