update for en docs

3254addf · HydrogenSulfate · a56366a9
6 changed files
--- a/README_ch.md
+++ b/README_ch.md
@@ -110,6 +110,7 @@ PP-ShiTu图像识别快速体验：[点击这里](./docs/zh_CN/quick_start/quick
  - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md)
  - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md)
  - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md)
+  - [图像分类FAQ第三季](docs/zh_CN/faq_series/faq_2022_s1.md)
 - [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
 - [许可证书](#许可证书)
 - [贡献代码](#贡献代码)
@@ -123,7 +124,7 @@ PP-ShiTu图像识别快速体验：[点击这里](./docs/zh_CN/quick_start/quick
 </div>
-PP-ShiTuV2是一个实用的轻量级通用图像识别系统，主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面，采用多种策略，对各个模块的模型进行优化，PP-ShiTuV2相比V1而已，Recall1提升10+个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)。
+PP-ShiTuV2是一个实用的轻量级通用图像识别系统，主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面，采用多种策略，对各个模块的模型进行优化，PP-ShiTuV2相比V1，Recall1提升近8个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)。
 <a name="识别效果展示"></a>

--- a/README_en.md
+++ b/README_en.md
@@ -7,20 +7,22 @@
 PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
 <div align="center">
-<img src="./docs/images/class_simple_en.gif"  width = "600" />
+<img src="./docs/images/recognition.gif"  width = "400" />
+<p>PP-ShiTuV2 demo images</p>
-PULC demo images
 </div>
-&nbsp;
 <div align="center">
-<img src="./docs/images/recognition.gif"  width = "400" />
+<img src="./docs/images/class_simple_en.gif"  width = "600" />
-PP-ShiTu demo images
+PULC demo images
 </div>
 **Recent updates**
+- 🔥️ Release [PP-ShiTuV2](./docs/en/PPShiTu/PPShiTuV2_introduction.md): Recall1 is improved by nearly 8 points, covering 20+ recognition scenarios, with an [index management tool](./deploy/shitu_index_manager/README.md) and an [Android Demo](./docs/en/quick_start/quick_start_recognition_en.md) for a better experience.
 - 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models inference within 3ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenario.
 - 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
@@ -58,6 +60,18 @@ Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassific
 - [Install Paddle](./docs/en/installation/install_paddle_en.md)
 - [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md)
+- [PP-ShiTuV2 Image Recognition Systems Introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md)
+  - Submodule Introduction
+    - [Mainbody Detection](./docs/en/image_recognition_pipeline/mainbody_detection.md)
+    - [Feature Extraction](./docs/en/image_recognition_pipeline/feature_extraction.md)
+    - [Vector Search](./docs/en/image_recognition_pipeline/vector_search.md)
+    - [Hash Encoding](docs/en/image_recognition_pipeline/deep_hashing.md)
+  - Pipeline Inference and Deployment
+    - [Python Inference](docs/en/inference_deployment/python_deploy.md#2)
+    - [C++ Inference](deploy/cpp_shitu/readme.md)
+    - [Serving Deployment](docs/en/inference_deployment/recognition_serving_deploy.md)
+    - [Lite Deployment](docs/en/inference_deployment/lite_shitu.md)
+    - [Shitu Gallery Manager Tool](docs/en/inference_deployment/shitu_gallery_manager.md)
 - [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md)
  - [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md)
  - [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md)
@@ -108,41 +122,55 @@ PULC models inference within 3ms on CPU devices, with accuracy comparable with S
 <img src="./docs/images/structure.jpg"  width = "800" />
 </div>
-Image recognition can be divided into three steps:
+PP-ShiTuV2 is a practical, lightweight, general-purpose image recognition system composed of three modules: a mainbody detection model, a feature extraction model and a vector search tool. Each module is optimized with a variety of strategies, covering backbone selection, loss function, data augmentation, hyperparameter tuning, pre-trained models, and model pruning and quantization. Compared with V1, PP-ShiTuV2 improves Recall1 by nearly 8 points. For more details, please refer to [PP-ShiTuV2 introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md).
- （1）Identify region proposal for target objects through a detection model；
- （2）Extract features for each region proposal;
- （3）Search features in the retrieval database and output results;
 For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
-<a name="Clas_Demo_images"></a>
+<a name="Rec_Demo_images"></a>
-## PULC demo images
+## PP-ShiTuV2 Demo images
+- Drinks recognition
 <div align="center">
-<img src="docs/images/classification_en.gif">
+<img src="docs/images/drink_demo.gif">
 </div>
-<a name="Rec_Demo_images"></a>
-## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
 - Product recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769644-51604f80-d2d7-11eb-8290-c53b12a5c1f6.gif"  width = "400" />
 </div>
 - Cartoon character recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769746-6b019700-d2d7-11eb-86df-f1d710999ba6.gif"  width = "400" />
 </div>
 - Logo recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769837-7fde2a80-d2d7-11eb-9b69-04140e9d785f.gif"  width = "400" />
 </div>
 - Car recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769916-8ec4dd00-d2d7-11eb-8c60-42d89e25030c.gif"  width = "400" />
 </div>
+<a name="Clas_Demo_images"></a>
+## PULC demo images
+<div align="center">
+<img src="docs/images/classification_en.gif">
+</div>
 <a name="License"></a>
 ## License
 PaddleClas is released under the Apache 2.0 license <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>

--- a/docs/en/PPShiTu/PPShiTuV2_introduction.md
+++ b/docs/en/PPShiTu/PPShiTuV2_introduction.md
+## PP-ShiTuV2 Image Recognition System
+## Content
+- [PP-ShiTuV2 Introduction](#pp-shituv2-introduction)
+  - [Dataset](#dataset)
+  - [Model Training](#model-training)
+  - [Model Evaluation](#model-evaluation)
+  - [Model Inference](#model-inference)
+  - [Model Deployment](#model-deployment)
+- [Module introduction](#module-introduction)
+  - [Mainbody Detection](#mainbody-detection)
+  - [Feature Extraction](#feature-extraction)
+    - [Dataset](#dataset-1)
+    - [Backbone](#backbone)
+    - [Network Structure](#network-structure)
+    - [Data Augmentation](#data-augmentation)
+- [references](#references)
+## PP-ShiTuV2 Introduction
+PP-ShiTuV2 is a practical, lightweight, general-purpose image recognition system based on PP-ShiTuV1. Compared with PP-ShiTuV1, it has higher recognition accuracy, stronger generalization ability and similar inference speed<sup>*</sup>. The optimizations focus on the training dataset and on feature extraction, which now uses a better backbone, loss function and training strategy. As a result, the retrieval performance of PP-ShiTuV2 is significantly improved in multiple practical application scenarios.
+<div align="center">
+<img src="../../images/structure.jpg" />
+</div>
+### Dataset
+We removed some uncommon datasets and added more common datasets to the training stage. For more details, please refer to [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分).
+The following sections use the [PP-ShiTuV2](../image_recognition_pipeline/feature_extraction.md#4-实验部分) dataset as an example to introduce the training, evaluation and inference process of the PP-ShiTuV2 model.
+### Model Training
+Download the 17 datasets listed in [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分) and merge them manually, then generate the annotation text file `train_reg_all_data_v2.txt`, and finally place everything in the `dataset` directory.
+The directory structure of the 17 merged datasets is as follows:
+```text
+dataset/
+├── Aliproduct/ # Aliproduct dataset folder
+├── SOP/ # SOP dataset folder
+├── ...
+├── Products-10k/ # Products-10k dataset folder
+├── ...
+└── train_reg_all_data_v2.txt # Annotation text file
+```
+The content of the generated `train_reg_all_data_v2.txt` is as follows:
+```log
+...
+Aliproduct/train/50029/1766228.jpg 50029
+Aliproduct/train/50029/1764348.jpg 50029
+...
+Products-10k/train/88823.jpg 186440
+Products-10k/train/88824.jpg 186440
+...
+```
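A minimal sketch (not part of this diff) of how per-dataset labels could be merged into one annotation list in the `relative/path label` format shown above. The function name `merge_annotations` and the offset-based relabeling scheme are assumptions for illustration; the docs only state that the datasets are merged manually and do not describe how global class IDs are assigned.

```python
def merge_annotations(datasets):
    """Merge per-dataset samples into one list of "path label" lines.

    datasets: list of (folder_name, [(relative_path, local_label), ...]).
    Each dataset's local labels are remapped to a contiguous global range
    so that class IDs from different datasets never collide (assumed scheme).
    """
    merged = []
    offset = 0
    for folder, samples in datasets:
        # Collect the distinct local labels of this dataset in a stable order.
        local_ids = sorted({label for _, label in samples})
        remap = {local: offset + i for i, local in enumerate(local_ids)}
        for path, label in samples:
            merged.append(f"{folder}/{path} {remap[label]}")
        # The next dataset starts after the classes assigned so far.
        offset += len(local_ids)
    return merged


if __name__ == "__main__":
    demo = [
        ("Aliproduct", [("train/0/a.jpg", 7), ("train/1/b.jpg", 9)]),
        ("SOP", [("train/x.jpg", 3)]),
    ]
    for line in merge_annotations(demo):
        print(line)
```

Writing the returned lines to `dataset/train_reg_all_data_v2.txt` would give a file with the same layout as the excerpt above; the concrete label values in that excerpt come from the authors' own merge and are not reproduced by this sketch.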
+Then run the following command to train:
+```shell
+# Use GPU 0 for single-card training
+export CUDA_VISIBLE_DEVICES=0
+python3.7 tools/train.py \
+-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+# Use 8 GPUs for distributed training
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python3.7 -m paddle.distributed.launch tools/train.py \
+-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+```
+**Note:** `eval_during_train` is enabled by default during training. After every `eval_interval` epochs, the model is evaluated on the dataset specified by `Eval` in the configuration file (Aliproduct by default), and the evaluation metrics are reported for reference.
+### Model Evaluation
+Refer to [Model Evaluation](../image_recognition_pipeline/feature_extraction_en.md#43-model-evaluation).
+### Model Inference
+Refer to [Python Model Inference](../quick_start/quick_start_recognition.md#22-Image Recognition Experience) and [C++ Model Inference](../../../deploy/cpp_shitu/readme_en.md).
+### Model Deployment
+Refer to [Model Deployment](../inference_deployment/recognition_serving_deploy_en.md#32-service-deployment-and-request).
+## Module introduction
+### Mainbody Detection
+The mainbody detection model uses `PicoDet-LCNet_x2_5`. For details, refer to [picodet_lcnet_x2_5_640_mainbody](../image_recognition_pipeline/mainbody_detection.md).
+### Feature Extraction
+#### Dataset
+Starting from the training dataset used in PP-ShiTuV1, we removed the iCartoonFace dataset and added more widely used datasets, such as bird400, Cars, Products-10k and fruits-262.
+#### Backbone
+We replaced the backbone network `PPLCNet_x2_5` with [`PPLCNetV2_base`](../models/PP-LCNetV2.md). Compared with `PPLCNet_x2_5`, `PPLCNetV2_base` maintains comparable classification accuracy while reducing inference time by 40%<sup>*</sup>.
+**Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
+#### Network Structure
+We adjusted the `PPLCNetV2_base` structure and added optimizations that have proven general and effective in retrieval tasks such as pedestrian re-identification, landmark retrieval and face recognition. The main points are:
+1. `PPLCNetV2_base` structure adjustment: Experiments showed that the [`ReLU`](../../../ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py#L322) at the end of the network has a large impact on retrieval performance, and the [`FC`](../../../ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py#L325) also causes a slight drop in retrieval performance, so we removed the `ReLU` and `FC` at the end of the backbone.
+2. `last stride=1`: No downsampling is performed in the last stage, which increases the semantic information of the final output feature map without adding much computational cost.
+3. `BN Neck`: Add a `BatchNorm1D` layer after `BackBone` to normalize each dimension of the feature vector, bringing faster convergence.
+    | Model                                                              | training data      | recall@1%(mAP%) |
+    | :----------------------------------------------------------------- | :----------------- | :-------------- |
+    | GeneralRecognition_PPLCNet_x2_5                                                         | PP-ShiTuV1 dataset | 65.9(54.3)      |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletLoss) | PP-ShiTuV1 dataset | 72.3(60.5)      |
+4. `TripletAngularMarginLoss`: We improved the original `TripletLoss` (hard triplet loss) by changing the optimization objective from L2 Euclidean space to cosine space and adding a hard distance constraint between the anchor and the positive/negative samples. This brings the training and testing objectives closer and improves the generalization ability of the model.
+    | Model | training data | recall@1%(mAP%) |
+    | :---- | :------------ | :-------------- |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletLoss) | PP-ShiTuV2 dataset | 71.9(60.2) |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletAngularMarginLoss) | PP-ShiTuV2 dataset | 73.7(61.0) |
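A minimal pure-Python sketch (not part of this diff) of the idea described in item 4: a batch-hard triplet loss computed on cosine similarity, plus an absolute constraint on the anchor-positive and anchor-negative similarities. This is an illustration under stated assumptions, not the code in `ppcls/loss/tripletangularmarginloss.py`; the parameter names `margin`, `ap_floor`, `an_ceiling` and `abs_weight` and their default values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def triplet_angular_margin_loss(features, labels, margin=0.5,
                                ap_floor=0.9, an_ceiling=0.5, abs_weight=0.1):
    feats = [l2_normalize(f) for f in features]
    total, count = 0.0, 0
    for i, anchor in enumerate(feats):
        pos = [cosine(anchor, feats[j]) for j in range(len(feats))
               if j != i and labels[j] == labels[i]]
        neg = [cosine(anchor, feats[j]) for j in range(len(feats))
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet in the batch
        hardest_pos = min(pos)  # least similar sample of the same class
        hardest_neg = max(neg)  # most similar sample of another class
        # Relative term: the negative must be less similar than the positive by a margin.
        relative = max(0.0, hardest_neg - hardest_pos + margin)
        # Absolute term (assumed form): bound both similarities directly.
        absolute = (max(0.0, ap_floor - hardest_pos)
                    + max(0.0, hardest_neg - an_ceiling))
        total += relative + abs_weight * absolute
        count += 1
    return total / count if count else 0.0
```

Working in cosine space matches how retrieval is scored at test time, which is the motivation given above; the absolute term penalizes a triplet even when the relative margin is already satisfied.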
+#### Data Augmentation
+In real shooting conditions, the target object may be rotated to some extent and may not stay upright, so we add [random rotation augmentation](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) to the data augmentation pipeline to make retrieval more robust in real scenes.
+Combining the above strategies, the final experimental results on multiple data sets are as follows:
+  | Model      | product<sup>*</sup> |
+  | :--------- | :------------------ |
+  | -          | recall@1%(mAP%)     |
+  | GeneralRecognition_PPLCNet_x2_5 | 65.9(54.3)          |
+  | GeneralRecognitionV2_PPLCNetV2_base | 73.7(61.0)          |
+  | Model      | Aliproduct      | VeRI-Wild       | LogoDet-3k      | iCartoonFace    | SOP             | Inshop           |
+  | :--------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------- |
+  | -          | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%)  |
+  | GeneralRecognition_PPLCNet_x2_5 | 83.9(83.2)      | 88.7(60.1)      | 86.1(73.6)      | 84.1(72.3)      | 79.7(58.6)      | 89.1(69.4)       |
+  | GeneralRecognitionV2_PPLCNetV2_base | 84.2(83.3)      | 87.8(68.8)      | 88.0(63.2)      | 53.6(27.5)      | 77.6(55.3)      | 90.8(74.3)       |
+  | Model      | gldv2           | imdb_face       | iNat            | instre          | sketch          | sop<sup>*</sup>  |
+  | :--------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------- |
+  | -          | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%)  |
+  | GeneralRecognition_PPLCNet_x2_5 | 98.2(91.6)      | 28.8(8.42)      | 12.6(6.1)       | 72.0(50.4)      | 27.9(9.5)       | 97.6(90.3)       |
+  | GeneralRecognitionV2_PPLCNetV2_base | 98.1(90.5)      | 35.9(11.2)      | 38.6(23.9)      | 87.7(71.4)      | 39.3(15.6)      | 98.3(90.9)       |
+**Note:** The product dataset was built to verify the generalization performance of PP-ShiTu; none of its data appear in the training or testing sets. It contains 7 major categories (cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 subcategories, and the 250 subcategory labels are used for testing. The sop<sup>*</sup> dataset comes from [GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122) and can be regarded as the "SOP" dataset.
+## References
+1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." International Conference on Multimedia Modeling. Springer, Cham, 2022.
+2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609.
--- a/docs/en/image_recognition_pipeline/feature_extraction_en.md
+++ b/docs/en/image_recognition_pipeline/feature_extraction_en.md
@@ -12,12 +12,15 @@
    - [4.4 Model Inference](#4.4)
 <a name="1"></a>
-## 1.Introduction
+## 1. Abstract
 Feature extraction plays a key role in image recognition, which serves to transform the input image into a fixed dimensional feature vector for subsequent [vector search](./vector_search_en.md). Good features boast great similarity preservation, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have less feature similarity (further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) is applied to explore how to obtain features with high representational power through deep learning.
 <a name="2"></a>
-## 2.Network Structure
+## 2. Introduction
 In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:
@@ -31,152 +34,239 @@ Functions of the above modules :
 - **Loss**: Specifies the Loss function to be used. It is designed as a combined form to facilitate the combination of Classification Loss and Pair_wise Loss.
 <a name="3"></a>
-## 3.General Recognition Models
-In PP-Shitu, we have [PP_LCNet_x2_5](../models/PP-LCNet.md) as the backbone network, Linear Layer for Neck, [ArcMargin](../../../ppcls/arch/gears/arcmargin.py) for Head, and CELoss for Loss. See the details in  [General Recognition_configuration files](../../../ppcls/configs/GeneralRecognition/). The involved training data covers the following seven public datasets:
-| Datasets     | Data Size | Class Number | Scenarios          | URL                                                          |
-| ------------ | --------- | ------------ | ------------------ | ------------------------------------------------------------ |
-| Aliproduct   | 2498771   | 50030        | Commodities        | [URL](https://retailvisionworkshop.github.io/recognition_challenge_2020/) |
-| GLDv2        | 1580470   | 81313        | Landmarks          | [URL](https://github.com/cvdfoundation/google-landmark)      |
-| VeRI-Wild    | 277797    | 30671        | Vehicle            | [URL](https://github.com/PKU-IMRE/VERI-Wild)                 |
-| LogoDet-3K   | 155427    | 3000         | Logo               | [URL](https://github.com/Wangjing1551/LogoDet-3K-Dataset)    |
-| iCartoonFace | 389678    | 5013         | Cartoon Characters | [URL](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) |
-| SOP          | 59551     | 11318        | Commodities        | [URL](https://cvgl.stanford.edu/projects/lifted_struct/)     |
-| Inshop       | 25882     | 3997         | Commodities        | [URL](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) |
-| **Total**    | **5M**    | **185K**     | ----               | ----                                                         |
-The results are shown in the table below:
-| Model         | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP   | Inshop | Latency(ms) |
-| ------------- | ---------- | --------- | ---------- | ------------ | ----- | ------ | ----------- |
-| PP-LCNet-2.5x | 0.839      | 0.888     | 0.861      | 0.841        | 0.793 | 0.892  | 5.0         |
- Evaluation metric: `Recall@1`
- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`.
- Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10
- Address of the pre-training model: [General recognition pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams)
-<a name="4"></a>
-## 4.Customized Feature Extraction
-Customized feature extraction refers to retraining the feature extraction model based on one's own task. It consists of four main steps: 1) data preparation, 2) model training, 3) model evaluation, and 4) model inference.
-<a name="4.1"></a>
-### 4.1 Data Preparation
-To start with, customize your dataset based on the task (See [Format description](../data_preparation/recognition_dataset_en.md#1) for the dataset format). Before initiating the model training, modify the data-related content in the configuration files, including the address of the dataset and the class number. The corresponding locations in configuration files are shown below:
+## 3. Methods
-```
+#### 3.1 Backbone
- Head:
-    name: ArcMargin
-    embedding_size: 512
-    class_num: 185341    #Number of class
-```
-```
+The Backbone is [PP-LCNetV2_base](../models/PP-LCNetV2.md), which builds on `PPLCNet_V1` with several optimizations: a Rep strategy, PW convolution, shortcuts, an improved activation function and an improved SE module. Its final classification accuracy is similar to `PPLCNet_x2_5`, while inference latency is reduced by 40%<sup>*</sup>. During our experiments, we made further adjustments to `PPLCNetV2_base` so that it achieves higher performance on recognition tasks while keeping the speed basically unchanged: we removed the `ReLU` and `FC` at the end of `PPLCNetV2_base`, and changed the stride of the last stage (RepDepthwiseSeparable) to 1.
-Train:
-    dataset:
-      name: ImageNetDataset
-      image_root: ./dataset/     #The directory where the train dataset is located
-      cls_label_path: ./dataset/train_reg_all_data.txt  #The address of label file for train dataset
-```
-```
+**Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
- Query:
-      dataset:
-        name: VeriWild
-        image_root: ./dataset/Aliproduct/.    #The directory where the query dataset is located
-        cls_label_path: ./dataset/Aliproduct/val_list.txt.    #The address of label file for query dataset
-```
-```
+#### 3.2 Neck
- Gallery:
-      dataset:
-        name: VeriWild
-        image_root: ./dataset/Aliproduct/    #The directory where the gallery dataset is located
-        cls_label_path: ./dataset/Aliproduct/val_list.txt.   #The address of label file for gallery dataset
-```
-<a name="4.2"></a>
+We use [BN Neck](../../../ppcls/arch/gears/bnneck.py) to standardize each dimension of the features extracted by the Backbone, which reduces the difficulty of optimizing the metric learning loss and the identification loss simultaneously.
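A minimal sketch (not part of this diff) of what "standardizing each dimension" means for a batch of feature vectors. It only shows the training-time normalization step; the learnable scale and shift and the running statistics of a real `BatchNorm1D` layer are omitted, and the function name `bn_neck` is hypothetical.

```python
import math


def bn_neck(batch, eps=1e-5):
    """Standardize every feature dimension across the batch.

    batch: list of equal-length feature vectors (one per image).
    Returns a new list where each dimension has roughly zero mean
    and unit variance over the batch.
    """
    n, dim = len(batch), len(batch[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        scale = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][d] = (batch[i][d] - mean) * scale
    return out


if __name__ == "__main__":
    # Two dimensions with very different scales end up on the same scale.
    print(bn_neck([[1.0, 10.0], [3.0, 30.0]]))
```

After this step, dimensions with very different raw scales contribute comparably to a distance or similarity computation.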
-### 4.2 Model Training
- Single machine single card training
+#### 3.3 Head
-```
+We use [FC Layer](../../../ppcls/arch/gears/fc.py) as the classification head to convert features into logits for classification loss.
-export CUDA_VISIBLE_DEVICES=0
-python tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
-```
- Single machine multi card training
+#### 3.4 Loss
-```
+We use [Cross entropy loss](../../../ppcls/loss/celoss.py) and [TripletAngularMarginLoss](../../../ppcls/loss/tripletangularmarginloss.py). The latter improves on the original TripletLoss (TriHard Loss) by moving the optimization objective from L2 Euclidean space to cosine space and adding a hard distance constraint between the anchor and the positive/negative samples, which improves the generalization ability of the model. For the detailed configuration, see [GeneralRecognitionV2_PPLCNetV2_base.yaml](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-77).
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-python -m paddle.distributed.launch \
-    --gpus="0,1,2,3" tools/train.py \
-    -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
-```
-**Note:** The configuration file adopts `online evaluation` by default, if you want to speed up the training and remove `online evaluation`, just add `-o eval_during_train=False` after the above command. After training, the final model files `latest`, `best_model` and the training log file `train.log` will be generated under the directory output. Among them, `best_model` is utilized to store the best model under the current evaluation metrics while`latest` is adopted to store the latest generated model, making it convenient to resume the training from where it was interrupted.
+#### 3.5 Data Augmentation
- Resumption of Training：
+In real scenes, the object may be rotated to some extent and may not stay upright, so we add an appropriate [random rotation](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) to the data augmentation to improve retrieval performance in real scenes.
-```
+<a name="4"></a>
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-python -m paddle.distributed.launch \
-    --gpus="0,1,2,3" tools/train.py \
-    -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-    -o Global.checkpoint="output/RecModel/latest"
-```
-<a name="4.3"></a>
+## 4. Experiments
-### 4.3 Model Evaluation
+We expanded and optimized the original training data, and the final training set combines the following 17 public datasets:
+| Dataset                | Data Amount | Number of Categories |  Scenario   |                                     Dataset Address                                     |
+| :--------------------- | :---------: | :------------------: | :---------: | :-------------------------------------------------------------------------------------: |
+| Aliproduct             |   2498771   |        50030         | Commodities |      [Address](https://retailvisionworkshop.github.io/recognition_challenge_2020/)      |
+| GLDv2                  |   1580470   |        81313         |  Landmark   |               [address](https://github.com/cvdfoundation/google-landmark)               |
+| VeRI-Wild              |   277797    |        30671         |  Vehicles   |                    [Address](https://github.com/PKU-IMRE/VERI-Wild)                     |
+| LogoDet-3K             |   155427    |         3000         |    Logo     |              [Address](https://github.com/Wangjing1551/LogoDet-3K-Dataset)              |
+| SOP                    |    59551    |        11318         | Commodities |              [Address](https://cvgl.stanford.edu/projects/lifted_struct/)               |
+| Inshop                 |    25882    |         3997         | Commodities |            [Address](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html)             |
+| bird400                |    58388    |         400          |    birds    |          [address](https://www.kaggle.com/datasets/gpiosenka/100-bird-species)          |
+| 104flows               |    12753    |         104          |   Flowers   |              [Address](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)              |
+| Cars                   |    58315    |         112          |  Vehicles   |            [Address](https://ai.stanford.edu/~jkrause/cars/car_dataset.html)            |
+| Fashion Product Images |    44441    |          47          |  Products   | [Address](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) |
+| flowerrecognition      |    24123    |          59          |   flower    |         [address](https://www.kaggle.com/datasets/aymenktari/flowerrecognition)         |
+| food-101               |   101000    |         101          |    food     |         [address](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/)          |
+| fruits-262             |   225639    |         262          |   fruits    |            [address](https://www.kaggle.com/datasets/aelchimminut/fruits262)            |
+| inaturalist            |   265213    |         1010         |   natural   |           [address](https://github.com/visipedia/inat_comp/tree/master/2017)            |
+| indoor-scenes          |    15588    |          67          |   indoor    |       [address](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019)       |
+| Products-10k           |   141931    |         9691         |  Products   |                       [Address](https://products-10k.github.io/)                        |
+| CompCars               |    16016    |         431          |  Vehicles   |     [Address](http://http://ai.stanford.edu/~jkrause/cars/car_dataset.html)      |
+| **Total**              |   **6M**    |       **192K**       |      -      |                                            -                                            |
+The final model accuracy metrics are shown in the following table:
+| Model                  | Latency (ms) | Storage (MB) | product<sup>*</sup> |      | Aliproduct |      | VeRI-Wild |      | LogoDet-3k |      | iCartoonFace |      | SOP      |      | Inshop   |      | gldv2    |      | imdb_face |      | iNat     |      | instre   |      | sketch   |      | sop      |      |
+| :--------------------- | :----------- | :----------- | :------------------ | :--- | ---------- | ---- | --------- | ---- | ---------- | ---- | ------------ | ---- | -------- | ---- | -------- | ---- | -------- | ---- | --------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- | -------- | ---- |
+|                        |              |              | recall@1            | mAP  | recall@1   | mAP  | recall@1  | mAP  | recall@1   | mAP  | recall@1     | mAP  | recall@1 | mAP  | recall@1 | mAP  | recall@1 | mAP  | recall@1  | mAP  | recall@1 | mAP  | recall@1 | mAP  | recall@1 | mAP  | recall@1 | mAP  |
+| PP-ShiTuV1_general_rec | 5.0          | 34           | 65.9                | 54.3 | 83.9       | 83.2 | 88.7      | 60.1 | 86.1       | 73.6 | 84.1         | 72.3 | 79.7     | 58.6 | 89.1     | 69.4 | 98.2     | 91.6 | 28.8      | 8.42 | 12.6     | 6.1  | 72.0     | 50.4 | 27.9     | 9.5  | 97.6     | 90.3 |
+| PP-ShiTuV2_general_rec | 6.1          | 19           | 73.7                | 61.0 | 84.2       | 83.3 | 87.8      | 68.8 | 88.0       | 63.2 | 53.6         | 27.5 | 77.6     | 55.3 | 90.8     | 74.3 | 98.1     | 90.5 | 35.9      | 11.2 | 38.6     | 23.9 | 87.7     | 71.4 | 39.3     | 15.6 | 98.3     | 90.9 |
+*The product dataset was built to verify the generalization performance of PP-ShiTu; none of its data appear in the training or testing sets. It contains 7 major categories (cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 subcategories, and the 250 subcategory labels are used for testing. The sop dataset comes from [GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122) and can be regarded as the "SOP" dataset.
+* Pre-trained model address: [general_PPLCNetV2_base_pretrained_v1.0.pdparams](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams)
+* Evaluation metrics: `Recall@1` and `mAP`
+* CPU of the speed test machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`
+* Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10
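A minimal sketch (not part of this diff) of how `Recall@1` can be computed for a retrieval model: each query is matched to its most similar gallery feature, and the match counts as a hit when the labels agree. It assumes the query and gallery sets are disjoint and uses cosine similarity; the function names are hypothetical and this is not the evaluation code used by PaddleClas.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose nearest gallery item has the same label."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Index of the gallery feature most similar to this query.
        best = max(range(len(gallery_feats)),
                   key=lambda i: cosine_similarity(feat, gallery_feats[i]))
        if gallery_labels[best] == label:
            hits += 1
    return hits / len(query_feats)


if __name__ == "__main__":
    gallery = [[1.0, 0.0], [0.0, 1.0]]
    queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(recall_at_1(queries, [0, 1, 1], gallery, [0, 1]))
```

When the query and gallery come from the same list, as in the Aliproduct evaluation configuration further down, each query would also have to be excluded from its own gallery; the sketch leaves that out.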
+<a name="5"></a>
+## 5. Custom Feature Extraction
+Custom feature extraction refers to retraining the feature extraction model according to your own task.
+Based on the `GeneralRecognitionV2_PPLCNetV2_base.yaml` configuration file, the following describes the four main steps: 1) data preparation; 2) model training; 3) model evaluation; 4) model inference.
+<a name="5.1"></a>
+### 5.1 Data Preparation
+First, prepare your own dataset for the task. Please refer to [Dataset Format Description](../data_preparation/recognition_dataset.md) for the dataset format and file structure.
+After the data is ready, modify the data-related settings in the configuration file, mainly the dataset paths and the number of classes, as shown below:
+- Modify the number of classes:
+  ```yaml
+  Head:
+    name: FC
+    embedding_size: *feat_dim
+    class_num: 192612 # This is the number of classes
+    weight_attr:
+      initializer:
+        name: Normal
+        std: 0.001
+    bias_attr: False
+  ```
+- Modify the training dataset configuration:
+  ```yaml
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/ # Here is the directory where the train dataset is located
+      cls_label_path: ./dataset/train_reg_all_data_v2.txt # Here is the path of the label file corresponding to the train dataset
+      relabel: True
+  ```
+- Modify the query data configuration in the evaluation dataset:
+  ```yaml
+  Query:
+    dataset:
+      name: VeriWild
+      image_root: ./dataset/Aliproduct/ # Here is the directory where the query dataset is located
+      cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the query dataset
+  ```
+- Modify the gallery data configuration in the evaluation dataset:
+  ```yaml
+  Gallery:
+    dataset:
+      name: VeriWild
+      image_root: ./dataset/Aliproduct/ # This is the directory where the gallery dataset is located
+      cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the gallery dataset
+  ```
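When setting `class_num` for a custom dataset, it can help to count the distinct labels in the training label file. The sketch below is a hypothetical helper, not part of PaddleClas. It assumes each line of the label file has the form `image_path label`, separated by whitespace, with no spaces inside the path; check the Dataset Format Description linked above before relying on it.

```python
def count_classes(lines):
    """Count distinct labels in an iterable of label-file lines.

    Assumption (not confirmed by this document): each line looks like
    "image_path label", split on whitespace, with the label second.
    """
    labels = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Skip blank or malformed lines.
            continue
        labels.add(fields[1])
    return len(labels)


# Example usage with the label path from the configuration above:
# with open("./dataset/train_reg_all_data_v2.txt") as f:
#     print(count_classes(f))
```

The returned count is the value to compare against `class_num` in the `Head` section.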
+<a name="5.2"></a>
+### 5.2 Model Training
+Model training covers two cases: starting a new training run and resuming training from a checkpoint.
+- Single-machine, single-card training
+  ```shell
+  export CUDA_VISIBLE_DEVICES=0
+  python3.7 tools/train.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+  ```
+- Single-machine, multi-card training
+  ```shell
+  export CUDA_VISIBLE_DEVICES=0,1,2,3
+  python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
+  tools/train.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+  ```
+**Notice:**
+The configuration file uses online evaluation by default. To speed up training, you can turn off online evaluation by appending `-o Global.eval_during_train=False` to the commands above.
+After training, the final model files `latest.pdparams` and `best_model.pdparams` and the training log file `train.log` are generated in the output directory. `best_model` stores the best model under the current evaluation metric, and `latest` stores the most recently saved model, which makes it convenient to resume training from a checkpoint when a training task is interrupted. To resume, append `-o Global.checkpoint="path_to_resume_checkpoint"` to the training commands above, as shown below.
+- Single-machine, single-card training resumed from a checkpoint
+  ```shell
+  export CUDA_VISIBLE_DEVICES=0
+  python3.7 tools/train.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
+  -o Global.checkpoint="output/RecModel/latest"
+  ```
+- Single-machine, multi-card training resumed from a checkpoint
+  ```shell
+  export CUDA_VISIBLE_DEVICES=0,1,2,3
+  python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
+  tools/train.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
+  -o Global.checkpoint="output/RecModel/latest"
+  ```
+<a name="5.3"></a>
+### 5.3 Model Evaluation
+In addition to the online evaluation of the model during training, the evaluation program can also be started manually to obtain the specified model's accuracy metrics.
 - Single Card Evaluation
+  ```shell
-```
+  export CUDA_VISIBLE_DEVICES=0
-export CUDA_VISIBLE_DEVICES=0
+  python3.7 tools/eval.py \
-python tools/eval.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
+  -o Global.pretrained_model="output/RecModel/best_model"
-o Global.pretrained_model="output/RecModel/best_model"
+  ```
-```
 - Multi Card Evaluation
+  ```shell
+  export CUDA_VISIBLE_DEVICES=0,1,2,3
+  python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
+  tools/eval.py \
+  -c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
+  -o Global.pretrained_model="output/RecModel/best_model"
+  ```
+**Note:** Multi-card evaluation is recommended. It uses multi-card parallel computing to obtain the metrics across all the data quickly, which speeds up evaluation.
-```
+<a name="5.4"></a>
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-python -m paddle.distributed.launch \
-    --gpus="0,1,2,3" tools/eval.py \
-    -c  ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-    -o  Global.pretrained_model="output/RecModel/best_model"
-```
-**Recommendation:** It is suggested to employ multi-card evaluation, which can quickly obtain the feature set of the overall dataset using multi-card parallel computing, accelerating the evaluation process.
-<a name="4.4"></a>
+### 5.4 Model Inference
-### 4.4 Model Inference
-Two steps are included in the inference: 1)exporting the inference model; 2)obtaining the feature vector.
+The inference process consists of two steps: 1) export the inference model; 2) run inference to obtain feature vectors.
-#### 4.4.1 Export Inference Model
+#### 5.4.1 Export Inference Model
-```
+First, you need to convert the `*.pdparams` model file into inference format. The conversion script is as follows.
-python tools/export_model.py \
+```shell
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
+python3.7 tools/export_model.py \
+-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
 -o Global.pretrained_model="output/RecModel/best_model"
 ```
+The generated inference model is located in the `PaddleClas/inference` directory by default. The directory contains three files: `inference.pdmodel`, `inference.pdiparams`, and `inference.pdiparams.info`.
+`inference.pdmodel` stores the structure of the inference model, while `inference.pdiparams` and `inference.pdiparams.info` store the parameter information of the inference model.
-The generated inference models are under the directory `inference`, which comprises three files, namely, `inference.pdmodel`、`inference.pdiparams`、`inference.pdiparams.info`. Among them, `inference.pdmodel` serves to store the structure of inference model while  `inference.pdiparams` and `inference.pdiparams.info` are mobilized to store model-related parameters.
+#### 5.4.2 Get Feature Vector
-#### 4.4.2 Obtain Feature Vector
+Use the inference model converted in the previous step to convert the input image into a corresponding feature vector. The inference script is as follows.
-```
+```shell
 cd deploy
-python python/predict_rec.py \
+python3.7 python/predict_rec.py \
 -c configs/inference_rec.yaml \
 -o Global.rec_inference_model_dir="../inference"
 ```
+The resulting feature output format is as follows:
+```log
+wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02
+  4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02
+  5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02
+ -2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03
+ ...
+ -3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02]
+```
+In most cases, obtaining the features alone does not meet users' requirements. To go further with the image recognition task, refer to the [Vector Search](./vector_search.md) document.
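If the printed feature needs to be reused in another script, the log text can be parsed back into a list of floats. The sketch below is a hypothetical helper, not part of PaddleClas. It assumes the output was captured as text in exactly the `name: [v1 v2 ...]` layout shown in the log above, with every value present; the `...` in that log is the document's own elision and would not parse.

```python
def parse_feature(text):
    """Parse 'name: [v1 v2 ...]' into (name, list of floats).

    Assumes the layout shown in the log above; values may span
    several lines and are separated by whitespace.
    """
    name, sep, body = text.partition(": [")
    if not sep:
        raise ValueError("expected 'name: [values]' format")
    # Drop the closing bracket, then split on any whitespace.
    values = body.replace("]", " ").split()
    return name.strip(), [float(v) for v in values]
```

The returned list can then be passed to whatever similarity or indexing code the downstream task uses.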
+<a name="6"></a>
+## 6. Summary
+As a key part of image recognition, the feature extraction module leaves considerable room for improvement in both the network structure and the loss function. Different tasks, such as person re-identification, commodity recognition, and face recognition, use datasets with their own characteristics. Based on these characteristics, the academic community has proposed various methods, such as PCB, MGN, ArcFace, CircleLoss, and TripletLoss. These methods share the goal of increasing the gap between classes and reducing the gap within classes, so that a retrieval model is robust enough in most scenes.
+<a name="7"></a>
-The output format of the obtained features is shown in the figure below:![img](../../images/feature_extraction_output.png)
+## 7. References
-In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](./vector_search_en.md).
+1. [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf)
+2. [Bag of Tricks and A Strong Baseline for Deep Person Re-identification](https://openaccess.thecvf.com/content_CVPRW_2019/papers/TRMTMCT/Luo_Bag_of_Tricks_and_a_Strong_Baseline_for_Deep_Person_CVPRW_2019_paper.pdf)
--- a/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md
+++ b/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md
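@@ -33,7 +33,7 @@ PP-ShiTuV2, improved from PP-ShiTuV1, is a practical lightweight general image recognition
 ### Model training
-First download the 16 datasets listed in [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分), merge them manually, and generate the annotation text file `train_reg_all_data_v2.txt`. Finally, place everything in the `dataset` directory.
+First download the 17 datasets listed in [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分), merge them manually, and generate the annotation text file `train_reg_all_data_v2.txt`. Finally, place everything in the `dataset` directory.
 The merged folder structure is as follows: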

--- a/docs/zh_CN/PPShiTu/PPShiTu_introduction.md
+++ b/docs/zh_CN/PPShiTu/PPShiTu_introduction.md
-TODO