update for en docs

3254addf · HydrogenSulfate · a56366a9 · 3254addf · 3254addf · 3254addf
6 changed file
--- a/README_ch.md
+++ b/README_ch.md
@@ -110,6 +110,7 @@ PP-ShiTu图像识别快速体验：[点击这里](./docs/zh_CN/quick_start/quick
  - [图像分类精选问题](docs/zh_CN/faq_series/faq_selected_30.md)
  - [图像分类FAQ第一季](docs/zh_CN/faq_series/faq_2020_s1.md)
  - [图像分类FAQ第二季](docs/zh_CN/faq_series/faq_2021_s1.md)
+  - [图像分类FAQ第三季](docs/zh_CN/faq_series/faq_2022_s1.md)
 - [社区贡献指南](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
 - [许可证书](#许可证书)
 - [贡献代码](#贡献代码)
@@ -123,7 +124,7 @@ PP-ShiTu图像识别快速体验：[点击这里](./docs/zh_CN/quick_start/quick
 </div>
-PP-ShiTuV2是一个实用的轻量级通用图像识别系统，主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面，采用多种策略，对各个模块的模型进行优化，PP-ShiTuV2相比V1而已，Recall1提升10+个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)。
+PP-ShiTuV2是一个实用的轻量级通用图像识别系统，主要由主体检测、特征学习和向量检索三个模块组成。该系统从骨干网络选择和调整、损失函数的选择、数据增强、学习率变换策略、正则化参数选择、预训练模型使用以及模型裁剪量化多个方面，采用多种策略，对各个模块的模型进行优化，PP-ShiTuV2相比V1，Recall1提升近8个点。更多细节请参考[PP-ShiTuV2详细介绍](./docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md)。
 <a name="识别效果展示"></a>

--- a/README_en.md
+++ b/README_en.md
@@ -7,20 +7,22 @@
 PaddleClas is an image classification and image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
 <div align="center">
-<img src="./docs/images/class_simple_en.gif"  width = "600" />
+<img src="./docs/images/recognition.gif"  width = "400" />
+<p>PP-ShiTuV2 demo images</p>
-PULC demo images
 </div>
-&nbsp;
 <div align="center">
-<img src="./docs/images/recognition.gif"  width = "400" />
+<img src="./docs/images/class_simple_en.gif"  width = "600" />
-PP-ShiTu demo images
+PULC demo images
 </div>
 **Recent updates**
+- 🔥️ Release [PP-ShiTuV2](./docs/en/PPShiTu/PPShiTuV2_introduction.md), recall1 is improved by nearly 8 points, covering 20+ recognition scenarios, with [index management tool](./deploy/shitu_index_manager/README.md) and [Android Demo](./docs/en/quick_start/quick_start_recognition_en.md) for better experience.
 - 2022.6.15 Release [**P**ractical **U**ltra **L**ight-weight image **C**lassification solutions](./docs/en/PULC/PULC_quickstart_en.md). PULC models inference within 3ms on CPU devices, with accuracy on par with SwinTransformer. We also release 9 practical classification models covering pedestrian, vehicle and OCR scenario.
 - 2022.4.21 Added the related [code](https://github.com/PaddlePaddle/PaddleClas/pull/1820/files) of the CVPR2022 oral paper [MixFormer](https://arxiv.org/pdf/2204.02557.pdf).
@@ -58,6 +60,18 @@ Quick experience of **P**ractical **U**ltra **L**ight-weight image **C**lassific
 - [Install Paddle](./docs/en/installation/install_paddle_en.md)
 - [Install PaddleClas Environment](./docs/en/installation/install_paddleclas_en.md)
+- [PP-ShiTuV2 Image Recognition Systems Introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md)
+  - [Submodule Introduction]
+    - [Mainbody Detection](./docs/en/image_recognition_pipeline/mainbody_detection.md)
+    - [Feature Extraction](./docs/en/image_recognition_pipeline/feature_extraction.md)
+    - [Vector Search](./docs/en/image_recognition_pipeline/vector_search.md)
+    - [Hash Encoding](docs/en/image_recognition_pipeline/deep_hashing.md)
+  - PipeLine 推理部署
+    - [Python Inference](docs/en/inference_deployment/python_deploy.md#2)
+    - [C++ Inference](deploy/cpp_shitu/readme.md)
+    - [Serving Deployment](docs/en/inference_deployment/recognition_serving_deploy.md)
+    - [Lite Deployment](docs/en/inference_deployment/lite_shitu.md)
+    - [Shitu Gallery Manager Tool](docs/en/inference_deployment/shitu_gallery_manager.md)
 - [Practical Ultra Light-weight image Classification solutions](./docs/en/PULC/PULC_train_en.md)
  - [PULC Quick Start](docs/en/PULC/PULC_quickstart_en.md)
  - [PULC Model Zoo](docs/en/PULC/PULC_model_list_en.md)
@@ -108,41 +122,55 @@ PULC models inference within 3ms on CPU devices, with accuracy comparable with S
 <img src="./docs/images/structure.jpg"  width = "800" />
 </div>
-Image recognition can be divided into three steps:
+PP-ShiTuV2 is a practical lightweight general image recognition system, which is mainly composed of three modules: mainbody detection model, feature extraction model and vector search tool. The system adopts a variety of strategies including backbone network, loss function, data augmentations, optimal hyperparameters, pre-training model, model pruning and quantization. Compared to V1, PP-ShiTuV2, Recall1 is improved by nearly 8 points. For more details, please refer to [PP-ShiTuV2 introduction](./docs/en/PPShiTu/PPShiTuV2_introduction.md).
- （1）Identify region proposal for target objects through a detection model；
- （2）Extract features for each region proposal;
- （3）Search features in the retrieval database and output results;
 For a new unknown category, there is no need to retrain the model, just prepare images of new category, extract features and update retrieval database and the category can be recognised.
-<a name="Clas_Demo_images"></a>
+<a name="Rec_Demo_images"></a>
-## PULC demo images
+## PP-ShiTuV2 Demo images
+- Drinks recognition
 <div align="center">
-<img src="docs/images/classification_en.gif">
+<img src="docs/images/drink_demo.gif">
 </div>
-<a name="Rec_Demo_images"></a>
-## Image Recognition Demo images [more](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.2/docs/images/recognition/more_demo_images)
 - Product recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769644-51604f80-d2d7-11eb-8290-c53b12a5c1f6.gif"  width = "400" />
 </div>
 - Cartoon character recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769746-6b019700-d2d7-11eb-86df-f1d710999ba6.gif"  width = "400" />
 </div>
 - Logo recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769837-7fde2a80-d2d7-11eb-9b69-04140e9d785f.gif"  width = "400" />
 </div>
 - Car recognition
 <div align="center">
 <img src="https://user-images.githubusercontent.com/18028216/122769916-8ec4dd00-d2d7-11eb-8c60-42d89e25030c.gif"  width = "400" />
 </div>
+<a name="Clas_Demo_images"></a>
+## PULC demo images
+<div align="center">
+<img src="docs/images/classification_en.gif">
+</div>
 <a name="License"></a>
 ## License
 PaddleClas is released under the Apache 2.0 license <a href="https://github.com/PaddlePaddle/PaddleCLS/blob/master/LICENSE">Apache 2.0 license</a>

--- a/docs/en/PPShiTu/PPShiTuV2_introduction.md
+++ b/docs/en/PPShiTu/PPShiTuV2_introduction.md
+## PP-ShiTuV2 Image Recognition System
+## Content
+- [PP-ShiTuV2 Introduction](#pp-shituv2-introduction)
+  - [Dataset](#dataset)
+  - [Model Training](#model-training)
+  - [Model Evaluation](#model-evaluation)
+  - [Model Inference](#model-inference)
+  - [Model Deployment](#model-deployment)
+- [Module introduction](#module-introduction)
+  - [Mainbody Detection](#mainbody-detection)
+  - [Feature Extraction](#feature-extraction)
+    - [Dataset](#dataset-1)
+    - [Backbone](#backbone)
+    - [Network Structure](#network-structure)
+    - [Data Augmentation](#data-augmentation)
+- [references](#references)
+## PP-ShiTuV2 Introduction
+PP-ShiTuV2 is a practical lightweight general image recognition system based on PP-ShiTuV1. Compared with PP-ShiTuV1, it has higher recognition accuracy, stronger generalization ability and similar inference speed<sup>*</sup >. The system is mainly optimized for training data set and feature extraction, with a better backbone, loss function and training strategy. The retrieval performance of PP-ShiTuV2 in multiple practical application scenarios is significantly improved.
+<div align="center">
+<img src="../../images/structure.jpg" />
+</div>
+### Dataset
+We remove some uncommon datasets add more common datasets in training stage. For more details, please refer to [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分).
+The following takes the dataset of [PP-ShiTuV2](../image_recognition_pipeline/feature_extraction.md#4-实验部分) as an example to introduce the training, evaluation and inference process of the PP-ShiTuV2 model.
+### Model Training
+Download the 17 datasets in [PP-ShiTuV2 dataset](../image_recognition_pipeline/feature_extraction.md#4-实验部分) and merge them manually, then generate the annotation text file `train_reg_all_data_v2.txt`, and finally place them in `dataset` directory.
+The merged 17 datasets structure is as follows:
+```python
+dataset/
+├── Aliproduct/ # Aliproduct dataset folder
+├── SOP/ # SOPt dataset folder
+├── ...
+├── Products-10k/ # Products-10k dataset folder
+├── ...
+└── train_reg_all_data_v2.txt # Annotation text file
+```
+The content of the generated `train_reg_all_data_v2.txt` is as follows:
+```log
+...
+Aliproduct/train/50029/1766228.jpg 50029
+Aliproduct/train/50029/1764348.jpg 50029
+...
+Products-10k/train/88823.jpg 186440
+Products-10k/train/88824.jpg 186440
+...
+```
+Then run the following command to train:
+```shell
+# Use GPU 0 for single-card training
+export CUDA_VISIBLE_DEVICES=0
+python3.7 tools/train.py \
+-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+# Use 8 GPUs for distributed training
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python3.7 -m paddle.distributed.launch tools/train.py \
+-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml
+```
+**Note:** `eval_during_train` will be enabled by default during training. After each `eval_interval` epoch, the model will be evaluated on the data set specified by `Eval` in the configuration file (the default is Aliproduct) and calculated for reference. index.
+### Model Evaluation
+Reference [Model Evaluation](../image_recognition_pipeline/feature_extraction_en.md#43-model-evaluation)
+### Model Inference
+Refer to [Python Model Reasoning](../quick_start/quick_start_recognition.md#22-Image Recognition Experience) and [C++ Model Reasoning](../../../deploy/cpp_shitu/readme_en.md)
+### Model Deployment
+Reference [Model Deployment](../inference_deployment/recognition_serving_deploy_en.md#32-service-deployment-and-request)
+## Module introduction
+### Mainbody Detection
+The main body detection model uses `PicoDet-LCNet_x2_5`, for details refer to: [picodet_lcnet_x2_5_640_mainbody](../image_recognition_pipeline/mainbody_detection.md).
+### Feature Extraction
+#### Dataset
+On the basis of the training data set used in PP-ShiTuV1, we removed the iCartoonFace data set, and added more widely used data sets, such as bird400, Cars, Products-10k, fruits- 262.
+#### Backbone
+We replaced the backbone network from `PPLCNet_x2_5` to [`PPLCNetV2_base`](../models/PP-LCNetV2.md). Compared with `PPLCNet_x2_5`, `PPLCNetV2_base` basically maintains a higher classification accuracy and reduces the 40% of inference time <sup>*</sup>.
+**Note:** <sup>*</sup>The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.
+#### Network Structure
+We adjust the `PPLCNetV2_base` structure, and added more general and effective optimizations for retrieval tasks such as pedestrian re-detection, landmark retrieval, and face recognition. It mainly includes the following points:
+1. `PPLCNetV2_base` structure adjustment: The experiment found that [`ReLU`](../../../ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py#L322) at the end of the network has a great impact on the retrieval performance, [`FC`](../../../ppcls/arch/backbone/legendary_models/pp_lcnet_v2.py#L325) also causes a slight drop in retrieval performance, so we removed `ReLU` and `FC` at the end of BackBone.
+2. `last stride=1`: No downsampling is performed at last stage, so as to increase the semantic information of the final output feature map, without having much more computational cost.
+3. `BN Neck`: Add a `BatchNorm1D` layer after `BackBone` to normalize each dimension of the feature vector, bringing faster convergence.
+    | Model                                                              | training data      | recall@1%(mAP%) |
+    | :----------------------------------------------------------------- | :----------------- | :-------------- |
+    | GeneralRecognition_PPLCNet_x2_5                                                         | PP-ShiTuV1 dataset | 65.9(54.3)      |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletLoss) | PP-ShiTuV1 dataset | 72.3(60.5)      |
+4. `TripletAngularMarginLoss`: We improved on the original `TripletLoss` (difficult triplet loss), changed the optimization objective from L2 Euclidean space to cosine space, and added an additional space between anchor and positive/negtive The hard distance constraint makes the training and testing goals closer and improves the generalization ability of the model.
+    | Model | training data | recall@1%(mAP%) |
+    | :---- | :------------ |: -------------- |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletLoss) | PP-ShiTuV2 dataset | 71.9(60.2) |
+    | GeneralRecognitionV2_PPLCNetV2_base(TripletAngularMarginLoss) | PP-ShiTuV2 dataset | 73.7(61.0) |
+#### Data Augmentation
+The target object may rotate to a certain extent and may not maintain an upright state when the actual camera is shot, so we add [random rotation augmentation](../../../ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) in the data augmentation to make retrieval more robust in real scenes.
+Combining the above strategies, the final experimental results on multiple data sets are as follows:
+  | Model      | product<sup>*</sup> |
+  | :--------- | :------------------ |
+  | -          | recall@1%(mAP%)     |
+  | GeneralRecognition_PPLCNet_x2_5 | 65.9(54.3)          |
+  | GeneralRecognitionV2_PPLCNetV2_base | 73.7(61.0)          |
+  | Models     | Aliproduct      | VeRI-Wild       | LogoDet-3k      | iCartoonFace    | SOP             | Inshop           |
+  | :--------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------- |
+  | -          | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@ 1%(mAP%) |
+  | GeneralRecognition_PPLCNet_x2_5 | 83.9(83.2)      | 88.7(60.1)      | 86.1(73.6)      | 84.1(72.3)      | 79.7(58.6)      | 89.1(69.4)       |
+  | GeneralRecognitionV2_PPLCNetV2_base | 84.2(83.3)      | 87.8(68.8)      | 88.0(63.2)      | 53.6(27.5)      | 77.6(55.3)      | 90.8(74.3)       |
+  | model      | gldv2           | imdb_face       | iNat            | instre          | sketch          | sop<sup>*</sup>  |
+  | :--------- | :-------------- | :-------------- | :-------------- | :-------------- | :-------------- | :--------------- |
+  | -          | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@1%(mAP%) | recall@ 1%(mAP%) |
+  | GeneralRecognition_PPLCNet_x2_5 | 98.2(91.6)      | 28.8(8.42)      | 12.6(6.1)       | 72.0(50.4)      | 27.9(9.5)       | 97.6(90.3)       |
+  | GeneralRecognitionV2_PPLCNetV2_base | 98.1(90.5)      | 35.9(11.2)      | 38.6(23.9)      | 87.7(71.4)      | 39.3(15.6)      | 98.3(90.9)       |
+**Note:** The product dataset is made to verify the generalization performance of PP-ShiTu, and all the data are not present in the training and testing sets. The data contains 7 categories ( cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 sub-categories. When testing, use the labels of 250 small classes for testing; the sop dataset comes from [GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval](https://arxiv.org/abs/2111.13122), which can be regarded as " SOP" dataset.
+## references
+1. Schall, Konstantin, et al. "GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval." International Conference on Multimedia Modeling. Springer, Cham, 2022.
+2. Luo, Hao, et al. "A strong baseline and batch normalization neck for deep person re-identification." IEEE Transactions on Multimedia 22.10 (2019): 2597-2609.
--- a/docs/en/image_recognition_pipeline/feature_extraction_en.md
+++ b/docs/en/image_recognition_pipeline/feature_extraction_en.md
--- a/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md
+++ b/docs/zh_CN/PPShiTu/PPShiTuV2_introduction.md
@@ -33,7 +33,7 @@ PP-ShiTuV2 是基于 PP-ShiTuV1 改进的一个实用轻量级通用图像识别
 ### 模型训练
-首先下载好 [PP-ShiTuV2 数据集](../image_recognition_pipeline/feature_extraction.md#4-实验部分) 中的16个数据集并手动进行合并、生成标注文本文件 `train_reg_all_data_v2.txt`，最后放置到 `dataset` 目录下。
+首先下载好 [PP-ShiTuV2 数据集](../image_recognition_pipeline/feature_extraction.md#4-实验部分) 中的17个数据集并手动进行合并、生成标注文本文件 `train_reg_all_data_v2.txt`，最后放置到 `dataset` 目录下。
 合并后的文件夹结构如下所示：

--- a/docs/zh_CN/PPShiTu/PPShiTu_introduction.md
+++ b/docs/zh_CN/PPShiTu/PPShiTu_introduction.md
-TODO