docs: fix link

307424b6 · gaotingquan · Tingquan Gao · f29b3bca
9 changed files
--- a/docs/en/data_preparation/classification_dataset_en.md
+++ b/docs/en/data_preparation/classification_dataset_en.md
@@ -6,21 +6,21 @@ This document elaborates on the dataset format adopted by PaddleClas for image c

 ## Contents

- [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#数据集格式说明)
- [Common Datasets for Image Classification](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#图像分类任务常见数据集介绍)
-  - [2.1 ImageNet1k](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#ImageNet1k)
-  - [2.2 Flowers102](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#Flowers102)
-  - [2.3 CIFAR10 / CIFAR100](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#CIFAR10/CIFAR100)
-  - [2.4 MNIST](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#MNIST)
-  - [2.5 NUS-WIDE](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/classification_dataset.md#NUS-WIDE)
+- [1. Dataset Format](#1)
+- [2. Common Datasets for Image Classification](#2)
+  - [2.1 ImageNet1k](#2.1)
+  - [2.2 Flowers102](#2.2)
+  - [2.3 CIFAR10 / CIFAR100](#2.3)
+  - [2.4 MNIST](#2.4)
+  - [2.5 NUS-WIDE](#2.5)

+<a name="1"></a>

-
-## 1 Dataset Format
+## 1. Dataset Format

 PaddleClas uses `txt` files to specify the training and test sets. Taking the `ImageNet1k` dataset as an example, `train_list.txt` and `val_list.txt` have the following formats:

-```
+```shell
 # Separate the image path and annotation with "space" for each line

 # train_list.txt has the following format
@@ -32,12 +32,14 @@ val/ILSVRC2012_val_00000001.JPEG 65
 ...
 ```
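As an illustration of the list format shown in this hunk, here is a minimal Python sketch that turns such lines into `(path, label)` pairs. The helper name `parse_list_lines` is hypothetical and not part of PaddleClas; the sketch assumes each data line holds an image path and an integer label separated by a space.

```python
from typing import Iterable, List, Tuple


def parse_list_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse "<image path> <label>" lines (hypothetical helper, not a PaddleClas API)."""
    samples = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comment lines, and "..." placeholders.
        if not line or line.startswith("#") or line == "...":
            continue
        # Split on the last space so the label is always the final field.
        path, label = line.rsplit(" ", 1)
        samples.append((path, int(label)))
    return samples


example = [
    "# val_list.txt has the following format",
    "val/ILSVRC2012_val_00000001.JPEG 65",
    "...",
]
print(parse_list_lines(example))
```

Splitting on the last space rather than the first is a design choice of this sketch: it keeps the label intact even if a path happens to contain a space.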

+<a name="2"></a>

-
-## 2 Common Datasets for Image Classification
+## 2. Common Datasets for Image Classification

 Here we present a compilation of commonly used image classification datasets, which is continuously updated; contributions are welcome.

+<a name="2.1"></a>
+
 ### 2.1 ImageNet1k

 [ImageNet](https://image-net.org/) is a large visual database for visual object recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1000 categories with 1281167 images for the training set and 50000 for the validation set. Starting in 2010, ImageNet held an annual image classification competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, and numerous initial models for downstream computer vision tasks are trained on it.
@@ -67,7 +69,7 @@ PaddleClas/dataset/ILSVRC2012/
 |_ val_list.txt
 ```

-
+<a name="2.2"></a>

 ### 2.2 Flowers102

@@ -104,7 +106,7 @@ PaddleClas/dataset/flowers102/
 |_ val_list.txt
 ```

-
+<a name="2.3"></a>

 ### 2.3 CIFAR10 / CIFAR100

@@ -112,7 +114,7 @@ The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 imag

 Website: http://www.cs.toronto.edu/~kriz/cifar.html

-
+<a name="2.4"></a>

 ### 2.4 MNIST

@@ -120,7 +122,7 @@ MNIST is a renowned dataset for handwritten digit recognition and is used as an

 Website: http://yann.lecun.com/exdb/mnist/

-
+<a name="2.5"></a>

 ### 2.5 NUS-WIDE


--- a/docs/en/data_preparation/recognition_dataset_en.md
+++ b/docs/en/data_preparation/recognition_dataset_en.md
@@ -6,18 +6,18 @@ This document elaborates on the dataset format adopted by PaddleClas for image r

 ## Contents

- [Dataset Format](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#数据集格式说明)
- [Common Datasets for Image Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#图像识别任务常见数据集介绍)
-  - [2.1 General Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#通用图像识别数据集)
-  - [2.2 Vertical Datasets](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#垂类图像识别数据集)
-    - [2.2.1 Animation Character Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#动漫人物识别)
-    - [2.2.2 Product Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#商品识别)
-    - [2.2.3 Logo Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#Logo识别)
-    - [2.2.4 Vehicle Recognition](https://github.com/paddlepaddle/paddleclas/blob/release%2F2.3/docs/zh_CN/data_preparation/recognition_dataset.md#车辆识别)
+- [1. Dataset Format](#1)
+- [2. Common Datasets for Image Recognition](#2)
+  - [2.1 General Datasets](#2.1)
+  - [2.2 Vertical Datasets](#2.2)
+    - [2.2.1 Animation Character Recognition](#2.2.1)
+    - [2.2.2 Product Recognition](#2.2.2)
+    - [2.2.3 Logo Recognition](#2.2.3)
+    - [2.2.4 Vehicle Recognition](#2.2.4)

+<a name="1"></a>

-
-## 1 Dataset Format
+## 1. Dataset Format

 The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:

@@ -27,7 +27,7 @@ The dataset for the vector search, unlike those for classification tasks, is div

 The above three datasets all adopt `txt` files for assignment. Taking the `CUB_200_2011` dataset as an example, the `train_list.txt` of the train dataset has the following format:

-```
+```shell
 # Use "space" as the separator
 ...
 train/99/Ovenbird_0136_92859.jpg 99 2
@@ -38,7 +38,7 @@ train/99/Ovenbird_0128_93366.jpg 99 6

 The `test_list.txt` of the query dataset (which serves as both the gallery dataset and the query dataset in `CUB_200_2011`) has the following format:

-```
+```shell
 # Use "space" as the separator
 ...
 test/200/Common_Yellowthroat_0126_190407.jpg 200 1
@@ -55,12 +55,14 @@ Each row of data is separated by "space", and the three columns of data stand fo

 2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns, which are the path and label information of the training data. The dataset in the yaml configuration file is `ImageNetDataset`.
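The two layouts described in this hunk (three columns with a unique id, or two columns without one) can be handled by a single parser. The sketch below is an illustration only: `Sample` and `parse_recognition_line` are hypothetical names, not PaddleClas APIs, and the code assumes paths contain no whitespace.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    path: str
    label: int
    unique_id: Optional[int]  # None when the line has only two columns


def parse_recognition_line(line: str) -> Sample:
    """Parse "<path> <label> [<unique id>]" (hypothetical helper)."""
    fields = line.split()
    if len(fields) == 3:
        return Sample(fields[0], int(fields[1]), int(fields[2]))
    if len(fields) == 2:
        return Sample(fields[0], int(fields[1]), None)
    raise ValueError(f"expected 2 or 3 space-separated fields, got {len(fields)}")


# Three-column line taken from the train_list.txt excerpt in this diff.
print(parse_recognition_line("train/99/Ovenbird_0136_92859.jpg 99 2"))
# Two-column line; the file name here is made up for illustration.
print(parse_recognition_line("query/0001.jpg 7"))
```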

-
+<a name="2"></a>

 ## 2. Common Datasets for Image Recognition

 Here we present a compilation of commonly used image recognition datasets, which is continuously updated; contributions are welcome.

+<a name="2.1"></a>
+
 ### 2.1 General Datasets

 - SOP: The SOP dataset is a common product dataset in general recognition research and metric learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set.
@@ -77,11 +79,11 @@ Here we present a compilation of commonly used image recognition datasets, which

  Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

-
+<a name="2.2"></a>

 ### 2.2 Vertical Datasets

-
+<a name="2.2.1"></a>

 #### 2.2.1 Animation Character Recognition

@@ -97,7 +99,7 @@ Here we present a compilation of commonly used image recognition datasets, which

  Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces

-
+<a name="2.2.2"></a>

 #### 2.2.2 Product Recognition

@@ -111,7 +113,7 @@ Here we present a compilation of commonly used image recognition datasets, which

 - DeepFashion-Inshop: The same as the common datasets In-shop Clothes.

-
+<a name="2.2.3"></a>

 #### 2.2.3 Logo Recognition

@@ -123,6 +125,8 @@ Here we present a compilation of commonly used image recognition datasets, which

  Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/

+<a name="2.2.4"></a>
+
 #### 2.2.4 Vehicle Recognition

 - CompCars: The images, 136,726 images of the whole car and 27,618 partial ones, are mainly from network and surveillance data. The network data contains 163 vehicle manufacturers and 1,716 vehicle models and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type). And the surveillance data comprises 50,000 front view images.

--- a/docs/en/faq_series/faq_2020_s1_en.md
+++ b/docs/en/faq_series/faq_2020_s1_en.md
--- a/docs/en/faq_series/faq_2021_s1_en.md
+++ b/docs/en/faq_series/faq_2021_s1_en.md
--- a/docs/en/faq_series/faq_selected_30_en.md
+++ b/docs/en/faq_series/faq_selected_30_en.md
@@ -5,21 +5,19 @@
 - We have collected some frequently asked questions from issues and user groups since PaddleClas was open-sourced and provide brief answers, hoping to offer a reference that saves most users from detours.
 - The field of image classification, recognition and retrieval is full of talented people, and its models and papers are updated quickly. The answers here rely mainly on our limited project practice, so they cannot cover every aspect. We sincerely welcome additions and corrections from knowledgeable readers. Thank you.

-## PaddleClas FAQ Summary
-
- [1. 30 Questions About Image Classification](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#1)
-  - [1.1 Basic Knowledge](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#1.1)
-  - [1.2 Model Training](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#1.2)
-  - [1.3 Data](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#1.3)
-  - [1.4 Model Inference and Prediction](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#1.4)
- [2. Application of PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md#2)
-
+## Contents

+- [1. 30 Questions About Image Classification](#1)
+  - [1.1 Basic Knowledge](#1.1)
+  - [1.2 Model Training](#1.2)
+  - [1.3 Data](#1.3)
+  - [1.4 Model Inference and Prediction](#1.4)
+- [2. Application of PaddleClas](#2)

+<a name="1"></a>
 ## 1. 30 Questions About Image Classification

-
-
+<a name="1.1"></a>
 ### 1.1 Basic Knowledge

 - Q: How many classification metrics are commonly used in the field of image classification?
@@ -30,7 +28,7 @@
 > >

 - Q: How to choose the right model to train for your own task?
- A: If you want to deploy on the server with a high requirement for accuracy but not model storage size or prediction speed, then it is recommended to use ResNet_vd, Res2Net_vd, DenseNet, Xception, etc., which are suitable for server-side models. If you want to deploy on the mobile side, then it is recommended to use MobileNetV3 and GhostNet. Meanwhile, we suggest you refer to the speed-accuracy metrics chart in [Model Library](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/models_intro.md) when choosing models.
+- A: If you want to deploy on the server with a high requirement for accuracy but not model storage size or prediction speed, then it is recommended to use ResNet_vd, Res2Net_vd, DenseNet, Xception, etc., which are suitable for server-side models. If you want to deploy on the mobile side, then it is recommended to use MobileNetV3 and GhostNet. Meanwhile, we suggest you refer to the speed-accuracy metrics chart in [Model Library](../models/models_intro_en.md) when choosing models.

 > >

@@ -56,7 +54,7 @@
 - A: The Attention Mechanism (AM) originated from the study of human vision. Using the mechanism on computer vision tasks can effectively capture the useful regions in the images and thus improve the overall network performance. Currently, the most commonly used ones are [SE block](https://arxiv.org/abs/1709.01507), [SK-block](https://arxiv.org/abs/1903.06586), [Non-local block](https://arxiv.org/abs/1711.07971), [GC block](https://arxiv.org/abs/1904.11492), [CBAM](https://arxiv.org/abs/1807.06521), etc. The core idea is to learn the importance of feature maps in different regions or different channels, so that the network can pay more attention to the regions of salience.


-
+<a name="1.2"></a>
 ### 1.2 Model Training

 > >
@@ -67,7 +65,7 @@
 > >

 - Q: What are the possible reasons if the model converges poorly during the training process?
- A: There are several points that can be investigated: (1) The data annotation should be checked to ensure that there are no problems with the labeling of the training and validation sets. (2) Try to adjust the learning rate (initially by a factor of 10). A learning rate that is too large (training oscillation) or too small (slow convergence) may lead to poor convergence. (3) Huge amount of data and an overly small model may prevent it from learning all the features of the data. (4) See if normalization is used in the data preprocessing process. It may be slower without normalization operation. (5) If the amount of data is relatively small, you can try to load the pre-trained model based on ImageNet-1k dataset provided in PaddleClas, which can greatly improve the training convergence speed. (6) There is a long tail problem in the dataset, you can refer to the [solution to the long tail problem of data](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/faq_series/faq_selected_30.md #long_tail).
+- A: There are several points that can be investigated: (1) The data annotation should be checked to ensure that there are no problems with the labeling of the training and validation sets. (2) Try to adjust the learning rate (initially by a factor of 10). A learning rate that is too large (training oscillation) or too small (slow convergence) may lead to poor convergence. (3) Huge amount of data and an overly small model may prevent it from learning all the features of the data. (4) See if normalization is used in the data preprocessing process. It may be slower without normalization operation. (5) If the amount of data is relatively small, you can try to load the pre-trained model based on ImageNet-1k dataset provided in PaddleClas, which can greatly improve the training convergence speed. (6) There is a long tail problem in the dataset, you can refer to the [solution to the long tail problem of data](#long_tail).

 > >

@@ -75,7 +73,9 @@
 - A: Since the emergence of deep learning, there has been a lot of research on optimizers, which aim to minimize the loss function to find the right weights for a given task. Currently, the main optimizers used in the industry are SGD, RMSProp, Adam, AdaDelta, etc. Among them, since the SGD optimizer with momentum is widely used in academia and industry (only for classification tasks), most of the models we published also adopt this optimizer to achieve gradient descent of the loss function. It has two disadvantages: one is the slow convergence speed, and the other is the reliance on experience for setting the initial learning rate. However, if the initial learning rate is set properly with a sufficient number of iterations, the optimizer will also stand out among many other optimizers, obtaining higher accuracy on the validation set. Some optimizers with adaptive learning rates, such as Adam and RMSProp, tend to converge fast, but the final convergence accuracy will be slightly worse. If you pursue faster convergence speed, we recommend using these adaptive learning rate optimizers, and SGD optimizers with momentum for higher convergence accuracy.

 - Q: What are the current mainstream learning rate decay strategies? How to choose?
- A: The learning rate is the hyperparameter that controls how fast the network weights are adjusted by the gradient of the loss function. The lower the learning rate, the slower the loss function will change. While using a low learning rate ensures that no local minima are missed, it also means that it takes longer to converge, especially if trapped in a plateau region. Throughout the whole training process, we cannot adopt the same learning rate to update the weights, otherwise the optimal point cannot be reached, so we need to adjust the learning rate during the training. In the initial stage of training, since the weights are in a random initialization state and the loss function decreases fast, a larger learning rate can be set. In the later stage of training, since the weights are close to the optimal value, a larger learning rate cannot further find the optimal value, so a smaller learning rate is a better choice. As for the learning rate decay strategy, many researchers or practitioners use piecewise_decay (step_decay), which is a stepwise decay learning rate. In addition, there are also other methods proposed by researchers, such as polynomial_decay, exponential_decay, cosine_decay, etc. Among them, cosine_decay requires no adjustment of hyperparameters and has higher robustness, thus emerging as the preferred learning rate decay method to improve model accuracy. The learning rates of cosine_decay and piecewise_decay are shown in the following figure. It is easy to observe that cosine_decay keeps a large learning rate throughout the training, so it is slow in convergence, but its final effect is better than piecewise_decay.[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/models/lr_decay.jpeg)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/models/lr_decay.jpeg)
+- A: The learning rate is the hyperparameter that controls how fast the network weights are adjusted by the gradient of the loss function. The lower the learning rate, the slower the loss function will change. While using a low learning rate ensures that no local minima are missed, it also means that it takes longer to converge, especially if trapped in a plateau region. Throughout the whole training process, we cannot adopt the same learning rate to update the weights, otherwise the optimal point cannot be reached, so we need to adjust the learning rate during the training. In the initial stage of training, since the weights are in a random initialization state and the loss function decreases fast, a larger learning rate can be set. In the later stage of training, since the weights are close to the optimal value, a larger learning rate cannot further find the optimal value, so a smaller learning rate is a better choice. As for the learning rate decay strategy, many researchers or practitioners use piecewise_decay (step_decay), which is a stepwise decay learning rate. In addition, there are also other methods proposed by researchers, such as polynomial_decay, exponential_decay, cosine_decay, etc. Among them, cosine_decay requires no adjustment of hyperparameters and has higher robustness, thus emerging as the preferred learning rate decay method to improve model accuracy. The learning rates of cosine_decay and piecewise_decay are shown in the following figure. It is easy to observe that cosine_decay keeps a large learning rate throughout the training, so it is slow in convergence, but its final effect is better than piecewise_decay.
+
+![](../../images/models/lr_decay.jpeg)
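To make the comparison between cosine_decay and piecewise_decay concrete, here is a minimal sketch of both schedules in plain Python. It uses the standard half-cosine formula and a step schedule; the base learning rate (0.1), the 120-epoch length, the boundaries (30, 60, 90), and the decay factor 0.1 are illustrative assumptions, not values taken from PaddleClas configs, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_decay(base_lr: float, epoch: int, total_epochs: int) -> float:
    """Half-cosine schedule: starts at base_lr and reaches 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def piecewise_decay(base_lr: float, epoch: int,
                    boundaries: Sequence[int], gamma: float = 0.1) -> float:
    """Step schedule: multiply by gamma once for every boundary already reached."""
    passed = sum(1 for b in boundaries if epoch >= b)
    return base_lr * gamma ** passed


# Assumed settings, for illustration only.
BASE_LR, TOTAL, BOUNDARIES = 0.1, 120, (30, 60, 90)
for epoch in (0, 30, 60, 90, 119):
    print(epoch,
          round(cosine_decay(BASE_LR, epoch, TOTAL), 5),
          round(piecewise_decay(BASE_LR, epoch, BOUNDARIES), 5))
```

Note that the step schedule needs its boundaries and decay factor chosen by hand, whereas the cosine schedule is fully determined by the base learning rate and the number of epochs.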

 > >

@@ -119,6 +119,7 @@
 - Q: How to improve the accuracy of my own dataset by pre-training the model?
 - A: At this stage, it has become a common practice in the image recognition field to load pre-trained models to train their own tasks, which can often improve the accuracy of a particular task compared to training from random initialization. In general, the pre-training model widely used in the industry is obtained by training the ImageNet-1k dataset of 1.28 million images of 1000 classes. The fc layer weights of this pre-training model are a matrix of k*1000, where k is the number of neurons before the fc layer, and it is not necessary to load the fc layer weights when loading the pre-training weights. In terms of the learning rate, if your dataset is particularly small (e.g., less than 1,000), we recommend you to adopt a small initial learning rate, e.g., 0.001 (batch_size:256, the same below), so as not to corrupt the pre-training weights with a larger learning rate. If your training dataset is relatively large (>100,000), we suggest you try a larger initial learning rate, such as 0.01 or above.

+<a name="1.3"></a>
 ### 1.3 Data

 > >
@@ -139,7 +140,7 @@
 > >

 - Q: What are the common data augmentation methods currently available to increase the richness of training samples when the amount of data is insufficient?
- A: PaddleClas classifies data augmentation methods into three categories, which are image transformation, image cropping and image aliasing. Image transformation mainly includes AutoAugment and RandAugment, image cropping contains CutOut, RandErasing, HideAndSeek and GridMask, and image aliasing comprises Mixup and Cutmix. More detailed introduction to data augmentation can be found in the chapter of [Data Augmentation ](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/ algorithm_introduction/DataAugmentation.md).
+- A: PaddleClas classifies data augmentation methods into three categories, which are image transformation, image cropping and image aliasing. Image transformation mainly includes AutoAugment and RandAugment, image cropping contains CutOut, RandErasing, HideAndSeek and GridMask, and image aliasing comprises Mixup and Cutmix. More detailed introduction to data augmentation can be found in the chapter of [Data Augmentation ](../algorithm_introduction/DataAugmentation_en.md).

 > >

@@ -164,12 +165,12 @@


 > >
-
+<a name="long_tail"></a>
 - Q: What are the common methods currently used for datasets with long-tailed distributions?
 - A: (1) The categories with fewer data can be resampled to increase the probability of their occurrence; (2) the loss can be modified to increase the loss weight of images in categories with fewer images; (3) transfer learning can be used to learn generic knowledge from common categories and then transfer it to the categories with fewer samples.
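As a rough illustration of option (2), per-class loss weights can be derived from class frequencies. The sketch below uses inverse-frequency weighting, `n_samples / (n_classes * class_count)`, which is one common heuristic and an assumption here, not necessarily what PaddleClas does; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Give rarer classes larger loss weights (one heuristic among several)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy long-tailed label list: class 0 is common, class 1 is rare.
toy_labels = [0] * 8 + [1] * 2
print(inverse_frequency_weights(toy_labels))
```

The same weights, once normalized, could also serve as sampling probabilities for the resampling approach in option (1).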


-
+<a name="1.4"></a>
 ### 1.4 Model Inference and Prediction

 > >
@@ -198,7 +199,7 @@
 - A: (1) Using a GPU with better performance; (2) increasing the batch size; (3) using TensorRT and FP16 half-precision floating-point methods.


-
+<a name="2"></a>
 ## 2. Application of PaddleClas

 > >

--- a/docs/en/introduction/function_intro_en.md
+++ b/docs/en/introduction/function_intro_en.md
@@ -8,6 +8,6 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
 - SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the ImageNet-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
 - Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with the detailed introduction, code replication, and evaluation of effectiveness in a unified experimental environment.

-[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/recognition.gif)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/recognition.gif)
+![](../../images/recognition.gif)

-For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/README_ch.md) on the home page.
+For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](../../../README_ch.md) on the home page.
--- a/docs/en/others/feature_visiualization_en.md
+++ b/docs/en/others/feature_visiualization_en.md
@@ -4,26 +4,32 @@

 ## Contents

- [1. Overview](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/feature_visiualization.md#1)
- [2. Prepare Work](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/feature_visiualization.md#2)
- [3. Model Modification](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/feature_visiualization.md#3)
- [4. Results](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/feature_visiualization.md#4)
+- [1. Overview](#1)
+- [2. Prepare Work](#2)
+- [3. Model Modification](#3)
+- [4. Results](#4)



+<a name='1'></a>
+
 ## 1. Overview

 The feature graph is the feature representation of the input image in the convolutional network, and the study of which can be beneficial to our understanding and design of the model. Therefore, we employ this tool to visualize the feature graph based on the dynamic graph.

+<a name='2'></a>
+
 ## 2. Prepare Work

-The first step is to select the model to be studied, here we choose ResNet50. Copy the model networking code [resnet.py](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/arch/backbone/) to [directory](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/utils/feature_maps_ visualization) and download the [ResNet50 pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) or follow the command below.
+The first step is to select the model to be studied, here we choose ResNet50. Copy the model networking code [resnet.py](../../../ppcls/arch/backbone/legendary_models/resnet.py) to [directory](../../../ppcls/utils/feature_maps_visualization/) and download the [ResNet50 pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) or follow the command below.

-```
+```bash
 wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams
 ```

-For other pre-training models and codes of network structure, please download [model library](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/arch/backbone) and [pre-training models](https:// github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/models/models_intro.md).
+For other pre-training models and codes of network structure, please download [model library](../../../ppcls/arch/backbone/) and [pre-training models](../models/models_intro_en.md).
+
+<a name='3'></a>

 ## 3. Model Modification

@@ -31,7 +37,7 @@ Having found the location of the needed feature graph, set self.fm to fetch it o

 Specify the feature graph to be visualized in the forward function of ResNet50

-```
+```python
    def forward(self, x):
        with paddle.static.amp.fp16_guard():
            if self.data_format == "NHWC":
@@ -47,7 +53,7 @@ Specify the feature graph to be visualized in the forward function of ResNet50
        return x, fm
 ```

-Then modify the code [fm_vis.py](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/ppcls/utils/feature_maps_visualization/fm_vis.py) to import `ResNet50` and instantiate the `net` object:
+Then modify the code [fm_vis.py](../../../ppcls/utils/feature_maps_visualization/fm_vis.py) to import `ResNet50` and instantiate the `net` object:

 ```
 from resnet import ResNet50
@@ -75,13 +81,13 @@ Parameters:
 - `--save_path`: save path, such as `./tools/`
 - `--use_gpu`: whether to enable GPU inference, default value: True

-
+<a name='4'></a>

 ## 4. Results

 - Import the image:

-[![img](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.3/docs/images/feature_maps/feature_visualization_input.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/images/feature_maps/feature_visualization_input.jpg)
+![](../../images/feature_maps/feature_visualization_input.jpg)

 - Run the following script of feature graph visualization

@@ -97,3 +103,5 @@ python tools/feature_maps_visualization/fm_vis.py \
 ```

 - Save the output feature graph as `output.png`, as shown below.
+
+![](../../images/feature_maps/feature_visualization_output.jpg)
--- a/docs/en/others/train_on_xpu_en.md
+++ b/docs/en/others/train_on_xpu_en.md
@@ -4,26 +4,26 @@

 ## Contents

- [1. Foreword](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#1)
- [2. Training of Kunlun](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#2)
-  - [2.1 ResNet50](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#2.1)
-  - [2.2 MobileNetV3](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#2.2)
-  - [2.3 HRNet](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#2.3)
-  - [2.4 VGG16/19](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/train_on_xpu.md#2.4)
-
+- [1. Foreword](#1)
+- [2. Training of Kunlun](#2)
+  - [2.1 ResNet50](#2.1)
+  - [2.2 MobileNetV3](#2.2)
+  - [2.3 HRNet](#2.3)
+  - [2.4 VGG16/19](#2.4)

+<a name='1'></a>

 ## 1. Foreword

- This document describes the models currently supported by Kunlun and how to train these models on Kunlun devices. To install PaddlePaddle that supports Kunlun, please refer to install_kunlun(https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/paddle/install/install_Kunlun_zh.md)
-
+- This document describes the models currently supported by Kunlun and how to train these models on Kunlun devices. To install PaddlePaddle that supports Kunlun, please refer to [install_kunlun](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/09_hardware_support/xpu_docs/paddle_install_cn.html).

+<a name='2'></a>

 ## 2. Training of Kunlun

- See [quick_start](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.3/docs/zh_CN/quick_start/quick_start_ classification_new_user.md) for data sources and pre-trained models. The training effect of Kunlun is aligned with CPU/GPU.
-
+- See [quick_start](../quick_start/quick_start_classification_new_user_en.md) for data sources and pre-trained models. The training effect of Kunlun is aligned with CPU/GPU.

+<a name='2.1'></a>

 ### 2.1 ResNet50

@@ -39,7 +39,7 @@ python3.7 ppcls/static/train.py \

 The difference with cpu/gpu training lies in the addition of -o use_xpu=True, indicating that the execution is on a Kunlun device.

-
+<a name='2.2'></a>

 ### 2.2 MobileNetV3

@@ -53,7 +53,7 @@ python3.7 ppcls/static/train.py \
    -o is_distributed=False
 ```

-
+<a name='2.3'></a>

 ### 2.3 HRNet

@@ -67,7 +67,7 @@ python3.7 ppcls/static/train.py \
    -o use_gpu=False
 ```

-
+<a name='2.4'></a>

 ### 2.4 VGG16/19


--- a/docs/en/others/versions_en.md
+++ b/docs/en/others/versions_en.md
@@ -4,10 +4,10 @@

 ## Contents

- [1. v2.3](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/versions.md#1)
- [2. v2.2](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.3/docs/zh_CN/others/versions.md#2)
-
+- [1. v2.3](#1)
+- [2. v2.2](#2)

+<a name='1'></a>

 ## 1. v2.3

@@ -30,7 +30,7 @@
  - PaddleSlim: 2.2.0
  - PaddleServing: 0.6.1

-
+<a name='2'></a>

 ## 2. v2.2
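The pattern applied throughout this commit, in-page links such as `(#2.1)` paired with explicit `<a name="2.1"></a>` anchors, can be checked mechanically. The following is a minimal sketch under that assumption; the function name and regular expressions are hypothetical and only cover the two anchor quoting styles that appear in this diff.

```python
import re
from typing import List

# Matches <a name="x"> or <a name='x'>, the two styles used in this diff.
ANCHOR_RE = re.compile(r"""<a\s+name=["']([^"']+)["']\s*>""")
# Matches the target of an in-page Markdown link such as [text](#x).
LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")


def missing_anchors(markdown: str) -> List[str]:
    """Return in-page link targets that have no matching <a name=...> anchor."""
    defined = set(ANCHOR_RE.findall(markdown))
    referenced = set(LINK_RE.findall(markdown))
    return sorted(referenced - defined)


sample_doc = """
- [1. v2.3](#1)
- [2. v2.2](#2)

<a name='1'></a>

## 1. v2.3
"""
print(missing_anchors(sample_doc))
```

In the sample, the link to `#2` is reported because its anchor is absent. Running such a check over each changed file would flag a table-of-contents entry whose anchor was forgotten.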